Vadim Karpenko

jrell
·

AI & ML interests

None yet

Recent Activity

liked a model about 1 month ago
PramaLLC/BEN

Organizations

None yet

jrell's activity

New activity in ArliAI/Qwen2.5-32B-ArliAI-RPMax-v1.3-GGUF about 1 month ago

32k context bug

#1 opened about 1 month ago by
jrell
New activity in mradermacher/model_requests 2 months ago

LoLCATs models.

1
#369 opened 2 months ago by
jrell
reacted to singhsidhukuldeep's post with ❤️ 7 months ago
You are all happy 😊 that @meta-llama released Llama 3.

Then you are sad 😔 that it only has a context length of 8k.

Then you are happy 😄 that you can just scale llama-3 PoSE to 96k without training, only needing to modify max_position_embeddings and rope_theta.
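As a rough illustration of that config-only change, here is a minimal sketch that patches the two fields in a plain dictionary standing in for a model's config.json. The defaults are the published Llama-3-8B values; the helper name `extend_context`, the reading of "96k" as 96 * 1024 tokens, and the new rope_theta value are assumptions made for the example, not settings taken from the post.

```python
import copy

# Llama-3-8B defaults as published in its config.json.
LLAMA3_8B_DEFAULTS = {
    "max_position_embeddings": 8192,
    "rope_theta": 500000.0,
}


def extend_context(config, new_max_positions, new_rope_theta):
    """Return a copy of `config` with a longer context window and a new RoPE base.

    Hypothetical helper: it only edits the two fields named in the post
    and leaves the input dictionary untouched.
    """
    if new_max_positions <= config["max_position_embeddings"]:
        raise ValueError("new_max_positions must exceed the current window")
    patched = copy.deepcopy(config)
    patched["max_position_embeddings"] = new_max_positions
    patched["rope_theta"] = new_rope_theta
    return patched


# Assumed target values, chosen only for illustration.
patched = extend_context(LLAMA3_8B_DEFAULTS, 96 * 1024, 4_000_000.0)
print(patched)
```

In practice the patched values would be written back to the checkpoint's config.json before loading the model; a suitable rope_theta depends on the target length and has to be checked empirically.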

But then you are sad 😒 it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).

But then you are happy 😁 that the @GradientsTechnologies community has released Llama-3-8B-Instruct-262K with long context (262k-1M+).

Now we have another paper "Extending Llama-3's Context Ten-Fold Overnight" 📜.

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning ⚙️.

The training cycle is highly efficient, taking "only" 😂 8 hours on a single 8xA800 (80G) GPU machine.

The model also preserves its original capability over short contexts. ✅

The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4. 📊

The paper suggests that the context length could be extended far beyond 80K with more computation resources (😅 GPU-poor).

The team plans to publicly release all resources, including data, model, data generation pipeline, and training code, to facilitate future research from the ❤️ community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time... 🌟

New activity in TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF about 1 year ago

LM Studio crash

9
#2 opened about 1 year ago by
jrell
New activity in TheBloke/MPT-7B-Storywriter-GGML over 1 year ago

How to run this Model ?

8
#1 opened over 1 year ago by deleted