

Vadim Karpenko

jrell


AI & ML interests

None yet


Organizations

None yet

jrell's activity

New activity in ArliAI/Qwen2.5-32B-ArliAI-RPMax-v1.3-GGUF about 1 month ago

32k context bug

#1 opened about 1 month ago

liked a model about 1 month ago

PramaLLC/BEN

Image Segmentation • Updated Nov 21 • 246 • 78

New activity in mradermacher/model_requests 2 months ago

LoLCATs models.

#369 opened 2 months ago

New activity in ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2 2 months ago

LM Studio produces gibberish (GGUF)

#1 opened 2 months ago

liked a model 3 months ago

mistralai/Mistral-Small-Instruct-2409

Updated Oct 16 • 2.76M • 361

New activity in QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF 5 months ago

llama.cpp error: 'done_getting_tensors: wrong number of tensors; expected 292, got 291'

#1 opened 5 months ago

New activity in cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf 5 months ago

"llama.cpp error: 'error loading model vocabulary: unknown pre-tokenizer type: 'dolphin12b''"

#1 opened 5 months ago

reacted to singhsidhukuldeep's post with ❤️ 7 months ago


You are all happy 😊 that @meta-llama released Llama 3.

Then you are sad 😔 that it only has a context length of 8k.

Then you are happy 😄 that you can just scale Llama-3 PoSE to 96k without training, only needing to modify max_position_embeddings and rope_theta.
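As a rough sketch of what "only modifying max_position_embeddings and rope_theta" looks like in practice, assuming a Hugging Face-style config.json: the helper name extend_context and any values you pass to it are hypothetical, not the settings used by anyone mentioned in this post.

```python
import json
from pathlib import Path


def extend_context(config_path: str, new_max_positions: int, new_rope_theta: float) -> dict:
    """Rewrite the two RoPE-related fields in a Hugging Face-style config.json.

    Hypothetical helper: the caller supplies the new values; nothing here
    reproduces a specific published setting. No weights are touched and no
    training happens; only the config file changes.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Longest sequence the model will be allowed to attend over.
    config["max_position_embeddings"] = new_max_positions
    # Base frequency of the rotary position embeddings (RoPE).
    config["rope_theta"] = new_rope_theta
    # Write the edited config back, leaving every other field unchanged.
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, extend_context("config.json", 96 * 1024, some_theta) rewrites both fields and leaves the rest of the config alone. Whether a given rope_theta actually holds up at 96k is exactly the caveat raised in the next paragraph.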

But then you are sad 😢 that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).

But then you are happy 😁 that the @GradientsTechnologies community has released Llama-3-8B-Instruct-262K, with long context (262k to 1M+).

Now we have another paper "Extending Llama-3's Context Ten-Fold Overnight" 📜.

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning⚙️.

The training cycle is highly efficient, taking "only" 😂 8 hours on a single 8xA800 (80G) GPU machine.

The model also preserves its original capability over short contexts. ✁

The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4.📊

The paper suggests that the context length could be extended far beyond 80K with more computation resources (😅 GPU-poor).

The team plans to publicly release all resources, including data, model, data generation pipeline, and training code, to facilitate future research from the ❤️ community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time... 🌟

Extending Llama-3's Context Ten-Fold Overnight (2404.19553)

New activity in TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF about 1 year ago

LM Studio crash

#2 opened about 1 year ago

liked 2 models about 1 year ago

FPHam/Karen_TheEditor_V2_CREATIVE_Mistral_7B

Text Generation • Updated Nov 21, 2023 • 118 • 26

FPHam/Karen_TheEditor_V2_STRICT_Mistral_7B

Text Generation • Updated Apr 21 • 908 • 16

liked 2 models over 1 year ago

elinas/chronos-13b

Text Generation • Updated Jun 23, 2023 • 16 • 33

TheBloke/chronos-13B-GGML

Updated Jun 9, 2023 • 20

New activity in TheBloke/MPT-7B-Storywriter-GGML over 1 year ago

How to run this Model ?

#1 opened over 1 year ago by deleted