You are all happy that @meta-llama released Llama 3.
Then you are sad that it only has a context length of 8k.
Then you are happy that you can just scale Llama-3 to 96k with PoSE without training, only needing to modify max_position_embeddings and rope_theta.
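For illustration, a minimal sketch of that config tweak with Hugging Face Transformers might look like the snippet below; the 96k length and the rope_theta value are assumed placeholders, not settings taken from the post.

```python
# Sketch: load Llama-3-8B-Instruct with an enlarged context window by
# overriding max_position_embeddings and rope_theta in the config.
# The specific values below are illustrative assumptions.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 96 * 1024   # target context length (assumed)
config.rope_theta = 8_000_000.0              # enlarged RoPE base (assumed value)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```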
But then you are sad that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (QA and summarization).
But then you are happy that the @GradientsTechnologies community has released the long-context Llama-3-8B-Instruct-262K (262k-1M+).
Now we have another paper, "Extending Llama-3's Context Ten-Fold Overnight".
The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning.
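As a point of reference, a QLoRA setup in the Hugging Face ecosystem typically looks like the sketch below; the LoRA rank, target modules, and quantization options are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: QLoRA-style fine-tuning setup (4-bit base model + LoRA adapters).
# Hyperparameters are illustrative assumptions, not the paper's settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=32,                      # assumed rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```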
The training cycle is highly efficient, taking "only" 8 hours on a single 8xA800 (80G) GPU machine.
The model also preserves its original capability over short contexts.
The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4.
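The paper's data-generation pipeline is not yet released, so the following is only a hedged sketch of how one might prompt GPT-4 for such long-context samples; the prompt wording, the make_qa_sample helper, and the output format are all hypothetical.

```python
# Sketch: generating a synthetic long-context QA training sample with GPT-4.
# The prompt and chunking strategy are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()

def make_qa_sample(long_document: str) -> dict:
    """Ask GPT-4 for a question-answer pair grounded in the given document."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write one question that requires reading the whole "
                        "document, followed by its answer."},
            {"role": "user", "content": long_document},
        ],
    )
    qa_text = response.choices[0].message.content
    # Training sample: the long document as context, the generated Q/A as target.
    return {"context": long_document, "qa": qa_text}
```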
The paper suggests that the context length could be extended far beyond 80K with more computation resources (alas, GPU-poor).
The team plans to publicly release all resources, including the data, model, data-generation pipeline, and training code, to facilitate future research from the community.