dinhanhx

AI & ML interests

Vision Language

Recent Activity

replied to merve's post 1 day ago
Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B model outperforms most 30B models 🧢
✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with an Apache 2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench
The paper has a lot of experiments (they trained 84 models!) about what makes video LMs work ⏯️
Try the demo for the best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
They evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find that design decisions applied to small models also hold when the model and dataset are scaled up 📈 Scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, and that 8-32 tokens per frame is optimal
> They also compare image encoders, trying a range of models from shape-optimized SigLIP to DINOv2, and find https://huggingface.co/google/siglip-so400m-patch14-384 to be the most powerful 🔥
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best yield
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models 🔥
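To make the frame sampling comparison in the post concrete, here is a minimal sketch of the two strategies it names. This is an illustration only, not the Apollo authors' code: the function names `uniform_sample` and `fps_sample` and the example frame rates are assumptions introduced here.

```python
def uniform_sample(num_frames: int, num_samples: int) -> list[int]:
    """Uniform sampling: a fixed number of frames spread evenly over the clip.

    The time gap between sampled frames grows with the clip length.
    (Hypothetical helper, not taken from the Apollo code base.)
    """
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the frame at the centre of each equally sized segment.
    return [int(step * i + step / 2) for i in range(num_samples)]


def fps_sample(num_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """FPS sampling: frames taken at a fixed rate in time.

    The time gap between sampled frames stays constant, so longer clips
    yield more frames. (Hypothetical helper, not taken from the Apollo code base.)
    """
    stride = max(1, round(video_fps / target_fps))
    return list(range(0, num_frames, stride))


if __name__ == "__main__":
    # Assumed example: a 10 s clip and a 30 s clip, both recorded at 30 fps.
    for total in (300, 900):
        print(total, len(uniform_sample(total, 8)), len(fps_sample(total, 30.0, 2.0)))
```

With uniform sampling both clips are reduced to 8 frames, so the longer clip is sampled three times more sparsely in time. With FPS sampling at an assumed 2 frames per second, the spacing stays at half a second and the frame count scales with duration.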
liked a model 1 day ago
erax-ai/EraX-VL-2B-V1.5

Organizations

Blog-explorers, ZeroGPU Explorers, Plastanium, Social Post Explorers, Hugging Face Discord Community

dinhanhx's activity

upvoted an article about 1 month ago
upvoted an article about 2 months ago
Article

BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡

By xhluca •
• 41
upvoted an article 3 months ago
Article

Training and Finetuning Embedding Models with Sentence Transformers v3

• 167
upvoted an article 3 months ago
Article

Llama can now see and run on your device - welcome Llama 3.2

• 180
upvoted 2 articles 6 months ago
Article

SmolLM - blazingly fast and remarkably powerful

• 294
Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

By manu •
• 183
upvoted an article 7 months ago
Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

• 66
upvoted an article 8 months ago
Article

From cloud to developers: Hugging Face and Microsoft Deepen Collaboration

• 8