6

Aosong Fen

afeng

AI & ML interests

None yet

Recent Activity

reacted to merve's post with 🚀 21 days ago

Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B model outperforms most 30B models 🧶

✨ The models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 sizes with an Apache 2.0 license, based on Qwen1.5 & Qwen2

✨ The authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench

The paper has a lot of experiments (they trained 84 models!) about what makes video LMs work ⏯️

Try the demo for the best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B

They evaluate sampling strategies, scaling laws for models and datasets, video representation and more!

> The authors find that whatever design decisions were applied to small models also scale properly when the model and dataset are scaled 📈 Scaling the dataset has diminishing returns for smaller models

> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, and that 8-32 tokens per frame is optimal

> They also compare image encoders: they try a variety of models, from shape-optimized SigLIP to DINOv2, and find https://huggingface.co/google/siglip-so400m-patch14-384 to be the most powerful 🔥

> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best yield

They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms 30B models 🔥
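
The two frame sampling strategies compared in the post can be illustrated with a minimal sketch. This is not taken from the Apollo code. The function names `uniform_sample_indices` and `fps_sample_indices` and their parameters are hypothetical, chosen only for illustration. Uniform sampling picks a fixed number of frames regardless of video length, while FPS sampling picks frames at a fixed rate, so the time gap between sampled frames stays constant and longer videos yield more frames.

```python
def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly over the whole video."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    # Split the video into equal segments and take the centre of each one.
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def fps_sample_indices(num_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Pick frame indices at a fixed rate of target_fps frames per second."""
    if num_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    # Number of source frames between two sampled frames (never below 1).
    stride = max(video_fps / target_fps, 1.0)
    indices = []
    i = 0
    while int(i * stride) < num_frames:
        indices.append(int(i * stride))
        i += 1
    return indices


if __name__ == "__main__":
    # Hypothetical clips: 10 s and 60 s, both recorded at 30 frames per second.
    for seconds in (10, 60):
        total = seconds * 30
        uniform = uniform_sample_indices(total, num_samples=8)
        fixed_rate = fps_sample_indices(total, video_fps=30.0, target_fps=2.0)
        print(seconds, len(uniform), len(fixed_rate))
```

With uniform sampling both clips contribute 8 frames, so the frames of the longer clip are much further apart in time. With FPS sampling the spacing is the same for both clips and the frame count grows with duration instead.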

liked a model about 1 month ago

Djrango/Qwen2vl-Flux

liked a model about 1 month ago

mistral-community/pixtral-12b

Organizations

None yet

afeng's activity

liked 2 models about 1 month ago

Djrango/Qwen2vl-Flux

Text-to-Image • Updated Dec 6, 2024 • 457

mistral-community/pixtral-12b

Image-Text-to-Text • Updated 16 days ago • 26.3k • 81

liked a dataset 4 months ago

routellm/gpt4_dataset

Viewer • Updated Jun 11, 2024 • 119k • 83 • 8

liked 2 models 4 months ago

amazon/Titan-text-embeddings-v2

Feature Extraction • Updated Apr 30, 2024 • 745 • 9

black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Aug 16, 2024 • 1.17M • 7.81k

liked a dataset 5 months ago

proj-persona/PersonaHub

Viewer • Updated Oct 5, 2024 • 375k • 3.03k • 482