Aosong Fen
afeng
AI & ML interests
None yet
Recent Activity
reacted to merve's post 21 days ago
Apollo is a new family of open-source video language models by Meta, where the 3B model outperforms most 7B models and the 7B model outperforms most 30B models.
The models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with an Apache 2.0 license, based on Qwen1.5 & Qwen2.
The authors also release a benchmark dataset: https://huggingface.co/spaces/Apollo-LMMs/ApolloBench
The paper has a lot of experiments (they trained 84 models!) about what makes video LMs work. Try the demo for the best setup here: https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
They evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find that design decisions that work for small models also hold up when the model and dataset are scaled; scaling the dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies and find that FPS sampling is better than uniform sampling, and that 8-32 tokens per frame is optimal
> They also compare image encoders, trying a variety of models from shape-optimized SigLIP to DINOv2, and find https://huggingface.co/google/siglip-so400m-patch14-384 to be the most powerful
> They also compare freezing different parts of the models; training all stages with some parts frozen gives the best results
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo-7B outperforms most 30B models.
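To make the frame-sampling comparison in the post concrete, here is a minimal sketch of the general difference between uniform sampling (a fixed number of frames per clip) and FPS sampling (a fixed number of frames per second). The function names and parameters are hypothetical and chosen for illustration; this is not Apollo's actual implementation.

```python
def uniform_sample_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick a fixed number of frame indices spread evenly over the clip.

    The gap between sampled frames grows with clip length.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    # Take the midpoint of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def fps_sample_indices(num_frames: int, video_fps: float, target_fps: float) -> list[int]:
    """Pick frame indices at a fixed rate in time (target_fps frames per second).

    The gap between sampled frames stays constant; the count grows with clip length.
    """
    if num_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    # Source frames per sampled frame; never sample faster than the source rate.
    stride = max(video_fps / target_fps, 1.0)
    indices = []
    i = 0
    while int(i * stride) < num_frames:
        indices.append(int(i * stride))
        i += 1
    return indices


# A 10-second and a 60-second clip, both recorded at 30 fps.
short_clip, long_clip = 300, 1800
print(len(uniform_sample_indices(short_clip, 8)), len(uniform_sample_indices(long_clip, 8)))
print(len(fps_sample_indices(short_clip, 30, 2)), len(fps_sample_indices(long_clip, 30, 2)))
```

With uniform sampling both clips yield 8 frames, so the long clip is sampled six times more sparsely in time. With FPS sampling the temporal spacing is the same for both clips and the frame count scales with duration.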
liked a model about 1 month ago: Djrango/Qwen2vl-Flux
liked a model about 1 month ago: mistral-community/pixtral-12b
Organizations
None yet
afeng's activity
liked 2 models about 1 month ago
Djrango/Qwen2vl-Flux • Text-to-Image • Updated Dec 6, 2024 • 457
mistral-community/pixtral-12b • Image-Text-to-Text • Updated 16 days ago • 26.3k • 81
liked a dataset 4 months ago
routellm/gpt4_dataset • Viewer • Updated Jun 11, 2024 • 119k • 83 • 8
liked 2 models 4 months ago
amazon/Titan-text-embeddings-v2 • Feature Extraction • Updated Apr 30, 2024 • 745 • 9
black-forest-labs/FLUX.1-dev • Text-to-Image • Updated Aug 16, 2024 • 1.17M • 7.81k
liked a dataset 5 months ago
proj-persona/PersonaHub • Viewer • Updated Oct 5, 2024 • 375k • 3.03k • 482