Donutanti
1 follower · 6 following
AI & ML interests
None yet
Recent Activity
Liked a Space 10 days ago: yuntian-deng/o1mini
Reacted to m-ric's post with 🔥 about 2 months ago
Emu3: Next-token prediction conquers multimodal tasks 🔥

This is the most important research in months: we're now very close to having a single architecture to handle all modalities. The folks at Beijing Academy of Artificial Intelligence (BAAI) just released Emu3, a single model that handles text, images, and videos all at once.

What's the big deal?
Emu3 is the first model to truly unify all these different types of data (text, images, video) using just one simple trick: predicting the next token. And it's only 8B, but really strong:
🖼️ For image generation, it's matching the best specialized models out there, like SDXL.
👁️ In vision tasks, it's outperforming top models like LLaVA-1.6-7B, which is a big deal for a model that wasn't specifically designed for this.
🎬 It's the first to nail video generation without using complicated diffusion techniques.

How does it work?
🧩 Emu3 uses a special tokenizer (SBER-MoVQGAN) to turn images and video clips into sequences of 4,096 tokens.
Then, it treats everything - text, images, and videos - as one long series of tokens to predict.
🔮 During training, it just tries to guess the next token, whether that's a word, part of an image, or a video frame.

Caveats on the results:
In image generation, Emu3 beats SDXL, but it's also much bigger (8B vs 3.5B). It would be more difficult to beat the real diffusion GOAT FLUX-dev.
In vision, authors also don't show a comparison against all the current SOTA models like Qwen-VL or Pixtral.

This approach is exciting because it's simple (next token prediction) and scalable (handles all sorts of data)!

Read the paper: https://huggingface.co/papers/2409.18869
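To make the "one long series of tokens" idea in the post concrete, here is a minimal toy sketch in plain Python. It is not Emu3's code: the vocabulary sizes, the `BOI`/`EOI` marker ids, and all function names are hypothetical and chosen only for illustration. It shows how text ids and vision codebook indices could share one id space, and how next-token training pairs fall out of shifting the sequence by one position.

```python
# Toy illustration only: sizes, special ids, and names are assumptions,
# not values taken from Emu3 or its tokenizer.
TEXT_VOCAB_SIZE = 1000        # hypothetical number of text tokens
VISION_CODEBOOK_SIZE = 512    # hypothetical number of vision codes

# Hypothetical marker tokens placed after both vocabularies.
BOI = TEXT_VOCAB_SIZE + VISION_CODEBOOK_SIZE  # begin-of-image
EOI = BOI + 1                                 # end-of-image


def vision_to_unified(codes):
    """Shift vision codebook indices past the text vocabulary so ids never collide."""
    for code in codes:
        if not 0 <= code < VISION_CODEBOOK_SIZE:
            raise ValueError(f"vision code out of range: {code}")
    return [TEXT_VOCAB_SIZE + code for code in codes]


def build_sequence(text_ids, vision_codes):
    """Concatenate text tokens and one image's tokens into a single flat sequence."""
    return list(text_ids) + [BOI] + vision_to_unified(vision_codes) + [EOI]


def next_token_pairs(sequence):
    """Return (inputs, targets) where targets[i] is the token that follows inputs[i]."""
    return sequence[:-1], sequence[1:]


if __name__ == "__main__":
    seq = build_sequence(text_ids=[5, 7], vision_codes=[0, 3])
    inputs, targets = next_token_pairs(seq)
    print(seq)
    print(inputs)
    print(targets)
```

In this sketch the model never needs to know which modality a position belongs to: the training objective is the same shift-by-one prediction everywhere, which is the simplicity the post highlights. A real image would contribute thousands of vision tokens rather than the two used here.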
Organizations
None yet
Donutanti's activity
Upvoted 2 articles 5 months ago
BrAIn: next generation neurons?
By as-cle-bert • Jun 5 • 15
Introduction to 3D Gaussian Splatting
Sep 18, 2023 • 32