TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation Paper • 2412.03069 • Published 12 days ago • 29
Are Emergent Abilities of Large Language Models a Mirage? Paper • 2304.15004 • Published Apr 28, 2023 • 6
Scaling Image Tokenizers with Grouped Spherical Quantization Paper • 2412.02632 • Published 12 days ago • 9
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17, 2024 • 30
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published 10 days ago • 98
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published 10 days ago • 12