SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 3 days ago • 97
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google 5 days ago • 53
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 10 days ago • 139
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers Paper • 2502.06527 • Published 13 days ago • 9
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • Jan 20 • 61
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Paper • 2502.04299 • Published 17 days ago • 15
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 27 days ago • 359
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 207
DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics Paper • 2407.02274 • Published Jul 2, 2024 • 1
Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 18
Eagle 2 Collection Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 9 items • Updated Jan 23 • 31