mlx-community/Llama-3.2-11B-Vision-Instruct-abliterated Image-Text-to-Text • Updated Dec 16, 2024 • 8.92k • 5
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Paper • 2501.13928 • Published 7 days ago • 14
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • 10 days ago • 52
Article Timm ❤️ Transformers: Use any timm model with transformers 15 days ago • 36
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Paper • 2501.09012 • Published 15 days ago • 10
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 20 days ago • 42