Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 3 days ago • 58
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 16 days ago • 76
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published Dec 16, 2024 • 23
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published Dec 5, 2024 • 16
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 78
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis Paper • 2412.01819 • Published Dec 2, 2024 • 35
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Paper • 2412.01064 • Published Dec 2, 2024 • 26
Running on CPU Upgrade 2.01k 2.01k Anychat 🏢 Select and view code snippets for different chat providers
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation Paper • 2411.08033 • Published Nov 12, 2024 • 22
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14, 2024 • 68
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 51
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors Paper • 2410.16271 • Published Oct 21, 2024 • 81
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published Oct 17, 2024 • 25
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14, 2024 • 55