Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass Paper • 2501.13928 • Published 7 days ago • 14
Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • 10 days ago • 52
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot Paper • 2501.09012 • Published 15 days ago • 10
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published 20 days ago • 42
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 21 days ago • 84
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis Paper • 2501.04561 • Published 22 days ago • 16
The Superposition of Diffusion Models Using the Itô Density Estimator Paper • 2412.17762 • Published Dec 23, 2024 • 12
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 49
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published Oct 10, 2024 • 50
Intriguing Properties of Large Language and Vision Models Paper • 2410.04751 • Published Oct 7, 2024 • 16
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10, 2024 • 25
Space-Time Video Super-resolution with Neural Operator Paper • 2404.06036 • Published Apr 9, 2024 • 1
VideoGigaGAN: Towards Detail-rich Video Super-Resolution Paper • 2404.12388 • Published Apr 18, 2024 • 1
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published Jun 19, 2024 • 17
Improving Generative Adversarial Networks for Video Super-Resolution Paper • 2406.16359 • Published Jun 24, 2024 • 1
Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors Paper • 2407.09919 • Published Jul 13, 2024 • 1
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control Paper • 2407.12781 • Published Jul 17, 2024 • 13
PALP: Prompt Aligned Personalization of Text-to-Image Models Paper • 2401.06105 • Published Jan 11, 2024 • 49