- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections including paper arxiv:2409.13591
- Controllable Text Generation for Large Language Models: A Survey
  Paper • 2408.12599 • Published • 63
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 34
- Real-Time Video Generation with Pyramid Attention Broadcast
  Paper • 2408.12588 • Published • 15
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 56

- I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
  Paper • 2405.16537 • Published • 16
- ReVideo: Remake a Video with Motion and Content Control
  Paper • 2405.13865 • Published • 23
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models
  Paper • 2406.16863 • Published • 10
- Portrait Video Editing Empowered by Multimodal Generative Priors
  Paper • 2409.13591 • Published • 15

- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 189
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
  Paper • 2312.01841 • Published • 1
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  Paper • 2311.16498 • Published • 1
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  Paper • 2312.02134 • Published • 2

- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
  Paper • 2401.09416 • Published • 10
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
  Paper • 2401.10171 • Published • 13
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
  Paper • 2311.09217 • Published • 22
- GALA: Generating Animatable Layered Assets from a Single Scan
  Paper • 2401.12979 • Published • 7

- Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
  Paper • 2308.16582 • Published • 10
- DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
  Paper • 2310.13119 • Published • 11
- DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
  Paper • 2310.16818 • Published • 30
- Text-to-3D with Classifier Score Distillation
  Paper • 2310.19415 • Published • 4