T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching Paper • 2402.14167 • Published Feb 21 • 9
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping Paper • 2402.14083 • Published Feb 21 • 43
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay Paper • 2402.14194 • Published Feb 22 • 5
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation Paper • 2402.14795 • Published Feb 22 • 5
Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion Paper • 2401.17583 • Published Jan 31 • 23
Adaptive Mobile Manipulation for Articulated Objects In the Open World Paper • 2401.14403 • Published Jan 25 • 9
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents Paper • 2401.12963 • Published Jan 23 • 12
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models Paper • 2403.02084 • Published Mar 4 • 12
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion Paper • 2403.05121 • Published Mar 8 • 18
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 40
TinyLLaVA: A Framework of Small-scale Large Multimodal Models Paper • 2402.14289 • Published Feb 22 • 17
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis Paper • 2402.14797 • Published Feb 22 • 19
GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion Paper • 2402.14810 • Published Feb 22 • 8
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23 • 20
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models Paper • 2402.15021 • Published Feb 22 • 12
Seamless Human Motion Composition with Blended Positional Encodings Paper • 2402.15509 • Published Feb 23 • 13
RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models Paper • 2402.12908 • Published Feb 20 • 6
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation Paper • 2402.10491 • Published Feb 16 • 16
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Paper • 2402.10896 • Published Feb 16 • 14
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models Paper • 2401.13974 • Published Jan 25 • 12
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support Paper • 2401.14688 • Published Jan 26 • 13
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1 • 19
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 47
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Paper • 2401.15687 • Published Jan 28 • 20
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding Paper • 2401.15708 • Published Jan 28 • 10
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation Paper • 2401.15688 • Published Jan 28 • 11
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 85
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion Paper • 2401.13388 • Published Jan 24 • 10
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23 • 31
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution Paper • 2401.10404 • Published Jan 18 • 9
Scaling Face Interaction Graph Networks to Real World Scenes Paper • 2401.11985 • Published Jan 22 • 2
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper • 2401.11708 • Published Jan 22 • 28
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data Paper • 2401.01173 • Published Jan 2 • 11
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper • 2401.01256 • Published Jan 2 • 17
AGG: Amortized Generative 3D Gaussians for Single Image to 3D Paper • 2401.04099 • Published Jan 8 • 7
Q-Refine: A Perceptual Quality Refiner for AI-Generated Image Paper • 2401.01117 • Published Jan 2 • 7
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation Paper • 2401.00896 • Published Dec 31, 2023 • 13
Taming Mode Collapse in Score Distillation for Text-to-3D Generation Paper • 2401.00909 • Published Dec 31, 2023 • 9
Make-A-Character: High Quality Text-to-3D Character Generation within Minutes Paper • 2312.15430 • Published Dec 24, 2023 • 26
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023 • 13
Clockwork Diffusion: Efficient Generation With Model-Step Distillation Paper • 2312.08128 • Published Dec 13, 2023 • 12
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis Paper • 2312.13834 • Published Dec 20, 2023 • 26
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers Paper • 2312.12468 • Published Dec 19, 2023 • 8
InstructVideo: Instructing Video Diffusion Models with Human Feedback Paper • 2312.12490 • Published Dec 19, 2023 • 15
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models Paper • 2312.12487 • Published Dec 19, 2023 • 7
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model Paper • 2312.13252 • Published Dec 20, 2023 • 26
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models Paper • 2312.05107 • Published Dec 8, 2023 • 35
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models Paper • 2312.09767 • Published Dec 15, 2023 • 25
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation Paper • 2312.04483 • Published Dec 7, 2023 • 7
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning Paper • 2401.16013 • Published Jan 29 • 19