Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis Paper • 2401.09048 • Published Jan 17 • 7
Improving fine-grained understanding in image-text pre-training Paper • 2401.09865 • Published Jan 18 • 13
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 54
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Paper • 2401.13627 • Published Jan 24 • 70
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion Paper • 2401.13388 • Published Jan 24 • 10
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing Paper • 2402.02583 • Published Feb 4 • 7
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21 • 26
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching Paper • 2402.14167 • Published Feb 21 • 9
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23 • 20
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 184
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models Paper • 2402.19481 • Published Feb 29 • 17
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization Paper • 2403.00483 • Published Mar 1 • 9
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models Paper • 2403.02084 • Published Mar 4 • 12
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4 • 26
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 47
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 40
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM Paper • 2403.07487 • Published Mar 12 • 12
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 15
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control Paper • 2403.09055 • Published Mar 14 • 24
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models Paper • 2403.13535 • Published Mar 20 • 20
DepthFM: Fast Monocular Depth Estimation with Flow Matching Paper • 2403.13788 • Published Mar 20 • 15
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19 • 14
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25 • 18
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27 • 22
Condition-Aware Neural Network for Controlled Image Generation Paper • 2404.01143 • Published Apr 1 • 11
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 28
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation Paper • 2404.03673 • Published Mar 25 • 14
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published Apr 12 • 26
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15 • 20
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15 • 11
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Paper • 2404.11565 • Published Apr 17 • 12
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 26
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24 • 16
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning Paper • 2404.15449 • Published Apr 23 • 11
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 49
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published May 21 • 21
RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance Paper • 2405.14677 • Published May 23 • 8
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis Paper • 2405.14224 • Published May 23 • 8
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models Paper • 2405.16759 • Published May 27 • 7
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 23 days ago • 36
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 18 days ago • 53
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published 18 days ago • 10
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation Paper • 2406.08392 • Published 17 days ago • 17
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published 16 days ago • 47
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published 16 days ago • 28
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts Paper • 2406.09162 • Published 16 days ago • 13
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published 15 days ago • 21
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published 12 days ago • 18
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published 14 days ago • 64
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps Paper • 2406.14539 • Published 9 days ago • 24
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 5 days ago • 52
Aligning Diffusion Models with Noise-Conditioned Perception Paper • 2406.17636 • Published 4 days ago • 23