Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published 3 days ago • 27
Unlocking Continual Learning Abilities in Language Models Paper • 2406.17245 • Published 4 days ago • 23
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers Paper • 2406.16747 • Published 5 days ago • 15
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published 10 days ago • 12
Style-NeRF2NeRF: 3D Style Transfer From Style-Aligned Multi-View Images Paper • 2406.13393 • Published 10 days ago • 5
Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps Paper • 2406.14539 • Published 9 days ago • 24
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published 14 days ago • 64
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published 9 days ago • 96
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Paper • 2406.14562 • Published 9 days ago • 26
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published 15 days ago • 21
Make It Count: Text-to-Image Generation with an Accurate Number of Objects Paper • 2406.10210 • Published 15 days ago • 74
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published 16 days ago • 7
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors Paper • 2406.10111 • Published 15 days ago • 6
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published 16 days ago • 17
DiTFastAttn: Attention Compression for Diffusion Transformer Models Paper • 2406.08552 • Published 17 days ago • 19
Interpreting the Weight Space of Customized Diffusion Models Paper • 2406.09413 • Published 16 days ago • 18
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published 16 days ago • 28
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters Paper • 2406.05955 • Published 19 days ago • 21
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published 19 days ago • 35
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published 21 days ago • 39
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising Paper • 2406.06911 • Published 18 days ago • 10
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published 18 days ago • 53
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models Paper • 2406.07472 • Published 18 days ago • 10
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published 21 days ago • 12
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis Paper • 2406.06216 • Published 19 days ago • 16
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model Paper • 2406.04333 • Published 23 days ago • 36
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published 23 days ago • 26
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published 29 days ago • 10
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published 29 days ago • 60
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture Paper • 2405.18991 • Published about 1 month ago • 12
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29 • 20
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published May 24 • 14
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper • 2405.17258 • Published May 27 • 12
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels Paper • 2405.16822 • Published May 27 • 11
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published May 20 • 23
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published May 18 • 11
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 49
Paint by Inpaint: Learning to Add Image Objects by Removing Them First Paper • 2404.18212 • Published Apr 28 • 25
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30 • 20
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 65
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30 • 69
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning Paper • 2404.15449 • Published Apr 23 • 11
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24 • 16
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16