DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 181
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14 • 16
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 86
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1 • 30
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 33
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation Paper • 2404.03673 • Published Mar 25 • 14
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19 • 29
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era Paper • 1707.02968 • Published Jul 10, 2017 • 1
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement Paper • 2404.05669 • Published Apr 8 • 1
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 7 days ago • 90