Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints Paper • 2402.04754 • Published Feb 7, 2024
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models Paper • 2407.19185 • Published Jul 27, 2024 • 1
ARTIST: Improving the Generation of Text-rich Images by Disentanglement Paper • 2406.12044 • Published Jun 17, 2024
MMR: Evaluating Reading Ability of Large Multimodal Models Paper • 2408.14594 • Published Aug 26, 2024
TextLap: Customizing Language Models for Text-to-Layout Planning Paper • 2410.12844 • Published Oct 9, 2024
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding Paper • 2411.01106 • Published Nov 2, 2024 • 4
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner Paper • 2412.10533 • Published 20 days ago • 5
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models Paper • 2410.03290 • Published Oct 4, 2024 • 7
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation Paper • 2406.09305 • Published Jun 13, 2024 • 4
LAFITE: Towards Language-Free Training for Text-to-Image Generation Paper • 2111.13792 • Published Nov 27, 2021
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding Paper • 2306.17107 • Published Jun 29, 2023 • 11
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach Paper • 2305.13579 • Published May 23, 2023 • 3