Yufan Zhou

YfZ

AI & ML interests

multimodal generative models

YfZ's activity

authored 7 papers 15 days ago

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

Paper • 2402.04754 • Published Feb 7, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

Paper • 2407.19185 • Published Jul 27, 2024 • 1

ARTIST: Improving the Generation of Text-rich Images by Disentanglement

Paper • 2406.12044 • Published Jun 17, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models

Paper • 2408.14594 • Published Aug 26, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning

Paper • 2410.12844 • Published Oct 9, 2024

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Paper • 2411.01106 • Published Nov 2, 2024 • 4

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Paper • 2412.10533 • Published 20 days ago • 5

commented a paper 15 days ago

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Paper • 2412.10533 • Published 20 days ago • 5

authored a paper 3 months ago

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

Paper • 2410.03290 • Published Oct 4, 2024 • 7

authored a paper 7 months ago

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Paper • 2406.09305 • Published Jun 13, 2024 • 4

commented a paper 7 months ago

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Paper • 2406.09305 • Published Jun 13, 2024 • 4

authored 3 papers about 1 year ago

Shifted Diffusion for Text-to-image Generation

Paper • 2211.15388 • Published Nov 24, 2022

Customization Assistant for Text-to-image Generation

Paper • 2312.03045 • Published Dec 5, 2023

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Paper • 2111.13792 • Published Nov 27, 2021

authored 2 papers over 1 year ago

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Paper • 2305.13579 • Published May 23, 2023 • 3