GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13, 2025
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation Paper • 2502.16707 • Published Feb 23, 2025
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14, 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation Paper • 2412.18597 • Published Dec 24, 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21, 2024
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published Oct 17, 2024
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models Paper • 2311.07575 • Published Nov 13, 2023