Jianshu Zhang's picture

Jianshu Zhang

Sterzhang

·

https://sterzhang.github.io/

AI & ML interests

Data-Centric AI, Multi-Modal Understanding

Recent Activity

updated a dataset 3 days ago

Sterzhang/tmp-bench

liked a dataset 5 days ago

ImagenHub/Text_Guided_Image_Editing

upvoted a paper 10 days ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

View all activity

Organizations

Sterzhang's activity

upvoted 2 papers 10 days ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 12 days ago • 92

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published 17 days ago • 78

upvoted 2 papers about 1 month ago

OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published Nov 22, 2024 • 53

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Paper • 2411.17440 • Published Nov 26, 2024 • 35

upvoted a paper 2 months ago

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

Paper • 2410.23287 • Published Oct 30, 2024 • 19

upvoted 7 papers 3 months ago

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Paper • 2410.13854 • Published Oct 17, 2024 • 10

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Paper • 2410.13830 • Published Oct 17, 2024 • 24

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Paper • 2410.13863 • Published Oct 17, 2024 • 37

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 49

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

Personalized Visual Instruction Tuning

Paper • 2410.07113 • Published Oct 9, 2024 • 70

upvoted a paper 7 months ago

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

Paper • 2406.07502 • Published Jun 11, 2024 • 1