Qinghong (Kevin) Lin

KevinQHLin

http://qhlin.me/

AI & ML interests

Vision-Language Model, Video Understanding, Human-AI Interaction

Recent Activity

updated a dataset 6 days ago

KevinQHLin/showui_traj

published a dataset 6 days ago

KevinQHLin/showui_traj

upvoted a paper 10 days ago

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation

KevinQHLin's activity

upvoted 2 papers 10 days ago

WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation

Paper • 2502.08047 • Published 11 days ago • 25

TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation

Paper • 2502.07870 • Published 11 days ago • 42

upvoted an article 13 days ago

Welcome to Inference Providers on the Hub 🔥

Article • 26 days ago • 382

upvoted a paper about 2 months ago

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 37

upvoted a collection about 2 months ago

AGUVIS: Unified Pure Vision GUI Agents

Collection

https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 5

upvoted 2 papers 3 months ago

Factorized Visual Tokenization and Generation

Paper • 2411.16681 • Published Nov 25, 2024 • 19

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 82

upvoted 2 collections 3 months ago

GUI Models

Collection

9 items • Updated 1 day ago • 3

Research on GUI Models

Collection

18 items • Updated 1 day ago • 3

upvoted 2 papers 3 months ago

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 80

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Paper • 2411.10323 • Published Nov 15, 2024 • 32

upvoted a paper 4 months ago

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7, 2024 • 70

upvoted a paper 6 months ago

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51

upvoted a paper 8 months ago

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Paper • 2406.10227 • Published Jun 14, 2024 • 9

upvoted a paper about 1 year ago

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

upvoted 3 papers over 1 year ago