- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 171
- Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
  Paper • 2501.05444 • Published
- Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
  Paper • 2502.14191 • Published • 7
- CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
  Paper • 2502.16614 • Published • 23
Henry Hengyuan Zhao
hhenryz
AI & ML interests
Multimodal Reasoning, Human-AI Interaction, GUI Automation
Recent Activity
commented on a paper 1 day ago
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
upvoted a paper 2 days ago
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
updated a model 3 days ago
ZechenBai/LOVA3-llava-v1.5-7b-gemini