Vision Language General
Zhang Yuanhan
ZhangYuanhan
AI & ML interests
None yet
Recent Activity
upvoted a paper 5 days ago
Apollo: An Exploration of Video Understanding in Large Multimodal Models
upvoted a paper 24 days ago
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
new activity about 1 month ago
lmms-lab/LLaVA-Video-178K: Query about how many frames are used to generate each caption?