arxiv:2412.02611
Shijia Yang
shijiay
AI & ML interests
None yet
Recent Activity
upvoted a paper about 2 months ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
authored a paper about 2 months ago
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
commented on a paper 4 months ago
Law of Vision Representation in MLLMs
Organizations
None yet
models 27
shijiay/llava_clip224_stage1 • Image-Text-to-Text • 14
shijiay/llava_clip224_stage2 • Image-Text-to-Text • 42
shijiay/llava_dinov2_stage2 • Image-Text-to-Text • 17 • 1
shijiay/llava_clip_stage1 • Image-Text-to-Text • 10
shijiay/llava_clip_stage2 • Image-Text-to-Text • 30
shijiay/llava_openclip_stage1 • Image-Text-to-Text • 3
shijiay/llava_openclip_stage2 • Image-Text-to-Text • 3
shijiay/llava_siglip_stage1 • Image-Text-to-Text • 10
shijiay/llava_siglip_stage2 • Image-Text-to-Text • 10
shijiay/llava_sdim_stage1 • Image-Text-to-Text • 3
datasets
None public yet