Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar PRO
orrzohar
AI & ML interests
Large Multi-Modal Models, Foundation Models, Video Understanding
Recent Activity
Upvoted a paper about 16 hours ago:
NVILA: Efficient Frontier Visual Language Models
Upvoted a paper 4 days ago:
Are Your LLMs Capable of Stable Reasoning?
Collections (2)
interesting Video-LLMs
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 29
- VILA: On Pre-training for Visual Language Models
  Paper • 2312.07533 • Published • 20
- LongVILA: Scaling Long-Context Visual Language Models for Long Videos
  Paper • 2408.10188 • Published • 51
- Long Context Transfer from Language to Vision
  Paper • 2406.16852 • Published • 32