---
pipeline_tag: video-text-to-text
---

This repository contains the model presented in the paper "VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction".

Code: https://github.com/VITA-MLLM/VITA