# LongVA
<p align="center">
<img src="vision_niah/niah_output/LongVA-7B/heatmap.png" width="800">
</p>
<p align="center">
🌐 <a href="https://lmms-lab.github.io/posts/longva/" target="_blank">Blog</a> | 📃 <a href="https://arxiv.org/abs/2406.16852" target="_blank">Paper</a> | 🤗 <a href="https://huggingface.co/collections/lmms-lab/longva-667538e09329dbc7ea498057" target="_blank">Hugging Face</a> | 🎥 <a href="https://longva-demo.lmms-lab.com/" target="_blank">Demo</a>
</p>
Long context capability can **zero-shot transfer** from language to vision.
LongVA can process **2000** frames or over **200K** visual tokens. It achieves **state-of-the-art** performance on Video-MME among 7B models.
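
To try the model, the weights can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming the repository id `lmms-lab/LongVA-7B` from the collection linked above; actual inference uses the loaders shipped in the LongVA codebase:

```python
# Minimal sketch: fetch the LongVA-7B checkpoint from the Hugging Face Hub.
# The repo id "lmms-lab/LongVA-7B" is an assumption based on the collection
# linked above; see the blog post for the full inference setup.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="lmms-lab/LongVA-7B")
print(f"Checkpoint downloaded to: {local_dir}")
```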