alexchen4ai committed on
Commit 84dd548
Parent(s): 509316c
Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ tags:
 
 Omnivision is a compact, sub-billion (968M) multimodal model for processing both visual and text inputs, optimized for edge devices. Improved on LLaVA's architecture, it features:
 
-- **9x Token Reduction**: Reduces image tokens from 729 to 81
+- **9x Token Reduction**: Reduces image tokens from **729** to **81**, cutting latency and computational cost aggressively. Note that the computation of vision encoder and the projection part keep the same, but the computation of language model backbone is reduced due to 9X shorter image token span.
 - **Trustworthy Result**: Reduces hallucinations using **DPO** training from trustworthy data.
 
 **Quick Links:**
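The README does not say how the 729 image tokens become 81. One simple scheme that gives exactly a 9x reduction, assumed here purely for illustration, is to concatenate each group of 9 consecutive patch embeddings, turning a [729, d] sequence into [81, 9d] before the projector. The plain-Python sketch below applies that regrouping and checks the README's note: a linear projector does the same amount of work either way (729 · d · h equals 81 · 9d · h), while the language model receives 9x fewer image positions. `group_tokens`, `projector_macs`, and the sizes `d`, `h`, and `text_tokens` are hypothetical and do not come from the Omnivision code.

```python
from typing import List

Vector = List[float]


def group_tokens(tokens: List[Vector], group: int = 9) -> List[Vector]:
    """Concatenate every `group` consecutive token vectors into one wider vector.

    Hypothetical sketch: maps a [N, d] sequence to [N // group, group * d].
    """
    if len(tokens) % group != 0:
        raise ValueError("token count must be divisible by the group size")
    grouped: List[Vector] = []
    for start in range(0, len(tokens), group):
        merged: Vector = []
        for vec in tokens[start:start + group]:
            merged.extend(vec)  # lay the group's features side by side
        grouped.append(merged)
    return grouped


def projector_macs(num_tokens: int, in_dim: int, out_dim: int) -> int:
    """Multiply-accumulate count of a single linear projection layer."""
    return num_tokens * in_dim * out_dim


# Toy sizes (assumed): 729 image tokens, each a 4-dim feature vector.
d = 4
image_tokens = [[float(i)] * d for i in range(729)]
reduced = group_tokens(image_tokens)
print(len(reduced), len(reduced[0]))  # 81 36

# Projector work is unchanged: 729 * d * h == 81 * (9 * d) * h.
h = 64  # arbitrary output width, illustration only
print(projector_macs(729, d, h) == projector_macs(81, 9 * d, h))  # True

# The language model now sees 9x fewer image positions per prompt.
text_tokens = 32  # hypothetical prompt length
print(729 + text_tokens, 81 + text_tokens)  # 761 113
```

Because every transformer layer in the language model processes each input position, and self-attention grows with the square of the sequence length, shortening the image span is where the savings land; the vision encoder still runs on the full image.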
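The README names **DPO** but does not describe Omnivision's training setup. For reference, the standard Direct Preference Optimization loss for one preference pair is −log σ(β · [(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]), where y_w is the preferred response and y_l the rejected one. In a hallucination-reduction setting one would assume y_w is a faithful answer and y_l a hallucinated one; that pairing, the value `beta=0.1`, and the helper names below are assumptions for illustration, not the authors' recipe.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    policy_* come from the model being trained, ref_* from a frozen
    reference model; "chosen" is the preferred (e.g. faithful) answer.
    """
    chosen_margin = policy_chosen - ref_chosen        # how much the policy boosted y_w
    rejected_margin = policy_rejected - ref_rejected  # how much it boosted y_l
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Policy identical to the reference: loss equals log 2.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy favors the chosen answer more than the reference does: lower loss.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

Minimizing this loss pushes the model to raise the likelihood of the preferred answer relative to the rejected one, measured against the reference model rather than in absolute terms.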