alexchen4ai committed on
Commit
84dd548
1 Parent(s): 509316c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ tags:
 
  Omnivision is a compact, sub-billion (968M) multimodal model for processing both visual and text inputs, optimized for edge devices. Improved on LLaVA's architecture, it features:
 
- - **9x Token Reduction**: Reduces image tokens from 729 to 81, cutting latency and computational cost.
+ - **9x Token Reduction**: Reduces image tokens from **729** to **81**, cutting latency and computational cost aggressively. Note that the computation of vision encoder and the projection part keep the same, but the computation of language model backbone is reduced due to 9X shorter image token span.
  - **Trustworthy Result**: Reduces hallucinations using **DPO** training from trustworthy data.
 
  **Quick Links:**
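
The note added in this commit can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the function names are hypothetical and not part of Omnivision, the 19-token text prompt is an arbitrary example, and the cost model is a rough one in which per-token layers scale linearly and self-attention scales quadratically with the sequence length seen by the language model backbone. The vision encoder and projector are left out because, per the note, their cost does not change.

```python
# Illustrative arithmetic only; names and the cost model are assumptions,
# not taken from the Omnivision code base.

def image_token_reduction(tokens_before: int = 729, tokens_after: int = 81) -> float:
    """Ratio of image tokens before and after compression (729 / 81)."""
    return tokens_before / tokens_after


def backbone_sequence_lengths(text_tokens: int,
                              tokens_before: int = 729,
                              tokens_after: int = 81) -> tuple[int, int]:
    """Total sequence length fed to the language model, before and after."""
    return text_tokens + tokens_before, text_tokens + tokens_after


def relative_backbone_cost(text_tokens: int) -> tuple[float, float]:
    """Rough cost ratios (before / after) for the language model backbone.

    Returns (linear_ratio, quadratic_ratio):
    - linear_ratio approximates per-token work such as MLP layers,
    - quadratic_ratio approximates self-attention over the full sequence.
    """
    before, after = backbone_sequence_lengths(text_tokens)
    linear_ratio = before / after
    quadratic_ratio = linear_ratio ** 2
    return linear_ratio, quadratic_ratio


if __name__ == "__main__":
    print(image_token_reduction())          # image-token ratio
    print(backbone_sequence_lengths(19))    # hypothetical 19-token prompt
    print(relative_backbone_cost(19))       # rough linear / quadratic ratios
```

With the hypothetical 19-token prompt, the backbone sees 100 tokens instead of 748, so under this rough model the per-token work drops by about 7.5x and the attention work by about 56x. The saving in the backbone is smaller than the headline 9x whenever text tokens are present, and it shrinks further as the text portion grows.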