NexaAIDev/omnivision-968M

Image-Text-to-Text

alexchen4ai committed about 21 hours ago

Commit e91dda6 • 1 Parent(s): 48fbc9a

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -10,10 +10,10 @@ tags:
 ## Introduction
-Omnivision is a compact, sub-billion (968M) multimodal model for processing both visual and text inputs, optimized for edge devices. Built on LLaVA's architecture, it features:
 - **9x Token Reduction**: Reduces image tokens from 729 to 81, cutting latency and computational cost.
-- **Minimal-Edit DPO**: Enhances response quality with minimal edits, preserving core model behavior.
 **Quick Links:**
 1. Interactive Demo in our [Hugging Face Space](https://huggingface.co/spaces/NexaAIDev/omnivlm-dpo-demo).

 ## Introduction
+Omnivision is a compact, sub-billion (968M) multimodal model for processing both visual and text inputs, optimized for edge devices. Improved on LLaVA's architecture, it features:
 - **9x Token Reduction**: Reduces image tokens from 729 to 81, cutting latency and computational cost.
+- **Trustworthy result**: Reduces hallucinations using **DPO** training from trustworthy data.
 **Quick Links:**
 1. Interactive Demo in our [Hugging Face Space](https://huggingface.co/spaces/NexaAIDev/omnivlm-dpo-demo).
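
The 9x figure in the README follows from 729 / 81 = 9. The diff does not say how the reduction is performed, so the following is only a minimal sketch of one plausible scheme: concatenating every 9 consecutive image-token embeddings into a single wider token. The function name `merge_tokens`, the grouping strategy, and the toy `hidden_size` are assumptions for illustration, not the model's confirmed implementation.

```python
def merge_tokens(tokens, group=9):
    """Concatenate every `group` consecutive token vectors into one wider vector.

    Hypothetical helper: assumes the reduction trades sequence length
    for embedding width, which the source does not confirm.
    """
    if len(tokens) % group != 0:
        raise ValueError("token count must be divisible by the group size")
    merged = []
    for start in range(0, len(tokens), group):
        wide = []
        for vec in tokens[start:start + group]:
            wide.extend(vec)  # append this token's features to the wide vector
        merged.append(wide)
    return merged


hidden_size = 4  # toy value chosen for the example, not the model's real width
image_tokens = [[float(i)] * hidden_size for i in range(729)]

reduced = merge_tokens(image_tokens)
print(len(image_tokens), "->", len(reduced), "tokens, width", len(reduced[0]))
```

Under this assumption no feature values are discarded: the sequence seen by the language model shrinks by a factor of 9 while each remaining token becomes 9 times wider.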