ModelOne-Vision

🚀 Overview

ModelOne is a multilingual model fine-tuned from the Microsoft Phi Vision architecture and weights. It is built to extract structured information from documents, images, and other visual data, and it uses a specialized output_format token to select the structure of its output.

  • Base Model: Microsoft Phi Vision
  • Training Data: 7M+ samples across 70+ languages.
  • Output Flexibility: Supports free text, CSV, JSON, YAML, and XML.

💡 Join the Beta Program

Sign up for the Beta Program to fine-tune, evaluate, and deploy this model on your own data and infrastructure.

🌍 Capabilities

ModelOne is ideal for:

  • Extracting structured data from scanned and photographed documents.
  • Interpreting complex tables, charts, and visual data representations.
  • Performing multilingual OCR across a broad set of languages.
  • Adapting outputs based on user-defined formats for seamless integration.
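
Once the model returns text in one of the supported formats, downstream code has to parse it. The sketch below shows one way to do that with the Python standard library. The helper `parse_model_output`, the format names it accepts, and the sample responses are all assumptions made for illustration; this card does not document the model's inference API or the exact layout of its output.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_model_output(text: str, output_format: str):
    """Parse a response string according to the requested format.

    Hypothetical helper: the format names and response shapes are
    assumptions for illustration, not the model's documented API.
    """
    fmt = output_format.lower()
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        # One dict per data row, keyed by the header line.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "xml":
        root = ET.fromstring(text)
        # Flatten the root's direct children into a tag -> text mapping.
        return {child.tag: child.text for child in root}
    if fmt == "text":
        return text.strip()
    raise ValueError(f"unsupported output format: {output_format!r}")


# Made-up responses standing in for real model output.
json_result = parse_model_output('{"invoice_number": "INV-001", "total": 42.5}', "json")
csv_result = parse_model_output("item,qty\nwidget,2\ngadget,1\n", "csv")
xml_result = parse_model_output(
    "<receipt><store>Acme</store><total>9.99</total></receipt>", "xml"
)
```

YAML is left out because parsing it requires a third-party package; it could be added as another branch in the same dispatch.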

📚 Training Data and Statistics

ModelOne was trained on a proprietary, high-quality dataset featuring a diverse range of documents and real-world images. The training process included over 7 million data points, with a strong focus on multilingual coverage.

🗂️ Dataset Composition

| Document Type | Percentage | Details |
|---|---|---|
| Real-world Images | 29% | Photos, scans of receipts, forms, ID cards |
| Multipage Documents | 49% | Contracts, reports, books (up to 123 pages) |
| Single-page Documents | 14% | Invoices, certificates, single-page forms |
| Visual Representations | 8% | Tables, charts, graphs, diagrams |
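
To give a feel for the scale behind these shares, the following sketch converts the percentages into approximate sample counts. It assumes a total of exactly 7,000,000 samples; the card only states "over 7 million", so the resulting figures are lower-bound estimates, not reported numbers.

```python
# Assumed total: the card says "over 7 million", so this is a lower bound.
TOTAL_SAMPLES = 7_000_000

# Shares taken from the dataset composition table.
composition_pct = {
    "Real-world Images": 29,
    "Multipage Documents": 49,
    "Single-page Documents": 14,
    "Visual Representations": 8,
}

# The four categories should account for the whole dataset.
assert sum(composition_pct.values()) == 100

# Integer arithmetic keeps the estimates exact for whole-number percentages.
approx_counts = {
    name: TOTAL_SAMPLES * pct // 100 for name, pct in composition_pct.items()
}

for name, count in approx_counts.items():
    print(f"{name}: ~{count:,} samples")
```

Under that assumption, multipage documents alone would account for roughly 3.4 million samples.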

🌍 Language Coverage

Balanced representation across six main languages, with additional support for 64 more:

| Language | Percentage |
|---|---|
| English | 14.27% |
| Spanish | 14.50% |
| French | 14.34% |
| German | 14.06% |
| Italian | 14.06% |
| Russian | 14.58% |
| Other (64 additional languages) | 14.19% |
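
The balance claim can be checked directly from the figures above. This sketch sums the shares and measures how far each bucket deviates from a perfectly uniform split across the seven buckets. Note that "Other" aggregates 64 languages, so its share is not a per-language figure.

```python
from decimal import Decimal

# Shares copied from the language coverage table (percent of training data).
language_pct = {
    "English": Decimal("14.27"),
    "Spanish": Decimal("14.50"),
    "French": Decimal("14.34"),
    "German": Decimal("14.06"),
    "Italian": Decimal("14.06"),
    "Russian": Decimal("14.58"),
    "Other": Decimal("14.19"),  # 64 additional languages combined
}

# Decimal avoids binary floating-point rounding when summing the shares.
total = sum(language_pct.values())

# Share each bucket would have under a perfectly even split.
uniform_share = Decimal(100) / len(language_pct)

# Largest absolute gap between any bucket and the even split.
max_deviation = max(abs(p - uniform_share) for p in language_pct.values())
largest = max(language_pct, key=language_pct.get)

print(f"total = {total}%")
print(f"largest bucket = {largest}")
print(f"max deviation from uniform = {max_deviation:.2f} percentage points")
```

The shares sum to 100%, and no bucket is more than 0.3 percentage points away from an even split.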

🔑 Key Insights

  • Balanced Language Representation: Each of the six major languages contributes approximately 14% of the training data, which is intended to support comparable performance across them.
  • Document Diversity: Includes a mix of single and multi-page documents, real-world images, and visual representations for comprehensive model training.
  • Robust Multilingual Capability: Coverage across 70+ languages makes it suitable for global applications needing extensive linguistic support.