mfarre HF staff committed on
Commit
78b5928
1 Parent(s): 16aea32

Update README.md

  1. README.md +11 -7
README.md CHANGED
@@ -21,7 +21,7 @@ SmolVLM is a compact open multimodal model that accepts arbitrary sequences of i
21
  - **Model type:** Multi-modal model (image+text)
22
  - **Language(s) (NLP):** English
23
  - **License:** Apache 2.0
24
- - **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see more details below)
25
 
26
  ## Resources
27
 
@@ -160,15 +160,19 @@ We release the SmolVLM checkpoints under the Apache 2.0 license.
160
 
161
  ### Training Data
162
 
163
- ![Data mixture](mixture_the_cauldron.png)
164
 
165
- The training data is: ![Training data](smolvlm-data.pdf)
166
 
167
 
168
- #### Speeds, Sizes, Times [optional]
169
-
170
- TODO
171
 
172
  ## Evaluation
173
 
174
- TODO
 
 
 
 
 
 
 
 
 
21
  - **Model type:** Multi-modal model (image+text)
22
  - **Language(s) (NLP):** English
23
  - **License:** Apache 2.0
24
+ - **Architecture:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
25
 
26
  ## Resources
27
 
 
160
 
161
  ### Training Data
162
 
163
+ The training data comes from [The Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and [Docmatix](https://huggingface.co/datasets/HuggingFaceM4/Docmatix) datasets, with emphasis on document understanding (25%) and image captioning (18%), while maintaining balanced coverage across other crucial capabilities like visual reasoning, chart comprehension, and general instruction following.<img src="https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct/resolve/main/mixture_the_cauldron.png" alt="Example Image" style="width:70%;" />
164
 
 
165
 
166
 
 
 
 
167
 
168
  ## Evaluation
169
 
+ | Model              | MMMU (val) | MathVista (testmini) | MMStar (val) | DocVQA (test) | TextVQA (val) | Min GPU RAM required (GB) |
+ |--------------------|------------|----------------------|--------------|---------------|---------------|---------------------------|
+ | SmolVLM            | 38.8       | 44.6                 | 42.1         | 81.6          | 72.7          | 5.02                      |
+ | Qwen-VL 2B         | 41.1       | 47.8                 | 47.5         | 90.1          | 79.7          | 13.70                     |
+ | InternVL2 2B       | 34.3       | 46.3                 | 49.8         | 86.9          | 73.4          | 10.52                     |
+ | PaliGemma 3B 448px | 34.9       | 28.7                 | 48.3         | 32.2          | 56.0          | 6.72                      |
+ | moondream2         | 32.4       | 24.3                 | 40.3         | 70.5          | 65.2          | 3.87                      |
+ | MiniCPM-V-2        | 38.2       | 39.8                 | 39.1         | 71.9          | 74.1          | 7.88                      |
+ | MM1.5 1B           | 35.8       | 37.2                 | 0.0          | 81.0          | 72.5          | NaN                       |