Visual Question Answering
Transformers
TensorBoard
Safetensors
internvl_chat
feature-extraction
custom_code
czczup committed
Commit 0f18ed3
1 Parent(s): c9c65b9

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -36,7 +36,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
   - Note: In this stage, we load the pretrained weights of [InternViT-6B-448px-V1-2](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2). Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
 - SFT Stage
   - Learnable Component: ViT + MLP + LLM
-  - Data: A simplified, fully open-source dataset, containing approximately 1 million entries.
+  - Data: A simplified, fully open-source dataset, containing approximately 1 million samples.
 
 
 ## Model Usage
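
For readers unfamiliar with the pixel-shuffle step mentioned in the note above, the sketch below illustrates the general idea under assumed shapes: a space-to-depth rearrangement that folds each 2x2 block of visual tokens into the channel dimension, turning a 32x32 grid (1024 tokens) into a 16x16 grid (256 tokens). The function name, tensor layout, and channel width are illustrative assumptions, not the repository's exact code.

```python
# Minimal sketch of the token-reduction idea (pixel shuffle / space-to-depth).
# Shapes and the channel width (3200) are assumptions for illustration only.
import torch

def pixel_shuffle_tokens(x: torch.Tensor, scale: float = 0.5) -> torch.Tensor:
    """x: (batch, height, width, channels) grid of visual tokens."""
    n, h, w, c = x.shape
    # Fold each (1/scale x 1/scale) spatial block into the channel dimension,
    # shrinking the spatial grid while widening each remaining token.
    x = x.view(n, h, int(w * scale), int(c / scale))
    x = x.permute(0, 2, 1, 3).contiguous()
    x = x.view(n, int(w * scale), int(h * scale), int(c / (scale * scale)))
    x = x.permute(0, 2, 1, 3).contiguous()
    return x

tokens = torch.randn(1, 32, 32, 3200)        # 32 x 32 = 1024 visual tokens
reduced = pixel_shuffle_tokens(tokens, 0.5)  # -> (1, 16, 16, 12800): 256 tokens
print(reduced.shape)
```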