czczup committed on
Commit
162cb94
1 Parent(s): f564018

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -31,7 +31,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 
  - **Training Strategy:**
  - Pretraining Stage
- - Learnable Component: InternViT-6B
+ - Learnable Component: InternViT-6B + MLP
  - Data: Trained on 72M samples, including COYO, LAION, CC12M, CC3M, SBU, Wukong, GRIT, Objects365, OpenImages, and OCR data.
  - Note: In this stage, we load the pretrained weights of InternViT-6B-224px and interpolate its position embedding to the size corresponding to 448 x 448 pixels. Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
  - SFT Stage
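
For readers checking the token counts in the Note, here is a minimal sketch of how a 2 x 2 pixel shuffle could reduce 1024 visual tokens to 256. It is an illustration under stated assumptions, not the repository's implementation: the patch size of 14 is inferred from 448 x 448 pixels yielding 1024 tokens, the helper names `token_grid_side` and `pixel_shuffle_tokens` are hypothetical, and the real model may order the merged channels differently.

```python
# Assumption: patch size 14, inferred from (448 / 14) ** 2 == 1024 tokens.
PATCH_SIZE = 14


def token_grid_side(image_size, patch_size=PATCH_SIZE):
    """Number of patches along one side of a square image."""
    return image_size // patch_size


def pixel_shuffle_tokens(grid, factor=2):
    """Merge each factor x factor block of tokens into a single token.

    `grid` is a 2D list of feature vectors (lists of floats). The merged
    token concatenates the features of the block, so the token count drops
    by factor ** 2 while the feature width grows by factor ** 2.
    """
    height, width = len(grid), len(grid[0])
    if height % factor or width % factor:
        raise ValueError("grid size must be divisible by factor")
    merged_grid = []
    for i in range(0, height, factor):
        row = []
        for j in range(0, width, factor):
            merged = []
            for di in range(factor):
                for dj in range(factor):
                    merged.extend(grid[i + di][j + dj])
            row.append(merged)
        merged_grid.append(row)
    return merged_grid


side = token_grid_side(448)  # 32 patches per side
# Toy features: one channel per token, holding the token's flat index.
grid = [[[float(r * side + c)] for c in range(side)] for r in range(side)]
merged = pixel_shuffle_tokens(grid, factor=2)

num_before = side * side
num_after = len(merged) * len(merged[0])
print(num_before, num_after, len(merged[0][0]))  # 1024 256 4
```

The token count falls by a factor of four while each remaining token carries four times as many features, so no information is discarded at this step; a projection such as the MLP named in the change would then consume the wider tokens.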