Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
  - Note: In this stage, we load the pretrained weights of InternViT-6B-224px and interpolate its position embedding to the size corresponding to 448 x 448 pixels. Moreover, in order to reduce the number of visual tokens, we use a pixel shuffle to reduce 1024 tokens to 256 tokens.
  - SFT Stage
  - Learnable Component: MLP + LLM
- - Data: A comprehensive collection of open-source SFT datasets, along with their Chinese translation versions, totaling approximately
+ - Data: A comprehensive collection of open-source SFT datasets, along with their Chinese translation versions, totaling approximately 6M samples.

 ## Model Usage
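The note in the diff above packs in two separate tricks: position-embedding interpolation (so a ViT pretrained at 224px can consume 448px inputs) and a pixel shuffle that folds each 2 x 2 block of visual tokens into one wider token, cutting 1024 tokens (a 32 x 32 grid) down to 256 (16 x 16). Below is a minimal PyTorch sketch of both, assuming a patch size of 14, a square token grid, and no class token; the function names, hidden width, and bicubic resize mode are illustrative assumptions, not the actual InternViT implementation.

```python
import torch
import torch.nn.functional as F


def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1, old_grid**2, C) position embedding to a new grid size.

    Illustrative only: assumes any class-token embedding was split off first.
    """
    _, n, c = pos_embed.shape
    old_grid = int(n ** 0.5)  # e.g. 224 / 14 = 16 patches per side
    x = pos_embed.reshape(1, old_grid, old_grid, c).permute(0, 3, 1, 2)
    x = F.interpolate(x, size=(new_grid, new_grid), mode="bicubic", align_corners=False)
    return x.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, c)


def pixel_shuffle_tokens(x: torch.Tensor, scale: int = 2) -> torch.Tensor:
    """Fold each scale x scale token block into one token with scale**2 x channels.

    (B, H*W, C) -> (B, (H//scale) * (W//scale), C * scale**2), square grid assumed.
    """
    b, n, c = x.shape
    h = w = int(n ** 0.5)
    x = x.reshape(b, h // scale, scale, w // scale, scale, c)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()  # group each scale x scale neighborhood
    return x.reshape(b, (h // scale) * (w // scale), c * scale * scale)


pe_224 = torch.randn(1, 16 * 16, 3200)       # hidden width 3200 chosen for illustration
pe_448 = interpolate_pos_embed(pe_224, new_grid=32)
tokens = torch.randn(2, 32 * 32, 3200)       # 1024 visual tokens from a 448px image
print(pe_448.shape)                          # torch.Size([1, 1024, 3200])
print(pixel_shuffle_tokens(tokens).shape)    # torch.Size([2, 256, 12800])
```

The shuffled tokens come out 4x fewer but 4x wider; the MLP projector named in the SFT stage would then map these wider tokens into the LLM's embedding space.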