StarCycle committed on
Commit 25b4203
1 Parent(s): fbd940c

Update README.md

Files changed (1): README.md +10 -1
README.md CHANGED
@@ -12,6 +12,8 @@ The total size of the model is around 2.2B, which is suitable for embedded appli
 ```
 git clone https://github.com/InternLM/xtuner
 pip install -e ./xtuner[deepspeed]
+https://huggingface.co/StarCycle/llava-clip-internlm2-1_8b-pretrain-v1
+cd ./llava-clip-internlm2-1_8b-pretrain-v1
 ```
 
 ## Common Errors
@@ -80,8 +82,15 @@ Please check the final release version
 ## Cheers! Now train your own model!
 1. Alignment module pretraining
 ```
-NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_7b_dinov2_e1_gpu8_pretrain.py --deepspeed deepspeed_zero2
+# single GPU
+xtuner train ./llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu1_pretrain.py --deepspeed deepspeed_zero2
+
+# multiple GPU
+NPROC_PER_NODE=8 xtuner train ./llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu1_pretrain.py --deepspeed deepspeed_zero2
 ```
+
+#### Remember to change the batch size and gradient accumulation parameters. So your batch_size*gradient_accumulation is roughly equal to mine to reproduce the result.
+
 The checkpoint and tensorboard logs are saved by default in ./work_dirs/. I only train it for 1 epoch to be same as the original LLaVA paper. Some researches also report that training for multiple epochs will make the model overfit the training dataset and perform worse in other domains.
 
 This is my loss curve for llava-clip-internlm2-1_8b-pretrain-v1:
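
A note on the batch-size comment added in the second hunk: the point is to keep the effective batch size (per-device batch size × gradient-accumulation steps × number of GPUs) roughly constant when you change hardware. The sketch below is only an illustration of that arithmetic; the helper name and the numbers (32, 8, 256) are assumptions, not values taken from the xtuner config — read the actual per-device batch size and accumulation setting from the pretrain config you launch (commonly `batch_size` and `accumulative_counts` in xtuner configs).

```python
# Minimal sketch of the effective-batch-size arithmetic behind the note above.
# All numbers are hypothetical; check your own xtuner pretrain config.

def effective_batch_size(per_device_batch_size: int,
                         grad_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Number of samples contributing to a single optimizer step."""
    return per_device_batch_size * grad_accumulation_steps * num_gpus

# Assumed multi-GPU reference setup: 8 GPUs, 32 samples per GPU, no accumulation.
reference = effective_batch_size(per_device_batch_size=32,
                                 grad_accumulation_steps=1,
                                 num_gpus=8)

# Single-GPU setup: raise gradient accumulation so the product stays the same.
single_gpu = effective_batch_size(per_device_batch_size=32,
                                  grad_accumulation_steps=8,
                                  num_gpus=1)

assert reference == single_gpu == 256
print(f"effective batch size: {reference}")
```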