czczup committed on
Commit 88586aa
1 Parent(s): b78f696

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -35,7 +35,7 @@ For better training reproducibility, we follow the minimalist design and data ef
 
 Inspired by LLaVA-NeXT, we adopted a data-efficient SFT strategy to train InternVL-Chat-V1.2, utilizing approximately 1.2M of visual instruction tuning samples in total, all of which are fully open-source. In a macro sense, we build upon [ShareGPT-4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md#prepare-images) and additionally integrate [LLaVA-ZH](https://huggingface.co/datasets/openbmb/llava_zh), [DVQA](https://github.com/kushalkafle/DVQA_dataset), [ChartQA](https://github.com/vis-nlp/ChartQA), [AI2D](https://allenai.org/data/diagrams), [DocVQA](https://www.docvqa.org/datasets), [GeoQA+](https://github.com/SCNU203/GeoQA-Plus), and [SynthDoG-EN](https://huggingface.co/datasets/naver-clova-ix/synthdog-en). Most of the data remains consistent with LLaVA-NeXT.
 
-For more details about data preparation, please see [here](./internvl_chat#prepare-training-datasets).
+For more details about data preparation, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#prepare-training-datasets).
 
 ### Performance
 
@@ -57,9 +57,9 @@ For more details about data preparation, please see [here](./internvl_chat#prepa
 
 ### Training (SFT)
 
-We provide [slurm scripts](./internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh) for multi-node multi-GPU training. You can use either 32 or 64 GPUs to train this model. If you use 64 GPUs, training will take approximately 18 hours.
+We provide [slurm scripts](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh) for multi-node multi-GPU training. You can use either 32 or 64 GPUs to train this model. If you use 64 GPUs, training will take approximately 18 hours.
 
-For more details about training, please see [here](./internvl_chat#start-training).
+For more details about training, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#start-training).
 
 The hyperparameters used for finetuning are listed in the following table.
 
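As a usage note for the training section referenced in the diff above: the sketch below shows one common way such a slurm finetune script is launched. It assumes, without confirming against the repository, that the script reads the partition name and GPU count from environment variables and issues `srun` internally; `your_partition` is a placeholder for your cluster, not a value from the repo.

```bash
# Hypothetical launch of the finetune script linked in the diff above.
# ASSUMPTIONS: the script reads PARTITION/GPUS from the environment and
# calls srun itself; 'your_partition' is a placeholder for your cluster.
cd internvl_chat

# 64-GPU run (the README estimates roughly 18 hours at this scale)
PARTITION='your_partition' GPUS=64 \
  sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh

# A 32-GPU run is also supported per the README:
# PARTITION='your_partition' GPUS=32 \
#   sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh
```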