czczup committed on
Commit 88586aa
1 Parent(s): b78f696

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -35,7 +35,7 @@ For better training reproducibility, we follow the minimalist design and data ef
 
 Inspired by LLaVA-NeXT, we adopted a data-efficient SFT strategy to train InternVL-Chat-V1.2, utilizing approximately 1.2M of visual instruction tuning samples in total, all of which are fully open-source. In a macro sense, we build upon [ShareGPT-4V](https://github.com/InternLM/InternLM-XComposer/blob/main/projects/ShareGPT4V/docs/Data.md#prepare-images) and additionally integrate [LLaVA-ZH](https://huggingface.co/datasets/openbmb/llava_zh), [DVQA](https://github.com/kushalkafle/DVQA_dataset), [ChartQA](https://github.com/vis-nlp/ChartQA), [AI2D](https://allenai.org/data/diagrams), [DocVQA](https://www.docvqa.org/datasets), [GeoQA+](https://github.com/SCNU203/GeoQA-Plus), and [SynthDoG-EN](https://huggingface.co/datasets/naver-clova-ix/synthdog-en). Most of the data remains consistent with LLaVA-NeXT.
 
-For more details about data preparation, please see [here](./internvl_chat#prepare-training-datasets).
+For more details about data preparation, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#prepare-training-datasets).
 
 ### Performance
 
@@ -57,9 +57,9 @@ For more details about data preparation, please see [here](./internvl_chat#prepa
 
 ### Training (SFT)
 
-We provide [slurm scripts](./internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh) for multi-node multi-GPU training. You can use either 32 or 64 GPUs to train this model. If you use 64 GPUs, training will take approximately 18 hours.
+We provide [slurm scripts](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh) for multi-node multi-GPU training. You can use either 32 or 64 GPUs to train this model. If you use 64 GPUs, training will take approximately 18 hours.
 
-For more details about training, please see [here](./internvl_chat#start-training).
+For more details about training, please see [here](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat#start-training).
 
 The hyperparameters used for finetuning are listed in the following table.
 
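As a usage note for the training section referenced in the diff above: the sketch below shows one common way such a slurm finetune script is launched. It assumes, without confirming against the repository, that the script reads the partition name and GPU count from environment variables and issues `srun` internally; `your_partition` is a placeholder for your cluster, not a value from the repo.

```bash
# Hypothetical launch of the finetune script linked in the diff above.
# ASSUMPTIONS: the script reads PARTITION/GPUS from the environment and
# calls srun itself; 'your_partition' is a placeholder for your cluster.
cd internvl_chat

# 64-GPU run (the README estimates roughly 18 hours at this scale)
PARTITION='your_partition' GPUS=64 \
  sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh

# A 32-GPU run is also supported per the README:
# PARTITION='your_partition' GPUS=32 \
#   sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_finetune.sh
```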