Update README.md
README.md CHANGED
@@ -95,11 +95,11 @@ We did not use the `input` format in the Alpaca format for simplicity.
## Models

### Models with supervised fine-tuning
-| Model | Size | Context | Train | Link
-|
-| LongAlpaca-7B | 7B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B)
-| LongAlpaca-13B | 13B | 32768 | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B)
-| LongAlpaca-70B | 70B | 32768 | LoRA+ | [Model](https://huggingface.co/Yukang/LongAlpaca-70B-lora)
+| Model          | Size | Context | Train   | Link                                                       |
+|:---------------|------|---------|---------|------------------------------------------------------------|
+| LongAlpaca-7B  | 7B   | 32768   | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-7B)       |
+| LongAlpaca-13B | 13B  | 32768   | Full FT | [Model](https://huggingface.co/Yukang/LongAlpaca-13B)      |
+| LongAlpaca-70B | 70B  | 32768   | LoRA+   | [Model](https://huggingface.co/Yukang/LongAlpaca-70B-lora) |

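The full fine-tuned LongAlpaca checkpoints in the table above are ordinary Hugging Face weights, so they should load with the standard `transformers` API. The sketch below is only an illustration: the model id comes from the table, while the dtype/device settings and the Alpaca-style prompt are assumptions, and very long inputs may additionally need the repository's flash-attention setup.

```
# Minimal sketch (assumptions noted above): load the full fine-tuned LongAlpaca-7B
# checkpoint from the table and run a short generation with plain transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Yukang/LongAlpaca-7B"  # taken from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumed half precision so a 7B model fits on one GPU
    device_map="auto",
)

# Generic Alpaca-style prompt; the project's exact prompt template may differ.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nSummarize the key points of the LongLoRA README.\n\n### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```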
### Models with context extension via fully fine-tuning
@@ -135,6 +135,9 @@ We use LLaMA2 models as the pre-trained weights and fine-tune them to long context window sizes.
| [Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) |
|[Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf) |
| [Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf) |
+| [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) |
+| [Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) |
+| [Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) |

This project also supports GPTNeoX models as the base model architecture. Some candidate pre-trained weights may include [GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b), [Polyglot-ko-12.8B](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) and other variants.
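Separately, LongAlpaca-70B in the first table is released as LoRA+ weights rather than a merged checkpoint, so it has to be combined with the Llama-2-70b base weights listed above. The sketch below uses the generic `peft` API with repository ids taken from the tables; it is an assumption-laden illustration, since LoRA+ may also include extra trainable weights (for example embeddings and normalization layers) that the project's own merging tooling handles.

```
# Minimal sketch (not the project's own merge script): attach the LongAlpaca-70B
# LoRA weights to the Llama-2-70b-hf base model and fold them in for inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-70b-hf"    # base weights from the list above
lora_id = "Yukang/LongAlpaca-70B-lora"   # LoRA+ weights from the models table

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # 70B weights are large; device_map shards them across GPUs
    device_map="auto",
)

# Load the adapter on top of the base weights, then merge so the result behaves
# like an ordinary Hugging Face causal LM.
model = PeftModel.from_pretrained(base_model, lora_id)
model = model.merge_and_unload()
```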
@@ -179,12 +182,12 @@ cd path_to_saving_checkpoints && python zero_to_fp32.py . pytorch_model.bin
### Supervised Fine-tuning
```
torchrun --nproc_per_node=8 supervised-fine-tune.py \
-        --model_name_or_path
+        --model_name_or_path path_to_Llama2_chat_models \
        --bf16 True \
        --output_dir path_to_saving_checkpoints \
        --model_max_length 32768 \
        --use_flash_attn True \
-        --data_path
+        --data_path LongAlpaca-12k.json \
        --low_rank_training True \
        --num_train_epochs 3 \
        --per_device_train_batch_size 1 \
@@ -202,8 +205,8 @@ torchrun --nproc_per_node=8 supervised-fine-tune.py \
        --deepspeed "ds_configs/stage2.json" \
        --tf32 True
```
--
--
+- There is no need to run supervised fine-tuning on top of the fine-tuned context-extended models. It is fine to directly use base models, such as the Llama2-chat models, as the amount of long instruction-following data is enough for SFT.
+- Our long instruction-following data can be found in [LongAlpaca-12k.json](https://huggingface.co/datasets/Yukang/LongAlpaca-12k).

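The new bullets point to LongAlpaca-12k as the SFT data, and the first hunk header notes that the Alpaca `input` field is not used. The sketch below downloads the JSON file and checks one record before passing the local path to `--data_path`; the file name and the `instruction`/`output` field names are assumed from the Alpaca convention and the flags above rather than confirmed here.

```
# Minimal sketch (file and field names are assumptions): fetch LongAlpaca-12k.json
# from the dataset repo and inspect one record before training.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Yukang/LongAlpaca-12k",
    filename="LongAlpaca-12k.json",   # assumed to match the --data_path value above
    repo_type="dataset",
)

with open(path) as f:
    records = json.load(f)

sample = records[0]
print(len(records))                              # expected to be roughly 12k samples
print(sorted(sample.keys()))                     # e.g. ['instruction', 'output'] with no 'input'
print(str(sample.get("instruction", ""))[:200])  # long-context prompts can be very long
```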
### Get trainable weights in low-rank training