---
language:
- zh
- en
tags:
- llama2
- llama2-base
- llama2-base-7B
---
# 7B Chinese Chatbot Trained on Llama2-base 7B (Pure LoRA Training)

## Introduction

After finishing the training of [Llama2-chat 7B Chinese](https://huggingface.co/RicardoLee/Llama2-chat-Chinese-50W) and [Llama2-chat 13B Chinese](https://huggingface.co/RicardoLee/Llama2-chat-13B-Chinese-50W), I was curious whether SFT (supervised fine-tuning) could be run directly on the Llama2-base series. That question is the motivation for this model repository.

After [RicardoLee/Llama2-base-7B-Chinese-50W-pre\_release](https://huggingface.co/RicardoLee/Llama2-base-7B-Chinese-50W-pre_release) and [RicardoLee/Llama2-base-7B-Chinese-50W-Full2LoRA](https://huggingface.co/RicardoLee/Llama2-base-7B-Chinese-50W-Full2LoRA), I finally found hyperparameters that train LoRA stably on the Llama2-base 7B model, and completed LoRA training on the full 500,000 samples. For more details, please refer to the Train Detail section.

The training data consists of 500,000 SFT samples drawn from the [BELLE](https://huggingface.co/BelleGroup) project.

## Train Detail

Some training details:

1. Training framework: This model was trained with a modified version of the [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project.
2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. The tokenizer.model in Llama2 is identical to the one in Llama1, so in theory the tokenizer from the Chinese-LLaMA project can be reused as-is without any token misalignment.
3. Training parameters: **The hyperparameters are: LoRA rank: 64, LR: 4e-4, warmup ratio: 0.001.**
4. Training resources: 8\*V100, 21 hours.
5. Initial loss: 9.1402
6. Final training loss: 1.4104
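
To give a sense of scale for the LoRA rank listed above, here is a rough sketch of the adapter arithmetic. It assumes a single square projection of width 4096, which is the hidden size of Llama2 7B and is not stated in this card. The card also does not say which modules were adapted, so no total trainable-parameter count is derived here. The helper name `lora_added_params` is hypothetical.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds beside one frozen (d_out x d_in) weight.

    LoRA learns two low-rank factors, B (d_out x rank) and A (rank x d_in),
    so the extra parameter count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096  # assumption: Llama2 7B hidden size, not given in this card
RANK = 64           # LoRA rank from the training parameters above

full = HIDDEN_SIZE * HIDDEN_SIZE
added = lora_added_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)

print(f"frozen weight: {full} params")
print(f"LoRA adapter:  {added} params ({added / full:.2%} of the frozen weight)")
```

For this one matrix, the rank-64 adapter trains roughly 3% as many parameters as the frozen weight it sits beside.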

## Inference

This model still uses the Stanford Alpaca template, so don't forget to add the prologue when testing. The prologue template is:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\n${Your Content}\n\n### Response:\n\n"

For dialogue with context, the prologue template is:

"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n\nHuman:${Previous Human Content}\nAssistant:${Previous Assistant Content}\nHuman:${Your Question}\n\n### Response:\n\n"

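The two templates above can be assembled with a small helper. This is a minimal sketch, not code from this repository: the names `ALPACA_PROLOGUE` and `build_prompt` are hypothetical, and repeating the `Human:`/`Assistant:` pair for more than one earlier turn is an assumption, since the template only shows a single previous exchange.

```python
ALPACA_PROLOGUE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
)


def build_prompt(question, history=None):
    """Build the prompt string for one question.

    history is an optional list of (human, assistant) pairs from earlier
    turns; handling more than one pair is an assumption (see lead-in).
    """
    if history:
        # Dialogue with context: replay earlier turns, then the new question.
        turns = "".join(f"Human:{h}\nAssistant:{a}\n" for h, a in history)
        instruction = f"{turns}Human:{question}"
    else:
        # Single-turn: the instruction is just the user's content.
        instruction = question
    return f"{ALPACA_PROLOGUE}### Instruction:\n\n{instruction}\n\n### Response:\n\n"


print(build_prompt("What is LoRA?"))
print(build_prompt("Can you give an example?", [("What is LoRA?", "A fine-tuning method.")]))
```

The returned string is what gets tokenized and passed to the model; the model's reply is whatever it generates after the final `### Response:` marker.
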
## Licence

The models in this repository are open-sourced under the Apache-2.0 license, and use of the model weights must follow the Llama2 [MODEL LICENCE](LICENSE).

## Future Work

I will gradually release the following models in the near future:

1. Models trained on a larger SFT dataset.
2. Models trained on Llama2 and Llama2-chat at 13B and below (since I only have V100s), for comparison.