RicardoLee committed
Commit b64445e • 1 Parent(s): 5808953
README rectify
README.md CHANGED
@@ -23,6 +23,8 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro

## Train Detail

+Some training details:
+
1. Training framework: The model was trained with a modified version of the [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project.
2. Tokenizer: The model uses the tokenizer.model from the Chinese-Alpaca-Plus model. LLaMA2's own tokenizer.model is identical to LLaMA1's, so in theory the tokenizer from the Chinese-LLaMA project can be reused as-is without introducing any token-misalignment problems (a loading sketch follows the diff).
3. Training hyperparameters: Because the embedding matrix has to be resized, the extra embedding rows are effectively randomly initialized, so early in training DeepSpeed very easily hits "OVERFLOW" and starts reducing the loss scale. After repeated reductions the scale becomes too small and training crashes. When this happens, do not lower the learning rate or the warmup; instead, scale these hyperparameters up to pretraining levels so that the randomly initialized embeddings get on track quickly (see the DeepSpeed sketch after the diff).
@@ -30,6 +32,7 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro
5. Initial training loss: 8.7072
6. Final training loss: 1.5674

+Some details in training:

1. Training Framework: This model is trained on a modified [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) framework.
2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. The tokenizer.model in LLaMA2 is identical to the one used in LLaMA1, so it is theoretically feasible to reuse the tokenizer from the Chinese-LLaMA project in full without encountering any token-misalignment issues.
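
For the tokenizer point (item 2), a minimal sketch of how such a swap is typically wired up with the Hugging Face transformers library is shown below. The paths are placeholders rather than anything published in this repo, and the final `resize_token_embeddings` call is the step that creates the randomly initialized rows discussed in item 3.

```python
# Minimal sketch (not the repo's training script): reuse the Chinese-Alpaca-Plus
# tokenizer with a LLaMA-2 base model. Paths are hypothetical placeholders.
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE_MODEL = "path/to/llama-2-7b-hf"               # hypothetical local path
CHINESE_TOKENIZER = "path/to/chinese-alpaca-plus"  # hypothetical local path

# The extended Chinese tokenizer keeps the original LLaMA pieces and appends new
# Chinese tokens; since LLaMA-2 ships the same tokenizer.model as LLaMA-1, the
# existing token ids line up and only new ids are added at the end.
tokenizer = LlamaTokenizer.from_pretrained(CHINESE_TOKENIZER)

model = LlamaForCausalLM.from_pretrained(BASE_MODEL)

# Grow the input embedding and LM head to the new vocabulary size.
# The added rows start from random values and must be trained.
model.resize_token_embeddings(len(tokenizer))
```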
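
The loss-scale advice in item 3 maps onto DeepSpeed's dynamic fp16 loss scaling. The sketch below shows where those knobs live when DeepSpeed is driven through the transformers Trainer; the config keys are standard DeepSpeed options, but every numeric value (learning rate, warmup, batch sizes) is illustrative only and not taken from this model's actual run.

```python
# Minimal sketch, not this model's actual configuration: DeepSpeed fp16 loss scaling
# plus "pretrain-scale" learning rate / warmup, wired through the HF Trainer.
from transformers import TrainingArguments

ds_config = {
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 = dynamic loss scaling
        "initial_scale_power": 16,  # start at 2**16; early OVERFLOW messages are expected
        "loss_scale_window": 1000,  # overflow-free steps required before the scale is raised again
        "hysteresis": 2,            # tolerated overflows before the scale is reduced
        "min_loss_scale": 1,        # floor for the dynamic scale
    },
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="output",            # placeholder
    deepspeed=ds_config,            # a path to an equivalent JSON file also works
    fp16=True,
    learning_rate=2e-4,             # illustrative "pretrain-scale" value, larger than typical SFT rates
    warmup_ratio=0.03,              # illustrative; the point is not to shrink warmup when overflows appear
    lr_scheduler_type="cosine",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)
```

The counterintuitive part of item 3 is that the fix for early overflows is larger, pretraining-style hyperparameters rather than smaller ones, so that the newly added embedding rows leave their random initialization quickly instead of letting the dynamic scaler keep shrinking until the run dies.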