RicardoLee committed
Commit b64445e • 1 Parent(s): 5808953
README rectify
README.md CHANGED
@@ -23,6 +23,8 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro

## Train Detail

+Some training details:
+
1. Training framework: The model was trained with a modified version of the [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project.
2. Tokenizer: The model uses the tokenizer.model from the Chinese-Alpaca-Plus model. LLaMA2's own tokenizer.model is identical to LLaMA1's, so in theory the tokenizer from the Chinese-LLaMA project can be reused as-is without introducing any token-misalignment problems (a loading sketch follows the diff).
3. Training hyperparameters: Because the embedding matrix has to be resized, the extra embedding rows are effectively randomly initialized, so early in training DeepSpeed very easily hits "OVERFLOW" and starts reducing the loss scale. After repeated reductions the scale becomes too small and training crashes. When this happens, do not lower the learning rate or the warmup; instead, scale these hyperparameters up to pretraining levels so that the randomly initialized embeddings get on track quickly (see the DeepSpeed sketch after the diff).
@@ -30,6 +32,7 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro
5. Initial training loss: 8.7072
6. Final training loss: 1.5674

+Some details in training:

1. Training Framework: This model is trained on a modified [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) framework.
2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. The tokenizer.model in LLaMA2 is identical to the one used in LLaMA1, so it is theoretically feasible to reuse the tokenizer from the Chinese-LLaMA project in full without encountering any token-misalignment issues.
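
For the tokenizer point (item 2), a minimal sketch of how such a swap is typically wired up with the Hugging Face transformers library is shown below. The paths are placeholders rather than anything published in this repo, and the final `resize_token_embeddings` call is the step that creates the randomly initialized rows discussed in item 3.

```python
# Minimal sketch (not the repo's training script): reuse the Chinese-Alpaca-Plus
# tokenizer with a LLaMA-2 base model. Paths are hypothetical placeholders.
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE_MODEL = "path/to/llama-2-7b-hf"               # hypothetical local path
CHINESE_TOKENIZER = "path/to/chinese-alpaca-plus"  # hypothetical local path

# The extended Chinese tokenizer keeps the original LLaMA pieces and appends new
# Chinese tokens; since LLaMA-2 ships the same tokenizer.model as LLaMA-1, the
# existing token ids line up and only new ids are added at the end.
tokenizer = LlamaTokenizer.from_pretrained(CHINESE_TOKENIZER)

model = LlamaForCausalLM.from_pretrained(BASE_MODEL)

# Grow the input embedding and LM head to the new vocabulary size.
# The added rows start from random values and must be trained.
model.resize_token_embeddings(len(tokenizer))
```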
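
The loss-scale advice in item 3 maps onto DeepSpeed's dynamic fp16 loss scaling. The sketch below shows where those knobs live when DeepSpeed is driven through the transformers Trainer; the config keys are standard DeepSpeed options, but every numeric value (learning rate, warmup, batch sizes) is illustrative only and not taken from this model's actual run.

```python
# Minimal sketch, not this model's actual configuration: DeepSpeed fp16 loss scaling
# plus "pretrain-scale" learning rate / warmup, wired through the HF Trainer.
from transformers import TrainingArguments

ds_config = {
    "fp16": {
        "enabled": True,
        "loss_scale": 0,            # 0 = dynamic loss scaling
        "initial_scale_power": 16,  # start at 2**16; early OVERFLOW messages are expected
        "loss_scale_window": 1000,  # overflow-free steps required before the scale is raised again
        "hysteresis": 2,            # tolerated overflows before the scale is reduced
        "min_loss_scale": 1,        # floor for the dynamic scale
    },
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="output",            # placeholder
    deepspeed=ds_config,            # a path to an equivalent JSON file also works
    fp16=True,
    learning_rate=2e-4,             # illustrative "pretrain-scale" value, larger than typical SFT rates
    warmup_ratio=0.03,              # illustrative; the point is not to shrink warmup when overflows appear
    lr_scheduler_type="cosine",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)
```

The counterintuitive part of item 3 is that the fix for early overflows is larger, pretraining-style hyperparameters rather than smaller ones, so that the newly added embedding rows leave their random initialization quickly instead of letting the dynamic scaler keep shrinking until the run dies.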