minor readme modification
- README.md +10 -1
- README_zh.md +10 -1
README.md
CHANGED
@@ -113,7 +113,7 @@ It uses a 3D(DP+TP+PP) parallelism approach and distributed optimizer.
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Although we've made extensive efforts to thoroughly clean and filter the training corpus for the model, due to the open nature of the dataset, the model may still have picked up on some unsafe examples. Consequently, the model may still generate unexpected content, including but not limited to discrimination, bias, or offensive language. We would like to strongly advise users not to spread any unsafe content generated by the model. The project developers cannot be held responsible for any repercussions stemming from the dissemination of harmful information.

-At the current stage of training, FLM-101B has a relatively low token count, leaving significant room for improvement in knowledge, especially in specialized domains. Additionally, the model's inference process is not yet optimized, leading to high resource usage and limited speed. We will soon introduce support for Flash Attention.
+At the current stage of training, FLM-101B has a relatively low token count, leaving significant room for improvement in knowledge, especially in specialized domains. Additionally, the model's inference process is not yet optimized, leading to high resource usage and limited speed. We will soon introduce support for Flash Attention in inference.
If you have suggestions for improvement in these areas or any other aspects, please feel free to open an issue on GitHub, and we will respond promptly. Thank you!

@@ -121,6 +121,15 @@ If you have suggestions for improvement in these areas or any other aspects, please feel free to open an issue on GitHub, and we will respond promptly. Thank you!
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+```
+@article{flm-101b,
+  author = {Xiang Li and Yiqun Yao and Xin Jiang and Xuezhi Fang and Xuying Meng and
+            Siqi Fan and Peng Han and Jing Li and Li Du and Bowen Qin and Zheng Zhang and
+            Aixin Sun and Yequan Wang},
+  title = {FLM-101B: An Open LLM and How to Train It with \$100K Budget},
+  year = {2023}
+}
+```

## Contact
README_zh.md
CHANGED
@@ -109,13 +109,22 @@ The training code of FLM-101B, Megatron-FLM, is modified from the Megatron-LM framework and will be open-sourced soon.
Although we have made every effort to clean and filter the model's training corpus, the open nature of the corpus means the model may still have learned from some unsafe samples. As a result, the model may still generate unexpected text, including but not limited to discrimination, bias, or abusive language. We remind users of the model not to spread any unsafe content it may generate. The project developers accept no responsibility for any consequences arising from the dissemination of harmful information.

-At the current stage of training, FLM-101B has seen a relatively small number of tokens, leaving considerable room for improvement in knowledge, especially specialized knowledge. In addition, inference has not yet been optimized, so resource usage is high and speed is limited. To address this, we will soon support Flash Attention.
+At the current stage of training, FLM-101B has seen a relatively small number of tokens, leaving considerable room for improvement in knowledge, especially specialized knowledge. In addition, inference has not yet been optimized, so resource usage is high and speed is limited. To address this, we will soon support Flash Attention on the inference side.
If you would like to see improvements in these two areas, or in any other aspect, please feel free to open an issue on GitHub and we will respond as soon as possible. Thank you!

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+```
+@article{flm-101b,
+  author = {Xiang Li and Yiqun Yao and Xin Jiang and Xuezhi Fang and Xuying Meng and
+            Siqi Fan and Peng Han and Jing Li and Li Du and Bowen Qin and Zheng Zhang and
+            Aixin Sun and Yequan Wang},
+  title = {FLM-101B: An Open LLM and How to Train It with \$100K Budget},
+  year = {2023}
+}
+```

## Contact Us
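Both diffs note that Flash Attention support for inference is planned. As a rough, hypothetical sketch only (not part of this commit or taken from the FLM-101B repository), the snippet below shows how a large causal LM checkpoint is commonly loaded with Hugging Face Transformers while requesting FlashAttention-2; the model id, dtype, and device mapping are placeholder assumptions, and whether FLM-101B's released model code supports this path is not established here.

```python
# Hypothetical sketch: loading a causal LM and requesting FlashAttention-2 for inference.
# Assumptions: recent transformers (>=4.36) with accelerate and flash-attn installed;
# "CofeAI/FLM-101B" is used only as a placeholder checkpoint id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CofeAI/FLM-101B"  # placeholder; substitute the actual checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                # reduce memory for a 100B-scale model
    device_map="auto",                         # shard layers across available GPUs
    attn_implementation="flash_attention_2",   # request FlashAttention-2 if supported
    trust_remote_code=True,
)

inputs = tokenizer("FLM-101B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```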