Update README.md
README.md (CHANGED)
@@ -9,22 +9,24 @@ Aquila语言大模型在技术上继承了GPT-3、LLaMA等的架构设计优点
The Aquila language model inherits the architectural design strengths of GPT-3 and LLaMA, replaces a set of underlying operators with more efficient implementations, and redesigns the tokenizer for Chinese-English bilingual support. It upgrades the BMTrain parallel training method, achieving nearly 8 times the training efficiency of Megatron+DeepSpeed ZeRO-2 during Aquila's training. The Aquila language model is trained from scratch on high-quality Chinese and English corpora. Through data quality control and a variety of training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training times. It is also the first large-scale open-source language model that supports bilingual Chinese-English knowledge, commercial licensing, and compliance with domestic data regulations.

AquilaChat-7B是在Aquila-7B模型的基础上,进行SFT微调后的支持中英双语的对话式语言模型。AquilaChat-7B模型由智源研究院研发。

AquilaChat-7B is a conversational language model that supports Chinese-English dialogue. It is based on the Aquila-7B model and fine-tuned with SFT (supervised fine-tuning). The AquilaChat-7B model was developed by the Beijing Academy of Artificial Intelligence.

AquilaChat模型主要为了验证基础模型能力,您可以根据自己需要对模型进行使用,修改和商业化,但必须遵守所有国家的法律法规,并且对任何第三方使用者提供Aquila系列模型的来源以及Aquila系列模型协议的副本。

The AquilaChat model was primarily developed to verify the capabilities of the foundational model. You can use, modify, and commercialize the model according to your needs, but you must comply with all applicable laws and regulations in your country. Additionally, you must provide the source of the Aquila series models and a copy of the Aquila series model license to any third-party users.
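A minimal usage sketch follows, assuming the released checkpoint can be loaded through the Hugging Face `transformers` Auto classes and is published under a Hub ID such as `BAAI/AquilaChat-7B`; the model ID, prompt, and generation settings are illustrative assumptions rather than the official reference code.

```python
# Sketch only: load an AquilaChat-style checkpoint and generate a reply.
# The Hub ID and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BAAI/AquilaChat-7B"  # assumed Hub ID; adjust to the released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision so the 7B model fits on one A100
    trust_remote_code=True,
).cuda().eval()

prompt = "北京的十大景点是什么?"  # "What are the top ten attractions in Beijing?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```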
## 模型细节/Model details

| 模型/Model | 状态/State | 能否商用/Commercial use? | 所用显卡/GPU |
| :---------------- | :------- | :-- | :-- |
| Aquila-7B | 已发布/Released | ✅ | Nvidia-A100 |
| AquilaChat-7B | 已发布/Released | ✅ | Nvidia-A100 |
| AquilaCode-7B-NV | 已发布/Released | ✅ | Nvidia-A100 |
| AquilaCode-7B-TS | 已发布/Released | ✅ | Tianshu-BI-V100 |
| Aquila-33B | **敬请期待/Coming soon** | ✅ | Nvidia-A100 |
| AquilaChat-33B | **敬请期待/Coming soon** | ✅ | Nvidia-A100 |

我们使用了一系列更高效的底层算子来辅助模型训练,其中包括参考[flash-attention](https://github.com/HazyResearch/flash-attention)的方法并替换了一些中间计算,同时还使用了RMSNorm。在此基础上,我们应用了[BMtrain](https://github.com/OpenBMB/BMTrain)技术进行轻量化的并行训练,该技术采用了数据并行、ZeRO(零冗余优化器)、优化器卸载、检查点和操作融合、通信-计算重叠等方法来优化模型训练过程。

We use a series of more efficient low-level operators to assist with model training, including methods referenced from [flash-attention](https://github.com/HazyResearch/flash-attention) to replace some intermediate computations, as well as RMSNorm. On top of this, we apply [BMtrain](https://github.com/OpenBMB/BMTrain) for lightweight parallel training, which uses data parallelism, ZeRO (zero-redundancy optimizer), optimizer offloading, checkpointing, operator fusion, and communication-computation overlap to optimize the training process.
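RMSNorm, mentioned above, normalizes activations by their root mean square instead of centering and rescaling them as standard LayerNorm does, which saves one reduction per call. The PyTorch sketch below is a reference implementation for illustration only; the fused kernel actually used in training will differ.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization, as used in LLaMA-style models."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Compute 1 / RMS(x) in float32 for numerical stability, then cast back.
        inv_rms = x.float().pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return (x.float() * inv_rms).type_as(x) * self.weight

# Example: normalize a batch of hidden states of width 4096.
norm = RMSNorm(4096)
y = norm(torch.randn(2, 16, 4096))
```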
@@ -42,9 +44,9 @@ We used different tokenizers to extract ten thousand data samples from English,

| 模型/Model | 词表大小/Vocab size | 说明/Note | 英文平均tokens量/Avg tokens(English) | 中文平均tokens量/Avg tokens(Chinese) | 代码平均tokens量/Avg tokens(code) |
| ----- | ---- | ----- | ---- | ----- | ---- |
| GPT2 | 50527 | bpe | 1717 | 1764 | 2323 |
| LLaMA | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
| Aquila | 100000 | bpe | 1575 | 477 | 1679 |
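The averages in the table come from tokenizing a fixed pool of samples per language and counting the tokens produced for each sample. A rough sketch of that measurement is shown below, assuming Hugging Face `transformers` tokenizers; the tokenizer IDs and sample lists are placeholders, not the script actually used for the table.

```python
from transformers import AutoTokenizer

def avg_tokens(tokenizer_name: str, samples: list[str]) -> float:
    """Average number of tokens the given tokenizer produces per text sample."""
    tok = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
    counts = [len(tok(text)["input_ids"]) for text in samples]
    return sum(counts) / len(counts)

# english_samples, chinese_samples, and code_samples would each hold the
# ten thousand raw text samples drawn from the corresponding corpus.
# print(avg_tokens("gpt2", english_samples))
# print(avg_tokens("BAAI/AquilaChat-7B", chinese_samples))  # assumed Hub ID
```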
## 训练数据集/Training data