OpenNLPLab committed on
Commit
d95909f
1 Parent(s): 8bce1ba

Update README.md

Files changed (1)
  1. README.md +194 -0
README.md CHANGED
@@ -1,3 +1,197 @@
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - en
5
+ - zh
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - TransNormerLLM
9
  ---
10
+
11
+ <div align="center">
12
+ <h1>
13
+ TransNormerLLM2 -- A Faster and Better LLM
14
+ </h1>
15
+ </div>
16
+
17
+ <p align="center">
18
+ 💻 <a href="https://github.com/OpenNLPLab/TransnormerLLM" target="_blank">GitHub</a> • 💬 <a href="https://discord.gg/MYQh6BWN" target="_blank">Discord</a> • 💬 <a href="https://github.com/OpenNLPLab/TransnormerLLM/blob/main/images/contact_me_qr.png" target="_blank">WeChat</a>
19
+ </p>
20
+
21
+ # Table of Contents
22
+
23
+ - [Table of Contents](#table-of-contents)
24
+ - [Introduction](#introduction)
25
+ - [Diff of TransNormerLLM2](#diff-of-transnormerllm2)
26
+ - [Released Weights](#released-weights)
27
+ - [Benchmark Results](#benchmark-results)
28
+ - [Inference and Deployment](#inference-and-deployment)
29
+ - [Dependency Installation](#dependency-installation)
30
+ - [Notice](#notice)
31
+ - [Inference](#inference)
32
+ - [Fine-tuning the Model](#fine-tuning-the-model)
33
+ - [Dependency Installation](#dependency-installation-1)
34
+ - [Training](#training)
35
+ - [Community and Ecosystem](#community-and-ecosystem)
36
+ - [Disclaimer, License and Citation](#disclaimer-license-and-citation)
37
+ - [Disclaimer](#disclaimer)
38
+ - [License](#license)
39
+ - [Acknowledgments](#acknowledgments)
40
+ - [Citation](#citation)
41
+
42
+ # Introduction
43
+
44
+ This official repo introduces the TransNormerLLM model and releases its open-source weights. It also provides code for supervised fine-tuning (SFT) and inference.
45
+
46
+ [TransNormerLLM](https://arxiv.org/abs/2307.14995) evolved from [TransNormer](https://arxiv.org/abs/2210.10340) and stands out as the first LLM built on the linear transformer architecture. It also distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformer and other efficient Transformer models (such as RetNet and Mamba) in both speed and performance.
47
+
48
+ - **TransNormerLLM1** was released in Nov 2023 in three versions with **385M**, **1B**, and **7B** parameters, trained on **1.4 trillion** tokens.
49
+ - The **latest update** transitions from TransNormerLLM1 to **TransNormerLLM2**, offering three updated versions with **1B**, **3B**, and **7B** parameters, trained on **0.3 trillion** tokens.
50
+ - All versions are available as open-source under the Apache-2.0 license.
51
+
52
+ ## Diff of TransNormerLLM2
53
+ - **TransNormerLLM1** incorporates Simple GLU in its channel mixer, GLA in the token mixer, and SRMSNorm for normalization. In this model, the channel and token mixers function sequentially in a pipeline arrangement.
54
+ - **TransNormerLLM2** also uses Simple GLU in the channel mixer, GLA in the token mixer, and SRMSNorm for normalization. However, in this version, the channel and token mixers operate in parallel rather than sequentially.
55
+
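The difference between the two arrangements can be illustrated with a toy sketch. The code below is not the TransNormerLLM implementation: `token_mixer`, `channel_mixer`, and `norm` are hypothetical stand-ins for GLA, Simple GLU, and SRMSNorm, and the exact residual wiring is an assumption. It only shows that in the sequential layout the channel mixer consumes the token mixer's output, whereas in the parallel layout both mixers read the same normalized input and their outputs are summed.

```python
from typing import Callable, List

Vector = List[float]
Mixer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(a, b)]


def sequential_block(x: Vector, norm: Mixer, token_mixer: Mixer, channel_mixer: Mixer) -> Vector:
    # v1-style layout (assumed wiring): the token mixer runs first,
    # then the channel mixer runs on the updated hidden state.
    x = add(x, token_mixer(norm(x)))
    x = add(x, channel_mixer(norm(x)))
    return x


def parallel_block(x: Vector, norm: Mixer, token_mixer: Mixer, channel_mixer: Mixer) -> Vector:
    # v2-style layout (assumed wiring): both mixers read the same
    # normalized input, and their outputs are added to the residual.
    h = norm(x)
    return add(x, add(token_mixer(h), channel_mixer(h)))


# Toy stand-ins, chosen only so the two layouts give visibly different results.
def identity_norm(v: Vector) -> Vector:
    return list(v)


def double(v: Vector) -> Vector:  # placeholder "token mixer"
    return [2.0 * t for t in v]


def negate(v: Vector) -> Vector:  # placeholder "channel mixer"
    return [-t for t in v]


x = [1.0, 2.0]
seq_out = sequential_block(x, identity_norm, double, negate)
par_out = parallel_block(x, identity_norm, double, negate)
print("sequential:", seq_out)
print("parallel:  ", par_out)
```

In the parallel layout the two mixers have no data dependency on each other within a block, so in principle they can be computed at the same time.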
56
+
57
+
58
+ # Released Weights
59
+
60
+ The released versions and their download links are listed below:
61
+
62
+ | Version | Tokens | Base Models |
63
+ | :-------: | :---: | :------------------------------------------------------------------------------------: |
64
+ | v1-385M | 1400B | 🤗 [TransNormerLLM-385M](https://huggingface.co/OpenNLPLab/TransNormerLLM-385M) |
65
+ | v1-1B | 1400B | 🤗 [TransNormerLLM-1B](https://huggingface.co/OpenNLPLab/TransNormerLLM-1B) |
66
+ | v1-7B | 1400B | 🤗 [TransNormerLLM-7B](https://huggingface.co/OpenNLPLab/TransNormerLLM-7B) |
67
+ | **v2-1B** | 300B | 🤗 [TransNormerLLM2-1B-300B](https://huggingface.co/OpenNLPLab/TransNormerLLM2-1B-300B) |
68
+ | **v2-3B** | 300B | 🤗 [TransNormerLLM2-3B-300B](https://huggingface.co/OpenNLPLab/TransNormerLLM2-3B-300B) |
69
+ | **v2-7B** | 300B | 🤗 [TransNormerLLM2-7B-300B](https://huggingface.co/OpenNLPLab/TransNormerLLM2-7B-300B) |
70
+
71
+ # Benchmark Results
72
+
73
+ TransNormerLLM is evaluated on commonsense reasoning tasks and multiple-choice questions. For comparison, a range of open-source models is included, covering both Transformer-based and non-Transformer-based architectures. All models are evaluated using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.
74
+
75
+ | Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | MMLU | CMMLU | C-Eval |
76
+ | ---------------------- | --- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ |
77
+ | GPT-Neo | 1.3 | 0.3 | 61.99 | 71.11 | 48.93 | 54.93 | 56.19 | 25.85 | 33.60 | 24.82 | 26.03 | 23.94 |
78
+ | OPT | 1.3 | 0.3 | 57.77 | 71.71 | 53.70 | 59.35 | 57.24 | 29.69 | 33.20 | 24.96 | 24.97 | 25.32 |
79
+ | Pythia | 1.4 | 0.3 | 60.73 | 70.67 | 47.18 | 53.51 | 56.99 | 26.88 | 31.40 | 26.55 | 25.13 | 24.25 |
80
+ | BLOOM | 1.1 | 0.35 | 59.08 | 67.14 | 42.98 | 54.93 | 51.47 | 25.68 | 29.40 | 27.30 | 25.09 | 26.50 |
81
+ | RWKV | 1.5 | - | - | 72.36 | 52.48 | 54.62 | 60.48 | 29.44 | 34.00 | 25.77 | - | - |
82
+ | Falcon | 1.0 | 0.35 | 61.38 | 75.14 | 61.50 | 60.30 | 63.38 | 32.17 | 35.60 | 25.28 | 24.88 | 25.66 |
83
+ | **TransNormerLLM-1B** | 1.0 | 1.2 | 63.27 | 72.09 | 56.49 | 60.38 | 63.68 | 35.24 | 36.60 | 27.10 | 25.88 | 26.01 |
84
+ | **TransNormerLLM2-1B** | 1.0 | 0.3 | 59.45 | 69.70 | 45.96 | 52.49 | 54.29 | 25.60 | 33.00 | 26.10 | 24.97 | 26.30 |
85
+
86
+
87
+ > **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HS** (HellaSwag): acc_norm. **WG** (WinoGrande): acc. **ARC-e** (ARC-easy): acc. **ARC-c** (ARC-challenge): acc_norm. **OBQA** (OpenBookQA): acc_norm. **MMLU**: 5-shot acc. **CMMLU**: 5-shot acc. **C-Eval**: 5-shot acc.
88
+
89
+ # Inference and Deployment
90
+
91
+ ## Dependency Installation
92
+
93
+
94
+ **📝Note** Please configure the following environment before using the model:
95
+
96
+ ```shell
97
+ pip install triton==2.0.0
98
+ pip install einops
99
+ ```
100
+
101
+ ### Notice
102
+ If you encounter Triton-related errors, disable Triton by setting the following environment variable:
103
+ ```shell
104
+ export use_triton=False
105
+ ```
106
+
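When the model is launched from a Python script or notebook rather than a shell, the same variable can be set from Python. This is a minimal sketch that assumes the model's remote code reads `use_triton` from the environment when it is loaded, so the variable is set before any `transformers` call; that timing is an assumption, not something stated above.

```python
import os

# Assumption: the remote modeling code checks this variable at load time,
# so it has to be set before the model is instantiated.
os.environ["use_triton"] = "False"

# Load the tokenizer and model only after this point, for example with the
# AutoTokenizer / AutoModelForCausalLM calls from the Inference section.
```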
107
+
108
+ ## Inference
109
+
110
+ ```python
111
+ >>> from transformers import AutoModelForCausalLM, AutoTokenizer
112
+ >>> tokenizer = AutoTokenizer.from_pretrained("OpenNLPLab/TransNormerLLM2-1B-300B", trust_remote_code=True)
113
+ >>> model = AutoModelForCausalLM.from_pretrained("OpenNLPLab/TransNormerLLM2-1B-300B", device_map="auto", trust_remote_code=True)
114
+ >>> inputs = tokenizer('今天是美好的一天', return_tensors='pt').to(model.device)
115
+ >>> pred = model.generate(**inputs, max_new_tokens=2048, repetition_penalty=1.0)
116
+ >>> print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
117
+ ```
118
+
119
+
120
+ # Fine-tuning the Model
121
+
122
+ ## Dependency Installation
123
+
124
+ ```shell
125
+ git clone https://github.com/OpenNLPLab/TransNormerLLM.git
126
+ cd TransNormerLLM/fine-tune
127
+ pip install -r requirements.txt
128
+ ```
129
+ - To use lightweight fine-tuning methods like LoRA, you must additionally install [peft](https://github.com/huggingface/peft).
130
+
131
+ ## Training
132
+
133
+ Below is an example of fine-tuning TransNormerLLM-1B on a single machine with ZeRO-3.
134
+
135
+ Training data: `alpaca_data.json`. This sample data is drawn from [alpaca_data.json](https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json), which consists of 52,002 entries, and has been reformatted. Its main purpose is to demonstrate how to run SFT on our model; effectiveness is not guaranteed.
136
+
137
+ ```shell
138
+ torchrun \
139
+ --nproc_per_node=8 \
140
+ train.py \
141
+ --model_name_or_path OpenNLPLab/TransNormerLLM-1B \
142
+ --data_path ./alpaca_data.json \
143
+ --output_dir output \
144
+ --num_train_epochs 1 \
145
+ --per_device_train_batch_size 2 \
146
+ --per_device_eval_batch_size 1 \
147
+ --gradient_accumulation_steps 1 \
148
+ --bf16 true \
149
+ --adam_beta1 0.9 \
150
+ --adam_beta2 0.95 \
151
+ --evaluation_strategy "no" \
152
+ --save_strategy "steps" \
153
+ --save_steps 5000 \
154
+ --save_total_limit 30 \
155
+ --learning_rate 1e-4 \
156
+ --weight_decay 0.1 \
157
+ --warmup_ratio 0.1 \
158
+ --lr_scheduler_type "cosine" \
159
+ --deepspeed 'configs/zero3.json' \
160
+ --logging_steps 1 \
161
+ --dataloader_num_workers 24 \
162
+ --ddp_find_unused_parameters false \
163
+ --tf32 true
164
+ ```
165
+
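As a rough sanity check on the flags above, the following sketch derives the effective batch size and the approximate number of optimizer steps for one epoch. It assumes plain data parallelism across the 8 processes and the 52,002-entry dataset mentioned earlier; the exact step count reported by the trainer may differ slightly depending on how the last partial batch is handled.

```python
import math

# Values taken from the torchrun command above.
nproc_per_node = 8
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_train_epochs = 1
warmup_ratio = 0.1
save_steps = 5000

# Dataset size quoted for alpaca_data.json.
num_examples = 52_002

# Sequences consumed per optimizer step across all processes.
effective_batch_size = (
    nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps
)

# Approximate optimizer steps, rounding the last partial batch up.
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
intermediate_checkpoints = total_steps // save_steps

print(f"effective batch size: {effective_batch_size}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
print(f"intermediate checkpoints: {intermediate_checkpoints}")
```

Under these assumptions the run finishes in roughly 3,251 steps, which is below `--save_steps 5000`, so no step-based checkpoint would be written during the single epoch. Lowering `--save_steps` would be one way to get intermediate checkpoints.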
166
+ # Community and Ecosystem
167
+
168
+ **📢📢📢 We will continuously update the support for TransNormerLLM from the community and ecosystem here 😀😀😀**
169
+ - [nanoTransnormer](https://github.com/Doraemonzzz/nanoTransNormer)
170
+
171
+ # Disclaimer, License and Citation
172
+
173
+ ## Disclaimer
174
+ Our team has not created any applications using TransNormerLLM models for any platform including iOS, Android, and web. We urge users not to use these models for illegal activities or anything that could harm national or social security. We also advise against using these models for online services that haven't passed security reviews and legal procedures. We hope everyone will follow these guidelines to ensure technology develops in a safe and lawful way.
175
+
176
+ We've tried hard to make sure the data in our model training is compliant, but because the model and data are complex, there might still be unexpected issues. If any problems occur from using TransNormerLLM open-source models, like data security issues, public opinion risks, or problems caused by misuse or improper use of the model, we will not be responsible.
177
+
178
+ ## License
179
+ Community use of the TransNormerLLM model requires adherence to [Apache 2.0](https://github.com/OpenNLPLab/TransNormerLLM/blob/main/LICENSE) and the [Community License for TransNormerLLM Model](https://huggingface.co/OpenNLPLab/TransNormerLLM-1B/blob/main/TransNormerLLM模型社区许可协议.pdf). The TransNormerLLM model supports commercial use. If you plan to use the TransNormerLLM model or its derivatives for commercial purposes, please ensure that you have submitted the application materials required by the TransNormerLLM Model Community License Agreement to the following contact email: opennlplab@gmail.com.
180
+
181
+ ## Acknowledgments
182
+ Our project is developed based on the following open source projects:
183
+ - [Baichuan](https://github.com/baichuan-inc/Baichuan-7B) for the tokenizer.
184
+ - [metaseq](https://github.com/facebookresearch/metaseq) for training.
185
+ - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation.
186
+
187
+
188
+ ## Citation
189
+ If you wish to cite our work, please use the following reference:
190
+ ```
191
+ @article{qin2023scaling,
192
+ title={Scaling transnormer to 175 billion parameters},
193
+ author={Qin, Zhen and Li, Dong and Sun, Weigao and Sun, Weixuan and Shen, Xuyang and Han, Xiaodong and Wei, Yunshen and Lv, Baohong and Yuan, Fei and Luo, Xiao and others},
194
+ journal={arXiv preprint arXiv:2307.14995},
195
+ year={2023}
196
+ }
197
+ ```