OpenNLPLab committed on
Commit 1d2dc7d
1 Parent(s): 64300ed

Update 15B readme

Files changed (1)
  1. README.md +73 -0
README.md CHANGED
@@ -1,3 +1,76 @@
---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
tags:
- TransNormerLLM
---

<div align="center">
<h1>
TransNormerLLM3 -- A Faster and Better LLM
</h1>
</div>

# Introduction

This official repository introduces the TransNormerLLM3 model and releases its open-source weights at every 50 billion tokens processed during pre-training.

[TransNormerLLM](https://arxiv.org/abs/2307.14995) evolved from [TransNormer](https://arxiv.org/abs/2210.10340) and stands out as the first LLM built on a linear transformer architecture. It is also the first non-Transformer LLM to exceed both traditional Transformers and other efficient architectures (such as RetNet and Mamba) in terms of speed and performance.
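
As a point of reference for what "linear transformer" means here, the sketch below contrasts standard softmax attention with a generic kernelized linear attention whose cost grows linearly in sequence length. It is a minimal illustration of the idea only, not the actual TransNormerLLM3 token mixer (which uses GLA; see below), and the ReLU feature map is an assumption chosen for simplicity.

```python
import torch

def softmax_attention(q, k, v):
    # Standard attention materialises an (n x n) score matrix: O(n^2 * d).
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention reassociates the product as q @ (k^T v), a (d x d)
    # summary of keys and values, so the cost is O(n * d^2) -- linear in n.
    q, k = torch.relu(q), torch.relu(k)            # simple positive feature map
    kv = k.transpose(-2, -1) @ v                   # (d, d)
    normaliser = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps  # (n, 1)
    return (q @ kv) / normaliser

n, d = 1024, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([1024, 64])
```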

# TransNormerLLM3
- **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has an **embedding size of 5120**.
- The **tiktoken** tokenizer is used, with a **vocabulary size** of about **100,000**.
- It incorporates **Simple GLU** for the channel mixer, **GLA** for the token mixer, and **SRMSNorm** for normalization.
- For position encoding, the first layer employs **LRPE with exponential decay**, whereas the subsequent layers use **exponential decay** alone. A configuration sketch follows below.
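
For readers who prefer to see these hyper-parameters in one place, here is a summary in Python; the field names are illustrative and are not taken from the released configuration files.

```python
# Hypothetical summary of the hyper-parameters listed above.
# Field names are illustrative, not the released config format.
TRANSNORMER_LLM3_15B = {
    "num_parameters": 14.83e9,
    "num_layers": 42,
    "num_attention_heads": 40,
    "hidden_size": 5120,
    "vocab_size": 100_000,            # tiktoken-based tokenizer, ~100k entries
    "channel_mixer": "SimpleGLU",
    "token_mixer": "GLA",
    "normalization": "SRMSNorm",
    "positional_encoding": {
        "first_layer": "LRPE + exponential decay",
        "other_layers": "exponential decay",
    },
}
```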

### Pre-training Logbook
* Real-time track: https://api.wandb.ai/links/opennlplab/kip314lq
* Join the discussion: [discord](https://discord.gg/MYQh6BWN) <<<>>> [wechat group](https://github.com/OpenNLPLab/TransnormerLLM/blob/main/images/contact_me_qr.png)
> startup: [WeChat - Pre-training Kickoff](https://mp.weixin.qq.com/s/YjUY-uy89WkF75_-rBTuKw) <<<>>> [Twitter - Pre-training Commences](https://twitter.com/opennlplab/status/1739568669502611825) <<<>>> [YouTube Recording](https://t.co/wk7svS4o5r) <<<>>> [bilibili Recording](https://www.bilibili.com/video/BV11j411J7Dy)
> first week review: [WeChat - First Week Overview](https://mp.weixin.qq.com/s/zwGnZZI3itNPoxzzXkuU2w) <<<>>> [Twitter - First Week Review](https://twitter.com/opennlplab/status/1742187694078501038)

# Released Weights

| Params  | Tokens | Hugging Face | ModelScope | Wisemodel |
| :-----: | :----: | :----------: | :--------: | :-------: |
| **15B** | 50B    | 🤗           | 🤖         | 🐯        |
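
Below is a minimal sketch of how one of these checkpoints might be loaded with 🤗 Transformers. The repository id is a placeholder and the use of `trust_remote_code` is an assumption (custom architectures on the Hub typically require it); check the model card for the exact identifier.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenNLPLab/TransNormerLLM3-15B"  # hypothetical id -- see the model card

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # 15B parameters; bf16 keeps memory manageable
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("TransNormerLLM3 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```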

# Benchmark Results
Evaluations of all models are conducted using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

| Model                   | P   | T    | BoolQ | PIQA  | HS    | WG    | ARC-e | ARC-c | OBQA  | MMLU  | C-Eval |
| ----------------------- | --- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ |
| **TransNormerLLM3-15B** | 15  | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 27.50 | 26.18  |
| **TransNormerLLM3-15B** | 15  | 0.10 |       |       |       |       |       |       |       |       |        |

> **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HellaSwag**: acc_norm. **WinoGrande**: acc. **ARC-easy**: acc. **ARC-challenge**: acc_norm. **OpenBookQA**: acc_norm. **MMLU**: 5-shot acc. **C-Eval**: 5-shot acc.
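
As an illustration of how such numbers can be reproduced, here is a sketch using the harness's Python entry point. It assumes a recent (v0.4+) lm-evaluation-harness, uses a placeholder repository id, and only covers the zero-shot tasks; MMLU and C-Eval would additionally need `num_fewshot=5` as noted above.

```python
import lm_eval  # pip install lm-eval

# Zero-shot tasks from the table above; the repository id is a placeholder.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OpenNLPLab/TransNormerLLM3-15B,trust_remote_code=True,dtype=bfloat16",
    tasks=["boolq", "piqa", "hellaswag", "winogrande", "arc_easy", "arc_challenge", "openbookqa"],
    batch_size=8,
)
print(results["results"])
```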

# Acknowledgments and Citation

## Acknowledgments
Our project is developed based on the following open-source projects:
- [tiktoken](https://github.com/openai/tiktoken) for the tokenizer.
- [metaseq](https://github.com/facebookresearch/metaseq) for training.
- [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation.

## Citation
If you wish to cite our work, please use the following reference:
```
@article{qin2023scaling,
  title={Scaling transnormer to 175 billion parameters},
  author={Qin, Zhen and Li, Dong and Sun, Weigao and Sun, Weixuan and Shen, Xuyang and Han, Xiaodong and Wei, Yunshen and Lv, Baohong and Yuan, Fei and Luo, Xiao and others},
  journal={arXiv preprint arXiv:2307.14995},
  year={2023}
}
```

<p align="center">
- OpenNLPLab @2024 -
</p>