OpenNLPLab committed
Commit 0e06ad0 • 1 Parent(s): 9600986

Update README.md

Files changed (1):
  1. README.md +70 -41

README.md CHANGED
@@ -20,11 +20,14 @@ This official repository unveils the TransNormerLLM3 model along with its open-s

 [TransNormerLLM](https://arxiv.org/abs/2307.14995), evolving from [TransNormer](https://arxiv.org/abs/2210.10340), stands out as the first LLM within the linear transformer architecture. It further distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformers and other efficient architectures (such as RetNet and Mamba) in both speed and performance.

 # TransNormerLLM3
 - **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has a total **embedding size of 5120**.
- - **TransNormerLLM3-15B** is fully integrated with **[Lightning Attention-2](http://arxiv.org/abs/2401.04658)**, which maintains a **stable TGS** (tokens per GPU per second) during training on **unlimited sequence lengths**, up until hard limits such as GPU memory are reached.
 - The **Tiktoken** tokenizer is used, with a total **vocabulary size** of about **100,000**.
 <p align="center">
 <img src="./images/TransNormer3.jpg" width="65%" />
 </p>
@@ -43,26 +46,35 @@ This official repository unveils the TransNormerLLM3 model along with its open-s

 # Released Weights

- | param | token | Hugging Face | Model Scope | Wisemodel |
- | :-----: | :---: | :----------: | :---------: | :-------: |
- | **15B** | 50B | 🤗[step13000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step13000-50Btokens) | 🤖 | 🐯 |
- | **15B** | 100B | 🤗[step26000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step26000-100Btokens) | 🤖 | 🐯 |
- | **15B** | 150B | 🤗[step39000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step39000-150Btokens) | 🤖 | 🐯 |
- | **15B** | 200B | 🤗[step52000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step52000-200Btokens) | 🤖 | 🐯 |
- | **15B** | 250B | 🤗[step65000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step65000-250Btokens) | 🤖 | 🐯 |
- | **15B** | 300B | 🤗[step78000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step78000-300Btokens) | 🤖 | 🐯 |
- | **15B** | 350B | 🤗[step92000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step92000-350Btokens) | 🤖 | 🐯 |
- | **15B** | 400B | 🤗[step105000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step105000-400Btokens) | 🤖 | 🐯 |
- | **15B** | 450B | 🤗[step118000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step118000-450Btokens) | 🤖 | 🐯 |
- | **15B** | 500B | 🤗[step131000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step131000-500Btokens) | 🤖 | 🐯 |
- | **15B** | 550B | 🤗[step144000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step144000-550Btokens) | 🤖 | 🐯 |
- | **15B** | 600B | 🤗[step157000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step157000-600Btokens) | 🤖 | 🐯 |
- | **15B** | 650B | 🤗[step170000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step170000-650Btokens) | 🤖 | 🐯 |
- | **15B** | 700B | 🤗[step183000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step183000-700Btokens) | 🤖 | 🐯 |
- | **15B** | 750B | 🤗[step195500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step195500-750Btokens) | 🤖 | 🐯 |
- | **15B** | 800B | 🤗[step209000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step209000-800Btokens) | 🤖 | 🐯 |
- | **15B** | 850B | 🤗[step222000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step222000-850Btokens) | 🤖 | 🐯 |
- | **15B** | 900B | 🤗[step235000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step235000-900Btokens) | 🤖 | 🐯 |
@@ -76,26 +88,35 @@ model = AutoModelForCausalLM.from_pretrained("OpenNLPLab/TransNormerLLM3-15B-Int

 # Benchmark Results
 The evaluations of all models are conducted using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

- | Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | C-Eval | MMLU |
- | ----------------------- | --- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----- |
- | **TransNormerLLM3-15B** | 15 | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 26.18 | 27.50 |
- | **TransNormerLLM3-15B** | 15 | 0.10 | 63.98 | 74.70 | 61.09 | 61.33 | 65.95 | 34.64 | 35.60 | 25.38 | 27.40 |
- | **TransNormerLLM3-15B** | 15 | 0.15 | 60.34 | 75.08 | 63.99 | 62.04 | 64.56 | 34.90 | 35.20 | 22.64 | 26.60 |
- | **TransNormerLLM3-15B** | 15 | 0.20 | 52.05 | 74.48 | 64.72 | 62.75 | 66.16 | 35.15 | 36.80 | 27.25 | 30.80 |
- | **TransNormerLLM3-15B** | 15 | 0.25 | 66.70 | 76.50 | 66.51 | 64.80 | 66.84 | 36.18 | 39.40 | 30.87 | 36.10 |
- | **TransNormerLLM3-15B** | 15 | 0.30 | 67.00 | 76.50 | 67.17 | 64.40 | 66.29 | 36.77 | 38.80 | 33.99 | 37.60 |
- | **TransNormerLLM3-15B** | 15 | 0.35 | 65.78 | 75.46 | 67.88 | 66.54 | 67.34 | 38.57 | 39.60 | 36.02 | 39.20 |
- | **TransNormerLLM3-15B** | 15 | 0.40 | 67.34 | 75.24 | 68.51 | 66.22 | 68.94 | 40.10 | 39.20 | 36.91 | 41.10 |
- | **TransNormerLLM3-15B** | 15 | 0.45 | 69.02 | 76.28 | 69.11 | 63.77 | 65.82 | 36.01 | 39.40 | 37.17 | 42.80 |
- | **TransNormerLLM3-15B** | 15 | 0.50 | 66.15 | 77.09 | 69.75 | 65.11 | 68.56 | 35.84 | 39.60 | 39.81 | 42.00 |
- | **TransNormerLLM3-15B** | 15 | 0.55 | 70.24 | 74.05 | 69.96 | 65.75 | 65.61 | 36.69 | 38.60 | 40.08 | 44.00 |
- | **TransNormerLLM3-15B** | 15 | 0.60 | 74.34 | 75.68 | 70.44 | 66.22 | 69.36 | 38.40 | 38.40 | 41.05 | 45.30 |
- | **TransNormerLLM3-15B** | 15 | 0.65 | 73.15 | 76.55 | 71.60 | 66.46 | 69.65 | 39.68 | 40.80 | 41.20 | 44.90 |
- | **TransNormerLLM3-15B** | 15 | 0.70 | 73.79 | 78.18 | 73.26 | 67.56 | 71.21 | 43.60 | 40.80 | 43.46 | 47.00 |
- | **TransNormerLLM3-15B** | 15 | 0.75 | 76.45 | 78.07 | 74.22 | 69.30 | 71.21 | 43.43 | 42.20 | 43.46 | 47.80 |
- | **TransNormerLLM3-15B** | 15 | 0.80 | 76.97 | 78.84 | 74.95 | 69.85 | 72.14 | 43.52 | 41.20 | 45.21 | 49.41 |
- | **TransNormerLLM3-15B** | 15 | 0.85 | 72.75 | 78.35 | 75.91 | 70.48 | 74.58 | 45.22 | 41.20 | 46.27 | 49.36 |
- | **TransNormerLLM3-15B** | 15 | 0.90 | 76.09 | 77.91 | 76.49 | 70.88 | 72.14 | 42.92 | 40.20 | 45.70 | 50.15 |

 > **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HellaSwag**: acc_norm. **WinoGrande**: acc. **ARC-easy**: acc. **ARC-challenge**: acc_norm. **OpenBookQA**: acc_norm. **MMLU**: 5-shot acc. **C-Eval**: 5-shot acc.
@@ -136,6 +157,14 @@ If you wish to cite our work, please use the following reference:
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
 ```

 <p align="center">
 
 [TransNormerLLM](https://arxiv.org/abs/2307.14995), evolving from [TransNormer](https://arxiv.org/abs/2210.10340), stands out as the first LLM within the linear transformer architecture. It further distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformers and other efficient architectures (such as RetNet and Mamba) in both speed and performance.

+ > Update@Apr.7: We plan to increase the sequence length in the pre-training stage to **10 million**: https://twitter.com/opennlplab/status/1776894730015789300

 # TransNormerLLM3
 - **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has a total **embedding size of 5120**.
+ - **TransNormerLLM3-15B** is fully integrated with **[Lightning Attention-2](http://arxiv.org/abs/2401.04658)**, which maintains a **stable TGS** (tokens per GPU per second) during training on **unlimited sequence lengths**, up until hard limits such as GPU memory are reached.
 - The **Tiktoken** tokenizer is used, with a total **vocabulary size** of about **100,000**.
+ - Our **training framework** has been integrated with **[LASP](https://arxiv.org/abs/2404.02882) (Linear Attention Sequence Parallelism)**, allowing sequence parallelism within linear attention models.
+ - Our **training framework** now supports **[CO2](https://arxiv.org/abs/2401.16265)**, which introduces **local updates** and **asynchronous communication** into distributed data-parallel training, achieving **full overlap** of communication and computation.
 <p align="center">
 <img src="./images/TransNormer3.jpg" width="65%" />
 </p>
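The stable-TGS claim rests on a core property of linear attention: the causal context can be carried as a fixed-size key-value state, so per-token cost does not grow with sequence length. Below is a minimal NumPy sketch of that equivalence only; it is not the Lightning Attention-2 kernel, and it omits normalization, decay, and multi-head structure.

```python
import numpy as np

def linear_attention_parallel(Q, K, V):
    """Causal linear attention via the O(n^2) masked form: (QK^T * M) V."""
    n = Q.shape[0]
    scores = Q @ K.T                      # no softmax in linear attention
    mask = np.tril(np.ones((n, n)))       # causal mask
    return (scores * mask) @ V

def linear_attention_recurrent(Q, K, V):
    """Same result with a constant-size (d x d_v) running state, O(n) time."""
    d, d_v = Q.shape[1], V.shape[1]
    kv = np.zeros((d, d_v))
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        kv += np.outer(K[t], V[t])        # accumulate k_t v_t^T into the state
        out[t] = Q[t] @ kv                # o_t = q_t @ (sum of past kv outer products)
    return out
```

Because the recurrent form touches only the fixed-size `kv` state per token, throughput stays flat as sequences grow, until memory for activations runs out.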
 
 # Released Weights

+ | param | token | Hugging Face | Model Scope | Wisemodel |
+ | :-----: | :---: | :----------: | :---------: | :-------: |
+ | **15B** | 50B | 🤗[step13000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step13000-50Btokens) | 🤖 | 🐯 |
+ | **15B** | 100B | 🤗[step26000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step26000-100Btokens) | 🤖 | 🐯 |
+ | **15B** | 150B | 🤗[step39000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step39000-150Btokens) | 🤖 | 🐯 |
+ | **15B** | 200B | 🤗[step52000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step52000-200Btokens) | 🤖 | 🐯 |
+ | **15B** | 250B | 🤗[step65000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step65000-250Btokens) | 🤖 | 🐯 |
+ | **15B** | 300B | 🤗[step78000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step78000-300Btokens) | 🤖 | 🐯 |
+ | **15B** | 350B | 🤗[step92000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step92000-350Btokens) | 🤖 | 🐯 |
+ | **15B** | 400B | 🤗[step105000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step105000-400Btokens) | 🤖 | 🐯 |
+ | **15B** | 450B | 🤗[step118000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step118000-450Btokens) | 🤖 | 🐯 |
+ | **15B** | 500B | 🤗[step131000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step131000-500Btokens) | 🤖 | 🐯 |
+ | **15B** | 550B | 🤗[step144000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step144000-550Btokens) | 🤖 | 🐯 |
+ | **15B** | 600B | 🤗[step157000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step157000-600Btokens) | 🤖 | 🐯 |
+ | **15B** | 650B | 🤗[step170000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step170000-650Btokens) | 🤖 | 🐯 |
+ | **15B** | 700B | 🤗[step183000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step183000-700Btokens) | 🤖 | 🐯 |
+ | **15B** | 750B | 🤗[step195500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step195500-750Btokens) | 🤖 | 🐯 |
+ | **15B** | 800B | 🤗[step209000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step209000-800Btokens) | 🤖 | 🐯 |
+ | **15B** | 850B | 🤗[step222000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step222000-850Btokens) | 🤖 | 🐯 |
+ | **15B** | 900B | 🤗[step235000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step235000-900Btokens) | 🤖 | 🐯 |
+ | **15B** | 950B | 🤗[step248000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step248000-950Btokens) | 🤖 | 🐯 |
+ | **15B** | 1000B | 🤗[step261000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step261000-1000Btokens) | 🤖 | 🐯 |
+ | **15B** | 1050B | 🤗[step274000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step274000-1050Btokens) | 🤖 | 🐯 |
+ | **15B** | 1100B | 🤗[step287000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step287000-1100Btokens) | 🤖 | 🐯 |
+ | **15B** | 1150B | 🤗[step300000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step300000-1150Btokens) | 🤖 | 🐯 |
+ | **15B** | 1200B | 🤗[step313500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step313500-1200Btokens) | 🤖 | 🐯 |
+ | **15B** | 1250B | 🤗[step326000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step287000-1250Btokens) | 🤖 | 🐯 |
+ | **15B** | 1300B | 🤗[step339500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step326000-1300Btokens) | 🤖 | 🐯 |
+ | **15B** | 1345B | 🤗[stage1](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/stage1-1345Btokens) | 🤖 | 🐯 |
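The (step, tokens) pairs in the table above imply a near-constant cadence of roughly 3.8M training tokens per optimizer step, which is handy for estimating which branch corresponds to a token budget. A quick sanity check over a few listed checkpoints (the pairs are read off the table; the per-step figure is derived here, not an official specification):

```python
# (optimizer step, cumulative training tokens) pairs from the released-weights table
checkpoints = [
    (13_000, 50e9),
    (131_000, 500e9),
    (235_000, 900e9),
    (261_000, 1000e9),
]

# tokens processed per optimizer step at each checkpoint
tokens_per_step = [tokens / step for step, tokens in checkpoints]
```

Each ratio lands between about 3.81M and 3.85M tokens per step.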
 # Benchmark Results
 The evaluations of all models are conducted using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework.

+ | Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | C-Eval | MMLU |
+ | ----------------------- | --- | ------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----- |
+ | **TransNormerLLM3-15B** | 15 | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 26.18 | 27.50 |
+ | **TransNormerLLM3-15B** | 15 | 0.10 | 63.98 | 74.70 | 61.09 | 61.33 | 65.95 | 34.64 | 35.60 | 25.38 | 27.40 |
+ | **TransNormerLLM3-15B** | 15 | 0.15 | 60.34 | 75.08 | 63.99 | 62.04 | 64.56 | 34.90 | 35.20 | 22.64 | 26.60 |
+ | **TransNormerLLM3-15B** | 15 | 0.20 | 52.05 | 74.48 | 64.72 | 62.75 | 66.16 | 35.15 | 36.80 | 27.25 | 30.80 |
+ | **TransNormerLLM3-15B** | 15 | 0.25 | 66.70 | 76.50 | 66.51 | 64.80 | 66.84 | 36.18 | 39.40 | 30.87 | 36.10 |
+ | **TransNormerLLM3-15B** | 15 | 0.30 | 67.00 | 76.50 | 67.17 | 64.40 | 66.29 | 36.77 | 38.80 | 33.99 | 37.60 |
+ | **TransNormerLLM3-15B** | 15 | 0.35 | 65.78 | 75.46 | 67.88 | 66.54 | 67.34 | 38.57 | 39.60 | 36.02 | 39.20 |
+ | **TransNormerLLM3-15B** | 15 | 0.40 | 67.34 | 75.24 | 68.51 | 66.22 | 68.94 | 40.10 | 39.20 | 36.91 | 41.10 |
+ | **TransNormerLLM3-15B** | 15 | 0.45 | 69.02 | 76.28 | 69.11 | 63.77 | 65.82 | 36.01 | 39.40 | 37.17 | 42.80 |
+ | **TransNormerLLM3-15B** | 15 | 0.50 | 66.15 | 77.09 | 69.75 | 65.11 | 68.56 | 35.84 | 39.60 | 39.81 | 42.00 |
+ | **TransNormerLLM3-15B** | 15 | 0.55 | 70.24 | 74.05 | 69.96 | 65.75 | 65.61 | 36.69 | 38.60 | 40.08 | 44.00 |
+ | **TransNormerLLM3-15B** | 15 | 0.60 | 74.34 | 75.68 | 70.44 | 66.22 | 69.36 | 38.40 | 38.40 | 41.05 | 45.30 |
+ | **TransNormerLLM3-15B** | 15 | 0.65 | 73.15 | 76.55 | 71.60 | 66.46 | 69.65 | 39.68 | 40.80 | 41.20 | 44.90 |
+ | **TransNormerLLM3-15B** | 15 | 0.70 | 73.79 | 78.18 | 73.26 | 67.56 | 71.21 | 43.60 | 40.80 | 43.46 | 47.00 |
+ | **TransNormerLLM3-15B** | 15 | 0.75 | 76.45 | 78.07 | 74.22 | 69.30 | 71.21 | 43.43 | 42.20 | 43.46 | 47.80 |
+ | **TransNormerLLM3-15B** | 15 | 0.80 | 76.97 | 78.84 | 74.95 | 69.85 | 72.14 | 43.52 | 41.20 | 45.21 | 49.41 |
+ | **TransNormerLLM3-15B** | 15 | 0.85 | 72.75 | 78.35 | 75.91 | 70.48 | 74.58 | 45.22 | 41.20 | 46.27 | 49.36 |
+ | **TransNormerLLM3-15B** | 15 | 0.90 | 76.09 | 77.91 | 76.49 | 70.88 | 72.14 | 42.92 | 40.20 | 45.70 | 50.15 |
+ | **TransNormerLLM3-15B** | 15 | 0.95 | 74.28 | 78.24 | 76.63 | 72.22 | 74.12 | 44.11 | 42.40 | 46.25 | 51.43 |
+ | **TransNormerLLM3-15B** | 15 | 1.00 | 74.62 | 79.16 | 77.35 | 72.22 | 73.86 | 45.14 | 43.40 | 47.90 | 51.65 |
+ | **TransNormerLLM3-15B** | 15 | 1.05 | 76.36 | 78.94 | 77.15 | 71.35 | 74.66 | 44.45 | 42.80 | 45.87 | 52.28 |
+ | **TransNormerLLM3-15B** | 15 | 1.10 | 76.88 | 78.73 | 77.62 | 70.88 | 74.41 | 45.48 | 42.80 | 49.78 | 53.01 |
+ | **TransNormerLLM3-15B** | 15 | 1.15 | 72.87 | 79.43 | 78.12 | 72.85 | 74.75 | 46.16 | 43.20 | 49.80 | 53.04 |
+ | **TransNormerLLM3-15B** | 15 | 1.20 | 79.48 | 78.67 | 78.45 | 72.93 | 75.42 | 44.37 | 43.60 | 49.33 | 53.80 |
+ | **TransNormerLLM3-15B** | 15 | 1.25 | 79.17 | 79.16 | 78.81 | 72.93 | 75.13 | 45.99 | 43.60 | 50.44 | 54.19 |
+ | **TransNormerLLM3-15B** | 15 | 1.30 | 78.41 | 79.00 | 78.39 | 71.90 | 74.33 | 45.05 | 42.80 | 52.24 | 54.41 |
+ | **TransNormerLLM3-15B** | 15 | stage1 | 78.75 | 79.27 | 78.33 | 71.35 | 75.97 | 46.42 | 45.00 | 50.25 | 54.50 |

 > **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HellaSwag**: acc_norm. **WinoGrande**: acc. **ARC-easy**: acc. **ARC-challenge**: acc_norm. **OpenBookQA**: acc_norm. **MMLU**: 5-shot acc. **C-Eval**: 5-shot acc.
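Results like the rows above can in principle be reproduced with the harness. An illustrative invocation follows; the flags match the current lm-evaluation-harness CLI, but the `revision` branch name and the need for `trust_remote_code` are assumptions about this checkpoint repository, and the heavyweight run is not executed here.

```shell
pip install lm-eval

lm_eval --model hf \
    --model_args pretrained=OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints,revision=stage1-1345Btokens,trust_remote_code=True \
    --tasks boolq,piqa,hellaswag,winogrande,arc_easy,arc_challenge,openbookqa \
    --batch_size auto
```

Per the note above, MMLU and C-Eval were scored 5-shot, so those tasks would be run separately with `--num_fewshot 5`.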
 
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
+ @misc{sun2024linear,
+ title={Linear Attention Sequence Parallelism},
+ author={Weigao Sun and Zhen Qin and Dong Li and Xuyang Shen and Yu Qiao and Yiran Zhong},
+ year={2024},
+ eprint={2404.02882},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
 ```

 <p align="center">