horiz94 committed on
Commit
1523c14
Parent: a942a45

Update README.md

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -15,6 +15,9 @@ In addition to sharing the model weights, we provide the core designs, engineering
 - **Language(s):** English; Chinese; Other languages
 - **License:** Apache 2.0
 
+## Tech report
+
+[Tele-FLM Technical Report](https://arxiv.org/pdf/2404.16645)
 
 
 ## Bias, Risks, and Limitations
@@ -68,7 +71,7 @@ We adopt the architecture of FLM-101B as the backbone for Tele-FLM, with several
 Consequently, Tele-FLM is largely compatible with Llama architecturally.
 To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM and released it as open source.
 
-In the pre-training stage, we employ μP for optimal hyperparameter search. The μP model (Tele-FLM_μP) is architecturally identical to Tele-FLM except for the model width(# attention heads).
+In the pre-training stage, we employ μP for optimal hyperparameter search. The μP model (Tele-FLM_μP) is architecturally identical to Tele-FLM except for the model width.
 The architecture of Tele-FLM and Tele-FLM_μP is listed below.
 For more details of μP, please refer to our technical report and the original Tensor Program papers.
 
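To make the μP setup in this hunk concrete, here is a minimal, self-contained sketch of the width-transfer idea, not the released Tele-FLM code: hyperparameters are searched on the narrow proxy (Tele-FLM_μP), and the Adam learning rate of hidden matrix-like parameters is rescaled by the width multiplier when moving to the full model. The widths and learning rate below are placeholders, not Tele-FLM's actual values.

```python
# Illustrative muP width transfer; all numbers are placeholders,
# not Tele-FLM's actual configuration.

BASE_WIDTH = 512   # hypothetical width of the narrow proxy (Tele-FLM_muP)
FULL_WIDTH = 8192  # hypothetical width of the full Tele-FLM

def transfer_lr(proxy_lr: float, full_width: int, base_width: int = BASE_WIDTH) -> float:
    """Under muP with Adam, the learning rate of hidden (matrix-like)
    parameters scales as 1 / width-multiplier, so an optimum found on
    the narrow proxy remains near-optimal at full width."""
    width_mult = full_width / base_width
    return proxy_lr / width_mult

proxy_lr = 6e-4                           # illustrative value found on the proxy
print(transfer_lr(proxy_lr, FULL_WIDTH))  # learning rate to use at full width
```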
@@ -83,7 +86,7 @@ For more details of μP, please refer to our technical report and the original Tensor Program papers.
 ### Training Hyperparameters
 
 Due to the smaller size, Tele-FLM_μP allows for significantly more experimental runs within fixed time and resource constraints.
-We searched six hyperparameters for pretraining. All the hyperparameters are shown below.
+We searched seven hyperparameters for pretraining. All the hyperparameters are shown below.
 
 
 | Searched Hyperparameters ||| Non-Searched Hyperparameters ||
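Because the proxy is cheap to train, the seven-dimensional search can be run as a plain sweep. The sketch below only illustrates that workflow: the actual searched hyperparameters and their grids are those in the README's table and technical report, and `run_proxy` is a hypothetical stand-in for a short Tele-FLM_μP pre-training run.

```python
import random
from itertools import product

# Placeholder search space; the real seven searched hyperparameters
# and their ranges are given in the README's table, not here.
search_space = {
    "learning_rate": [1e-4, 3e-4, 6e-4],
    "init_std": [0.005, 0.01, 0.02],
}

def run_proxy(config: dict) -> float:
    """Hypothetical stand-in for a short Tele-FLM_muP pre-training run
    that returns a validation loss for the given configuration."""
    return random.random()  # replace with an actual proxy run

best = min(
    (dict(zip(search_space, values)) for values in product(*search_space.values())),
    key=run_proxy,
)
print(best)  # the winning config transfers to full-width Tele-FLM via muP
```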
@@ -146,9 +149,8 @@ The parallel training setup for Tele-FLM is configured as follows: tensor parallel
 | Tele-FLM | 71.13 | 65.48 | 66.98 | 66.25 | 92.57 | 64.38 |
 
 
-## Tech report
-For more detailed capabilities of Tele-FLM, see [Tele-FLM Technical Report](https://arxiv.org/pdf/2404.16645)
 
+## Citation
 If you find our work helpful, please consider citing it.
 ```
 @misc{li2024teleflm,
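Since the hunks above note that Tele-FLM's Llama-adapted modeling code is open-sourced, a typical Hugging Face loading pattern would look like the sketch below. The repository id is an assumption for illustration, not taken from this diff; `trust_remote_code=True` applies only if the checkpoint ships custom modeling code.

```python
# Hedged usage sketch; the repository id below is assumed, not
# confirmed by this README diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "CofeAI/Tele-FLM"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Tele-FLM is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```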
 