Update README.md
README.md CHANGED
@@ -15,6 +15,9 @@ In addition to sharing the model weights, we provide the core designs, engineeri
- **Language(s):** English; Chinese; Other languages
- **License:** Apache 2.0

+## Tech report
+
+[Tele-FLM Technical Report](https://arxiv.org/pdf/2404.16645)


## Bias, Risks, and Limitations

@@ -68,7 +71,7 @@ We adopt the architecture of FLM-101B as the backbone for Tele-FLM, with several
Consequently, Tele-FLM is largely compatible with Llama architecturally.
To maximize convenience for the community, we made minimal adjustments to Llama's code to adapt it to Tele-FLM and released it as open source.

-In the pre-training stage, we employ μP for optimal hyperparameter search. The μP model (Tele-FLM_μP) is architecturally identical to Tele-FLM except for the model width
+In the pre-training stage, we employ μP for optimal hyperparameter search. The μP model (Tele-FLM_μP) is architecturally identical to Tele-FLM except for the model width.
The architecture of Tele-FLM and Tele-FLM_μP is listed below.
For more details of μP, please refer to our technical report and the original Tensor Program papers.

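Since the hunk above notes that Tele-FLM reuses Llama's code path with only minimal adjustments, the checkpoint can be loaded through the standard Hugging Face Transformers interface. The following is a minimal usage sketch, not part of this diff: the repo id `CofeAI/Tele-FLM`, the dtype, and the generation settings are assumptions, and `trust_remote_code=True` is included on the assumption that the adapted Llama-style modeling code ships with the checkpoint.

```python
# Minimal usage sketch (assumptions: repo id, dtype, generation settings).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CofeAI/Tele-FLM"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the full model is large; use several GPUs or offloading
    device_map="auto",           # requires the accelerate package
    trust_remote_code=True,      # loads the adapted Llama-style modeling code from the repo
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```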
@@ -83,7 +86,7 @@ For more details of μP, please refer to our technical report and the original T
### Training Hyperparameters

Due to the smaller size, Tele-FLM_μP allows for significantly more experimental runs within fixed time and resource constraints.
-We searched
+We searched seven hyperparameters for pretraining. All the hyperparameters are shown below.


| Searched Hyperparameters ||| Non-Searched Hyperparameters ||
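The hunk above describes searching hyperparameters on the small Tele-FLM_μP proxy so they can be transferred to the full-width model. As a rough illustration of what that width-based transfer usually involves under the Tensor Programs (μP) conventions, the sketch below rescales the width-dependent hyperparameters by the width ratio; the rules and the example widths are generic assumptions, not values or code from the Tele-FLM repo.

```python
# Generic muP-style hyperparameter transfer sketch (assumed conventions,
# not the Tele-FLM implementation).

def mup_transfer(base_width: int, target_width: int,
                 base_hidden_lr: float, base_init_std: float) -> dict:
    """Rescale width-dependent hyperparameters tuned on a narrow proxy model
    so they can be reused on a wider target model."""
    ratio = target_width / base_width
    return {
        # Adam learning rate for hidden (matrix-like) weights scales as 1/width.
        "hidden_lr": base_hidden_lr / ratio,
        # Initialization std of hidden weights scales as 1/sqrt(width).
        "hidden_init_std": base_init_std / ratio ** 0.5,
        # Width-independent settings (e.g. warmup, weight decay, batch size
        # schedule) carry over from the proxy run unchanged.
    }

# Hypothetical example: transfer from a 512-wide proxy to an 8192-wide model.
print(mup_transfer(base_width=512, target_width=8192,
                   base_hidden_lr=1e-2, base_init_std=0.02))
```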
@@ -146,9 +149,8 @@ The parallel training setup for Tele-FLM is configured as follows: tensor parall
| Tele-FLM | 71.13 | 65.48 | 66.98 | 66.25 | 92.57 | 64.38 |


-## Tech report
-For more detailed capabilities of Tele-FLM, see [Tele-FLM Technical Report](https://arxiv.org/pdf/2404.16645)

+## Citation
If you find our work helpful, please consider citing it.
```
@misc{li2024teleflm,