fzmnm committed
Commit
fb67790
1 Parent(s): 6befd01

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -29,7 +29,7 @@ Inspired by the TinyStories research, which explores the effectiveness of small
  For detailed training procedures and configurations, please refer to [this GitHub repository](https://github.com/jia-zhuang/chinese-llama2.c).
  - **Hardware:** Trained on an NVIDIA RTX 2080 Super with 8 GB RAM—a modest gaming rig.
  - **Duration:** 87 hours (just over 3.5 days), covering 20k iterations and processing 2G tokens.
- - **Optimizer:** AdamW, with a learning rate (lr) of 5e-4, weight decay of 0.1, and gradient clipping at 1.0. The model underwent 1000 warm-up iterations without any dropout.
+ - **Optimizer:** AdamW, with a learning rate (lr) of 5e-4, 1000 warm-up iterations, and gradient clipping at 1.0.
  - **Dropout:** no
  - **Batch Size:** 4, configured to fit within the 8GB RAM of the 2080; gradient accumulation steps set at 128, achieving an effective 524,288 tokens per iteration as suggested by the Chinchilla paper ([Chinchilla study](https://arxiv.org/abs/2203.15556)).
  - **Training Iterations:** 20k, including a warm-up phase of 1k steps.
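
For reference, the configuration described in the diff above maps onto a standard PyTorch training loop roughly as sketched below. This is a minimal sketch, not the script from the linked chinese-llama2.c repository: the model, vocabulary size, data loader, and the cosine decay after warm-up are placeholders or assumptions, and `seq_len = 1024` is only inferred from 4 × 128 × 1024 = 524,288 tokens per iteration.

```python
import math
import torch
import torch.nn as nn

# Hyperparameters as listed in the README
learning_rate = 5e-4      # peak AdamW learning rate
warmup_iters = 1_000      # warm-up phase
max_iters = 20_000        # total training iterations
grad_clip = 1.0           # gradient-norm clipping threshold
batch_size = 4            # fits in the 8 GB of a 2080 Super
grad_accum_steps = 128    # 4 * 128 * 1024 = 524,288 tokens per iteration
seq_len = 1024            # assumed context length (inferred, not stated)
vocab_size = 4096         # placeholder vocabulary size

# Placeholder model and data loader; the real ones live in chinese-llama2.c.
model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))

def get_batch():
    x = torch.randint(0, vocab_size, (batch_size, seq_len))
    return x, x  # dummy inputs and targets

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

def lr_at(it):
    """Linear warm-up to the peak lr, then cosine decay to zero (assumed schedule)."""
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    return 0.5 * learning_rate * (1.0 + math.cos(math.pi * progress))

for it in range(max_iters):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(it)
    optimizer.zero_grad(set_to_none=True)
    for _ in range(grad_accum_steps):                 # gradient accumulation
        x, y = get_batch()
        logits = model(x)
        loss = nn.functional.cross_entropy(
            logits.view(-1, vocab_size), y.view(-1)
        ) / grad_accum_steps                          # scale so accumulated grads average
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    optimizer.step()
```

The gradient-accumulation loop is what lets a per-device batch size of 4 behave like the much larger effective batch of ~524k tokens per optimizer step cited from the Chinchilla study.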