|
--- |
|
license: cc |
|
datasets: |
|
- adam89/TinyStoriesChinese |
|
language: |
|
- zh |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
widget: |
|
- text: "从前有个小姑娘从来不洗脸" |
|
- text: "从前有个胖鹦鹉,胖得飞不动。" |
|
--- |
|
### Model Card: TinyStoriesChinese-110M |
|
|
|
**Overview:** |
|
|
|
![TinyStoriesChinese-110M cover illustration](README.files/79e6f31072d75ef82135302dd88859a.png)
|
|
|
TinyStoriesChinese-110M is a charmingly small language model that generates short, simple stories in Chinese. This Small Language Model (SLM) is designed to produce text with the simplicity of children's tales. Despite its small size, TinyStoriesChinese-110M generates consistent stories with largely correct grammar and shows an emerging understanding of basic concepts such as personal hygiene and illness.
|
|
|
Inspired by the [TinyStories](https://arxiv.org/abs/2305.07759) research, which explores how effective small language models can be when trained on simplified material, TinyStoriesChinese-110M focuses on a very narrow task: it is trained on a synthetic dataset of stories that even a three-year-old could understand. This approach highlights the potential of small models to produce coherent, consistent text without extensive computational resources.
|
|
|
**Model Details:** |
|
- **Parameter Count:** 110M. |
|
- **Architecture:** Standard Llama 2-style Transformer with 12 layers, 12 attention heads, and a hidden size of 768. The model uses a 1024-token context window with Rotary Position Embeddings (RoPE) and a vocabulary size of 5,000 (these values can be checked against the hosted configuration, as sketched after this list).
|
- **Dataset:** [TinyStoriesChinese](https://huggingface.co/datasets/adam89/TinyStoriesChinese/). |
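
If you want to verify the architecture numbers yourself, the following minimal sketch reads them from the hosted configuration. It assumes the repository ships a standard transformers (Llama-style) `config.json`; attribute names may differ if the checkpoint was exported another way.

```python
from transformers import AutoConfig

# Load the model configuration from the Hugging Face Hub
# (assumes a standard Llama-style config.json is present in the repo).
config = AutoConfig.from_pretrained("adam89/TinyStoriesChinese-110M")

print("layers:         ", config.num_hidden_layers)        # expected: 12
print("attention heads:", config.num_attention_heads)      # expected: 12
print("hidden size:    ", config.hidden_size)              # expected: 768
print("context window: ", config.max_position_embeddings)  # expected: 1024
print("vocab size:     ", config.vocab_size)               # expected: 5000
```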
|
|
|
**Training:** |
|
For detailed training procedures and configurations, please refer to [this GitHub repository](https://github.com/jia-zhuang/chinese-llama2.c). |
|
- **Hardware:** Trained on a single NVIDIA RTX 2080 Super with 8 GB of VRAM (a modest gaming rig).
|
- **Duration:** 87 hours (just over 3.5 days), covering 20k iterations and processing roughly 2B tokens.
|
- **Optimizer:** AdamW, with a learning rate of 5e-4, weight decay of 0.1, and gradient clipping at 1.0.
|
- **Dropout:** None.
|
- **Batch Size:** 4, chosen to fit within the 2080 Super's 8 GB of VRAM. With 128 gradient accumulation steps and a 1024-token context, each optimizer step covers an effective 4 × 128 × 1024 = 524,288 tokens, following the guidance of the [Chinchilla study](https://arxiv.org/abs/2203.15556). See the configuration sketch after this list.
|
- **Training Iterations:** 20k, including a warm-up phase of 1k steps. |
|
- **Training Loss:** 0.9138. |
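
For reference, the hyperparameters above map roughly onto llama2.c-style `train.py` overrides. This is a hedged sketch rather than the exact command used: the variable names follow karpathy's llama2.c, which the linked chinese-llama2.c repository builds on, and may differ slightly in that fork.

```python
# Hypothetical llama2.c-style train.py settings matching the numbers above.
# Variable names follow karpathy's llama2.c; the chinese-llama2.c fork may differ.

# model
dim = 768                # hidden size
n_layers = 12
n_heads = 12
vocab_size = 5000
max_seq_len = 1024       # context window
dropout = 0.0

# AdamW optimizer
learning_rate = 5e-4
weight_decay = 0.1
grad_clip = 1.0

# schedule
max_iters = 20000
warmup_iters = 1000

# batching: 4 * 128 * 1024 = 524,288 tokens per optimizer step
batch_size = 4
gradient_accumulation_steps = 128
```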
|
|
|
|
|
**Example Output:** |
|
> 从前有个小姑娘从来不洗脸<span style="color:blue">。她的妈妈总是告诉她要洗脸,但她从来不听。一天,妈妈说:“不洗脸会生病。”她不听,继续玩。第二天,感觉不舒服,妈妈带她看医生,需要洗脸。她不愿意,但终于洗了。洗完后,感觉好多了,感谢妈妈。从那天起,她每天洗脸,再没生病。</span>
>
> *(English translation: Once upon a time there was a little girl who never washed her face. Her mom always told her to wash her face, but she never listened. One day, her mom said, "If you don't wash your face, you will get sick." She didn't listen and kept playing. The next day she felt unwell, and her mom took her to the doctor; she needed to wash her face. She didn't want to, but she finally did. Afterwards she felt much better and thanked her mom. From that day on, she washed her face every day and never got sick again.)*
|
|
|
**Example Usage** |
|
Below is a brief example of how to generate text using this model: |
|
|
|
```python
from transformers import pipeline

# Load the model from the Hugging Face Hub as a text-generation pipeline.
generator = pipeline('text-generation', model='adam89/TinyStoriesChinese-110M')

# Prompt in Chinese: "Once upon a time there was a little girl who never washed her face"
story_prompt = "从前有个小姑娘从来不洗脸"

# Generate a continuation of up to 256 tokens (prompt included).
generated_story = generator(story_prompt, max_length=256)

print(generated_story[0]['generated_text'])
```
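
For finer control over decoding (number of new tokens, sampling temperature, and so on), you can also load the tokenizer and model directly. The sketch below assumes the repository ships a compatible tokenizer and causal-LM weights; the sampling parameters are illustrative, not the settings used for the example above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adam89/TinyStoriesChinese-110M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Prompt: "Once upon a time there was a fat parrot, so fat it could not fly."
prompt = "从前有个胖鹦鹉,胖得飞不动。"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,  # length of the generated continuation
        do_sample=True,      # sample rather than greedy-decode
        temperature=0.8,     # illustrative values; tune to taste
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```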
|
|
|
**Summary:** |
|
TinyStoriesChinese-110M is an excellent educational tool for machine learning beginners: it is simple and cheap to train, making it a good entry point into NLP model training for anyone without access to extensive resources.
|
|
|
TinyStoriesChinese-110M demonstrates that small language models can be trained with minimal time and hardware, making NLP experimentation accessible to a broader audience and lowering the barrier to entry. Focused on a simple, narrowly defined task, the model generates fluent, logical text and displays emerging common-sense abilities, although the consistency and depth of these abilities vary and leave clear room for improvement. It serves as a solid base for hobbyists who want to explore what small language models can achieve, and as a foundation for further experimentation and learning in natural language processing.