---
library_name: transformers
tags:
- indonesia
license: mit
language:
- id
inference: true
---

How small can language models be?

Sasando

Sasando-1 is a tiny, highly experimental Indonesian text generator built on the Phi-3 architecture.

❕Go straight to the Gradio demo❕

This repo contains the 25M version.

### 🎻 Welcome!

Sasando-1 is a tiny, highly experimental Indonesian text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It was trained on a tightly controlled Indo4B dataset filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrated that a tiny language model can produce fluent text when trained on a tightly controlled dataset.

### ✨ Specs

- Comes in 7M and 25M parameter variants
- Based on the Phi-3 architecture
- Embedding vocabulary of 4,096 tokens
- Trained on ~257M tokens × 4 epochs

### 🔭 Out-of-Scope Use

This is a research-preview base model. It is not instruction-tuned and has undergone minimal safety curation. It is not intended for commercial or practical applications. You are also not allowed to use this model without having fun.

### Acknowledgments

- **Developed by:** Afrizal Hasbi Azizy
- **License:** MIT
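As a rough illustration of the TinyStories-style vocabulary control described above, the sketch below keeps only sentences whose every word falls within the most frequent words of a corpus. This is an assumption-laden toy, not the actual Sasando-1 data pipeline: the function name, the frequency-based cutoff, and the toy corpus are all illustrative, and the real filtering of Indo4B (to 18,000 unique words) may differ.

```python
from collections import Counter

def filter_corpus(sentences, max_vocab=18000):
    """Keep only sentences whose every word is among the `max_vocab`
    most frequent words in the corpus.
    Illustrative sketch only -- not the actual Sasando-1 pipeline."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    allowed = {w for w, _ in counts.most_common(max_vocab)}
    return [s for s in sentences if all(w in allowed for w in s.lower().split())]

# Toy corpus with an artificially tiny vocabulary cap
corpus = [
    "saya suka makan nasi",  # 'nasi' is rare, so this sentence is dropped
    "saya suka makan",       # all words are frequent, so this one is kept
    "kalimat ini dibuang",   # all rare words, dropped
]
kept = filter_corpus(corpus, max_vocab=3)
```

Capping the vocabulary this way is what lets a model with only 7M–25M parameters spend its capacity on fluency rather than on a long tail of rare words.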