metadata
library_name: transformers
tags:
  - indonesia
license: mit
language:
  - id
inference: true

How small can language models be?

Sasando

Sasando-1 is a tiny, highly experimental short-sequence text generator built using the Phi-3 architecture.

❕Go straight to the Gradio demo❕

This repo contains the 25M version.

Research preview.
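Since the card lists `transformers` as the library, loading the model might look like the sketch below. The repository id `afrizalha/Sasando-1-25M` is an assumption pieced together from the repo name and the uploader's handle, and the sampling settings (`max_new_tokens=50`, `temperature=0.7`) are illustrative values, not recommendations from the author. The helper `generate_text` is a hypothetical name. Running it requires `transformers` and `torch`, and it downloads the weights on first call.

```python
# Assumed repository id (not confirmed by the model card text).
MODEL_ID = "afrizalha/Sasando-1-25M"


def generate_text(prompt, max_new_tokens=50, model_id=MODEL_ID):
    """Continue `prompt` with the base model and return the decoded text."""
    # Imported lazily so the heavy dependencies load only when generating.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the prompt as PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a continuation; the settings here are illustrative assumptions.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (left commented out because it downloads the model):
# print(generate_text("Pada suatu hari"))
```

Because this is a base model, the prompt should be the opening words of a text to continue rather than an instruction.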

🎻 Welcome!

Sasando-1 is a tiny, highly experimental Indonesian text generator built using the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It is trained on a tightly controlled subset of the Indo4B dataset, filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.
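The card does not describe the actual filtering pipeline, so the following is only one plausible reading of "filtered to a fixed number of unique words": keep a sentence only if every word in it is among the N most frequent words of the corpus. The function name `filter_by_vocab`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def filter_by_vocab(sentences, max_vocab):
    """Keep sentences built entirely from the `max_vocab` most frequent words."""
    # Naive lowercase + whitespace tokenization (an assumption for illustration).
    tokenized = [s.lower().split() for s in sentences]

    # Count word frequencies over the whole corpus.
    counts = Counter(word for words in tokenized for word in words)
    allowed = {word for word, _ in counts.most_common(max_vocab)}

    # A sentence survives only if all of its words are in the allowed set.
    kept = [
        sentence
        for sentence, words in zip(sentences, tokenized)
        if words and all(word in allowed for word in words)
    ]
    return kept, allowed


# Toy corpus (hypothetical); the real setting would use max_vocab=18000.
corpus = [
    "saya suka makan nasi",
    "saya suka minum teh",
    "saya makan nasi",
    "kucing itu suka tidur",
]
kept, allowed = filter_by_vocab(corpus, max_vocab=4)
```

With a vocabulary cap of 4, only the sentences made of the four most frequent words remain, which shrinks both the corpus and the vocabulary the model has to learn.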

🇮🇩 Context

Indonesia has more than 700 languages, and many of them are dying at an alarming rate. Language technologies like generative AI can play a massive role in language preservation. However, Indonesia faces several contextual issues:

  • Many languages, including those with millions of speakers, have few digital resources
  • Running large models is costly, and Indonesia is a middle-income country with limited funding

Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototype demonstrating that scarce resources can potentially still be leveraged to develop generative models with cheap compute.

✨ Specs

  • Comes with 7M and 25M parameters
  • Based on Phi-3 architecture
  • Embedding vocabulary size: 4096
  • Trained on ~257M tokens × 4 epochs
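To put the training budget in perspective, the specs above imply roughly one billion tokens seen in total. The short calculation below works that out, along with tokens seen per parameter. It assumes both sizes were trained on the same ~257M-token corpus for 4 epochs and uses the nominal parameter counts (7M and 25M) rather than exact ones.

```python
# Approximate figures taken from the specs list.
TOKENS_PER_EPOCH = 257_000_000
EPOCHS = 4

# Nominal parameter counts (assumed to be round numbers for illustration).
PARAM_COUNTS = {"7M": 7_000_000, "25M": 25_000_000}

# Total tokens processed during training, counting repeated epochs.
total_tokens = TOKENS_PER_EPOCH * EPOCHS

# Tokens seen per model parameter, for each size.
tokens_per_param = {
    name: total_tokens / n_params for name, n_params in PARAM_COUNTS.items()
}

for name, ratio in tokens_per_param.items():
    print(f"{name}: {total_tokens:,} tokens seen, ~{ratio:.0f} tokens per parameter")
```

Note that these are tokens seen, not unique tokens: each of the ~257M tokens is repeated across the 4 epochs.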

🔭 Out-of-Scope Use

This is a research preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.

You are also not allowed to use this model without having fun.

Acknowledgments

  • Developed by: Afrizal Hasbi Azizy
  • License: MIT

Training log
