---
library_name: transformers
tags:
- indonesia
license: mit
language:
- id
inference: true
---
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document Title</title>
<style>
h1 {
font-size: 36px;
color: navy;
font-family: 'Tahoma';
text-align: center;
}
</style>
</head>
<body>
<h1>How small can language models be?</h1>
</body>
</html>

<center>
<img src="https://i.imgur.com/z9ey830.png" alt="Sasando" width="500" height="250">
<p><em>Sasando-1 is a tiny, highly experimental text generator built using the Phi-3 architecture.</em></p>
<p><strong><a href="https://huggingface.co/spaces/afrizalha/Sasando-1" style="color: blue; font-family: Tahoma;">❕Go straight to the Gradio demo❕</a></strong></p>
<p><em style="color: black; font-weight: bold;">This repo contains the 25M version.</em></p>
</center>
## 🎻 Welcome!
Sasando-1 is a tiny, highly experimental Indonesian text generator built on the Phi-3 architecture. It comes in two microscopic sizes: 7M and 25M parameters. It is trained on a tightly controlled subset of the Indo4B dataset, filtered to contain only 18,000 unique words. The method is inspired by Microsoft's TinyStories paper, which demonstrates that a tiny language model can produce fluent text when trained on a tightly controlled dataset.

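The exact filtering procedure used for Sasando-1 is not described here, so the following is only a minimal sketch of one way a vocabulary-restricted corpus could be built: keep the most frequent words, then keep only the texts made up entirely of those words. The function names, the regex tokenizer, and the per-text filtering granularity are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic words (illustrative tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_vocab(texts, max_words):
    """Return the set of the `max_words` most frequent words in `texts`."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word for word, _ in counts.most_common(max_words)}


def filter_corpus(texts, vocab):
    """Keep only texts whose words all belong to `vocab`."""
    return [text for text in texts if all(w in vocab for w in tokenize(text))]


# Toy corpus; a real run would use max_words=18_000 on the full dataset.
corpus = [
    "saya makan nasi",
    "saya minum air",
    "makan nasi minum air",
    "saya makan xylofon",  # contains a rare word
]
vocab = build_vocab(corpus, max_words=5)
kept = filter_corpus(corpus, vocab)
print(kept)
```

In this toy run the rare word falls outside the five-word vocabulary, so the last text is dropped while the other three are kept.
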
## 🇮🇩 Context
Indonesia has more than 700 languages, and many of them are dying at an alarming rate. Language technologies such as generative AI can play a major role in language preservation. However, Indonesia faces several contextual challenges:

- Many languages, including those with millions of speakers, have few digital resources
- Running large models can be costly, and Indonesia is a middle-income country with limited funding

Overcoming these challenges requires developers to work with what little data and money they have. Sasando-1 is a prototype demonstrating that scarce resources can potentially still be leveraged to develop generative models with cheap compute.

## ✨ Specs
- Comes in 7M and 25M parameter versions
- Based on the Phi-3 architecture
- Embedding vocabulary size: 4096
- Trained on ~257M tokens × 4 epochs

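To put the training budget in perspective, the figures in the list above can be turned into a rough tokens-per-parameter estimate. This is back-of-the-envelope arithmetic only: it treats "~257M" and the nominal model sizes as exact, which they are not.

```python
# Approximate figures taken from the spec list; treated as exact for illustration.
TOKENS_PER_EPOCH = 257_000_000
EPOCHS = 4
MODEL_SIZES = {"7M": 7_000_000, "25M": 25_000_000}  # nominal parameter counts

# Total tokens processed during training (the same data repeated over 4 epochs).
total_tokens = TOKENS_PER_EPOCH * EPOCHS
print(f"Total tokens processed: {total_tokens:,}")

for name, params in MODEL_SIZES.items():
    ratio = total_tokens / params
    print(f"{name}: about {ratio:.0f} tokens processed per parameter")
```

Under these assumptions, training processes roughly 1.03B tokens in total, which works out to about 147 tokens per parameter for the 7M model and about 41 for the 25M model.
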
## 🔭 Out-of-Scope Use
This is a research preview base model. It is not instruction-tuned and has minimal safety curation. It is not intended for commercial or practical applications.

You are also not allowed to use this model without having fun.

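Because this is a base model, it continues a prompt rather than following instructions. The sketch below shows one way to sample a continuation with the `transformers` library. The repo id `afrizalha/Sasando-1-25M` is an assumption (replace it with this repository's actual id), the sampling settings are arbitrary, and a `transformers` release with Phi-3 support plus PyTorch is assumed to be installed.

```python
# Assumed repo id, not confirmed by this card; replace with the actual one.
ASSUMED_REPO_ID = "afrizalha/Sasando-1-25M"


def generate_text(prompt, repo_id=ASSUMED_REPO_ID, max_new_tokens=50):
    """Sample a continuation of `prompt` from a causal LM hosted on the Hub."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    encoded = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,   # sample instead of greedy decoding
        temperature=0.7,  # arbitrary illustrative value
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the model on first use):
# print(generate_text("Pada suatu hari"))
```

The prompt should be plain Indonesian text to continue, not a question or an instruction.
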
## Acknowledgments

- **Developed by:** Afrizal Hasbi Azizy
- **License:** MIT

## Training log
<center>
<img src="https://imgur.com/32NFAKm.png" alt="Training log" width="500" height="250">
</center>