nicholasKluge committed
Commit b25b568 • Parent(s): 11fe9b3
Update README.md

README.md CHANGED
@@ -186,7 +186,7 @@ model-index:
 
 Mula is a series of Sparse Mixture of Experts (SMoE) language models, all trained natively in Brazilian Portuguese, designed to help democratize LLMs for low-resource languages.
 
-Mula-4x160-v0.1 is our first
+Mula-4x160-v0.1 is one of our first experiments on pre-training a SMoE, using the [Pt-Corpus-Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) dataset. It has 4 experts per layer and activates 2 for each token.
 
 Future versions of Mula will be trained on an extensively larger Brazilian Portuguese dataset.
 
@@ -198,7 +198,7 @@ Future versions of Mula will be trained on an extensively larger Brazilian Portu
 - **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
 - **Language:** Portuguese
 - **Training time**: ~ 30 hours
-- **Emissions:** 7.6
+- **Emissions:** 7.6 KgCO2eq (Germany)
 - **Total energy consumption:** 15 kWh
 
 ## Intended Uses
@@ -278,12 +278,14 @@ Evaluations on benchmarks were performed using the [Language Model Evaluation Ha
 | | **ARC** | **HellaSwag** | **MMLU** | **TruthfulQA** |
 |----------------------|-----------|---------------|-----------|----------------|
 | **Mula-4x160-v0.1** | 27.09 | 31.41 | 28.15 | 39.81 |
+| **Mula-8x160-v0.1** | 26.15 | 33.06 | 28.14 | 41.69 |
 
 Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese implementation of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt) (created by [Eduardo Garcia](https://github.com/eduagarcia/lm-evaluation-harness-pt)).
 
-| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **OAB Exams** | **TweetSentBR** |
-|----------------------|----------------|----------------|-----------|----------|----------------|------------|---------------|-----------------|
-| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 25.06 | 11.24 |
+| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **PT Hate Speech** | **OAB Exams** | **TweetSentBR** |
+|-----------------------|----------------|----------------|-----------|----------|----------------|------------|--------------------|---------------|-----------------|
+| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 22.99 | 25.06 | 11.24 |
+| **Mula-8x160-v0.1** | 33.51 | 0 | 20.17 | 19.94 | 43.97 | 33.33 | 42.69 | 24.37 | 24.60 |
 
 ## Cite as 🤗
 
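The updated description above says Mula-4x160-v0.1 has 4 experts per layer and activates 2 of them for each token. As a rough illustration of what that top-2 routing looks like, here is a minimal sparse-MoE feed-forward layer in PyTorch. It is a generic sketch, not Mula's actual implementation: the class name, hidden sizes, and expert MLP shape are placeholder assumptions; only the 4-expert / top-2 setting comes from the card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Generic top-k routed feed-forward block (illustrative, not Mula's code)."""

    def __init__(self, hidden_size=256, ffn_size=1024, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is an ordinary MLP; only top_k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq, hidden)
        logits = self.router(x)                           # (batch, seq, num_experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over the picked experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[..., slot] == e             # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Quick shape check with the card's 4-expert / top-2 setting.
moe = TopKMoE(num_experts=4, top_k=2)
print(moe(torch.randn(1, 8, 256)).shape)  # torch.Size([1, 8, 256])
```

Only the two selected experts contribute to each token's output, which is what keeps the per-token compute of an SMoE well below that of a dense model with the same total parameter count.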
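The benchmark tables in this diff were produced with EleutherAI's LM Evaluation Harness and, for the Brazilian Portuguese benchmarks, Eduardo Garcia's Portuguese fork. For readers who want to reproduce a run, the sketch below shows how such an evaluation is typically launched through the harness's Python API (lm-eval 0.4+). The repository id, task list, and batch size are placeholder assumptions; the card does not state the exact settings, and the Portuguese fork registers its own task names.

```python
# Hedged sketch of a harness run (lm-eval >= 0.4 Python API); not the exact
# command used for the numbers above. Repo id, tasks, and batch size are
# placeholders, and few-shot settings for the reported scores are not given.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                             # Hugging Face transformers backend
    model_args="pretrained=nicholasKluge/Mula-4x160-v0.1",  # placeholder repo id
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    batch_size=8,
)

# results["results"] maps each task to its metric dictionary (acc, acc_norm, ...).
for task, metrics in results["results"].items():
    print(task, metrics)
```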