nicholasKluge committed
Commit b25b568
1 Parent(s): 11fe9b3

Update README.md

Files changed (1):
  1. README.md +7 -5

README.md CHANGED
@@ -186,7 +186,7 @@ model-index:
 
 Mula is a series of Sparse Mixture of Experts (SMoE) language models, all trained natively in Brazilian Portuguese, designed to help democratize LLMs for low-resource languages.
 
-Mula-4x160-v0.1 is our first experiment on pre-training a SMoE, using the [Pt-Corpus-Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) dataset. It has 4 experts per layer and activates 2 for each token.
+Mula-4x160-v0.1 is one of our first experiments on pre-training a SMoE, using the [Pt-Corpus-Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) dataset. It has 4 experts per layer and activates 2 for each token.
 
 Future versions of Mula will be trained on an extensively larger Brazilian Portuguese dataset.
 
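For readers new to SMoE models, the sketch below illustrates the routing described in the paragraph above: each token's hidden state is scored against 4 experts and only the top 2 are run. This is a generic top-2 router in PyTorch, not Mula's actual implementation; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Generic sparse MoE layer: 4 experts per layer, 2 active per token."""

    def __init__(self, hidden_size: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        logits = self.router(x)                            # (tokens, experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # pick 2 of the 4 experts
        weights = F.softmax(weights, dim=-1)               # normalize the 2 gate scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With 4 experts and top-2 routing, only half of the expert parameters are exercised for any given token, which is the compute saving that motivates the SMoE design.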
@@ -198,7 +198,7 @@ Future versions of Mula will be trained on an extensively larger Brazilian Portu
 - **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
 - **Language:** Portuguese
 - **Training time**: ~ 30 hours
-- **Emissions:** 7.6 KgCO2 (Germany)
+- **Emissions:** 7.6 KgCO2eq (Germany)
 - **Total energy consumption:** 15 kWh
 
 ## Intended Uses
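As a quick sanity check on the corrected unit, the reported 7.6 KgCO2eq over 15 kWh implies an average carbon intensity of about 0.51 kgCO2eq/kWh, a plausible figure for the German grid named in the card:

```python
# Implied grid carbon intensity from the figures reported in the card.
emissions_kg_co2eq = 7.6   # KgCO2eq, as reported
energy_kwh = 15            # kWh, as reported
print(f"{emissions_kg_co2eq / energy_kwh:.2f} kgCO2eq/kWh")  # ~0.51
```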
@@ -278,12 +278,14 @@ Evaluations on benchmarks were performed using the [Language Model Evaluation Ha
 | | **ARC** | **HellaSwag** | **MMLU** | **TruthfulQA** |
 |----------------------|-----------|---------------|-----------|----------------|
 | **Mula-4x160-v0.1** | 27.09 | 31.41 | 28.15 | 39.81 |
+| **Mula-8x160-v0.1** | 26.15 | 33.06 | 28.14 | 41.69 |
 
 Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese implementation of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt) (created by [Eduardo Garcia](https://github.com/eduagarcia/lm-evaluation-harness-pt)).
 
-| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **OAB Exams** | **TweetSentBR** |
-|-----------------------|----------------|----------------|-----------|----------|----------------|------------|---------------|-----------------|
-| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 25.06 | 11.24 |
+| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **PT Hate Speech** | **OAB Exams** | **TweetSentBR** |
+|-----------------------|----------------|----------------|-----------|----------|----------------|------------|--------------------|---------------|-----------------|
+| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 22.99 | 25.06 | 11.24 |
+| **Mula-8x160-v0.1** | 33.51 | 0 | 20.17 | 19.94 | 43.97 | 33.33 | 42.69 | 24.37 | 24.60 |
 
 ## Cite as 🤗
 
 
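To reproduce rows like the English benchmark results added above, the EleutherAI harness exposes a Python entry point. A minimal sketch, assuming the v0.4+ API; the Hub repo id (nicholasKluge/Mula-4x160-v0.1), the task selection, and the default few-shot settings are illustrative assumptions, not taken from this commit:

```python
# Sketch: scoring a Hub checkpoint with EleutherAI's LM Evaluation Harness.
# Repo id and task names are illustrative; adjust to the published model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nicholasKluge/Mula-4x160-v0.1",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)
print(results["results"])
```

The Brazilian Portuguese rows would come from the forked harness linked in the card, which adds the ASSIN2, BLUEX, ENEM, FAQUAD NLI, HateBR, PT Hate Speech, OAB Exams, and TweetSentBR tasks under its own task names.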