nicholasKluge committed
Commit b25b568
1 Parent(s): 11fe9b3

Update README.md

Files changed (1):
  1. README.md +7 -5

README.md CHANGED
@@ -186,7 +186,7 @@ model-index:
 
 Mula is a series of Sparse Mixture of Experts (SMoE) language models, all trained natively in Brazilian Portuguese, designed to help democratize LLMs for low-resource languages.
 
-Mula-4x160-v0.1 is our first experiment on pre-training a SMoE, using the [Pt-Corpus-Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) dataset. It has 4 experts per layer and activates 2 for each token.
+Mula-4x160-v0.1 is one of our first experiments on pre-training a SMoE, using the [Pt-Corpus-Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) dataset. It has 4 experts per layer and activates 2 for each token.
 
 Future versions of Mula will be trained on an extensively larger Brazilian Portuguese dataset.
 
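For readers new to SMoE models, the sketch below illustrates the routing described in the paragraph above: each token's hidden state is scored against 4 experts and only the top 2 are run. This is a generic top-2 router in PyTorch, not Mula's actual implementation; all class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Generic sparse MoE layer: 4 experts per layer, 2 active per token."""

    def __init__(self, hidden_size: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        logits = self.router(x)                            # (tokens, experts)
        weights, chosen = logits.topk(self.top_k, dim=-1)  # pick 2 of the 4 experts
        weights = F.softmax(weights, dim=-1)               # normalize the 2 gate scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

With 4 experts and top-2 routing, only half of the expert parameters are exercised for any given token, which is the compute saving that motivates the SMoE design.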
@@ -198,7 +198,7 @@ Future versions of Mula will be trained on an extensively larger Brazilian Portu
 - **Dataset:** [Pt-Corpus Instruct](https://huggingface.co/datasets/nicholasKluge/Pt-Corpus-Instruct) (6.2B tokens)
 - **Language:** Portuguese
 - **Training time**: ~ 30 hours
-- **Emissions:** 7.6 KgCO2 (Germany)
+- **Emissions:** 7.6 KgCO2eq (Germany)
 - **Total energy consumption:** 15 kWh
 
 ## Intended Uses
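As a quick sanity check on the corrected unit, the reported 7.6 KgCO2eq over 15 kWh implies an average carbon intensity of about 0.51 kgCO2eq/kWh, a plausible figure for the German grid named in the card:

```python
# Implied grid carbon intensity from the figures reported in the card.
emissions_kg_co2eq = 7.6   # KgCO2eq, as reported
energy_kwh = 15            # kWh, as reported
print(f"{emissions_kg_co2eq / energy_kwh:.2f} kgCO2eq/kWh")  # ~0.51
```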
@@ -278,12 +278,14 @@ Evaluations on benchmarks were performed using the [Language Model Evaluation Ha
 | | **ARC** | **HellaSwag** | **MMLU** | **TruthfulQA** |
 |----------------------|-----------|---------------|-----------|----------------|
 | **Mula-4x160-v0.1** | 27.09 | 31.41 | 28.15 | 39.81 |
+| **Mula-8x160-v0.1** | 26.15 | 33.06 | 28.14 | 41.69 |
 
 Evaluations on Brazilian Portuguese benchmarks were performed using a [Portuguese implementation of the EleutherAI LM Evaluation Harness](https://github.com/eduagarcia/lm-evaluation-harness-pt) (created by [Eduardo Garcia](https://github.com/eduagarcia/lm-evaluation-harness-pt)).
 
-| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **OAB Exams** | **TweetSentBR** |
-|-----------------------|----------------|----------------|-----------|----------|----------------|------------|---------------|-----------------|
-| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 25.06 | 11.24 |
+| | **ASSIN2 RTE** | **ASSIN2 STS** | **BLUEX** | **ENEM** | **FAQUAD NLI** | **HateBR** | **PT Hate Speech** | **OAB Exams** | **TweetSentBR** |
+|-----------------------|----------------|----------------|-----------|----------|----------------|------------|--------------------|---------------|-----------------|
+| **Mula-4x160-v0.1** | 33.57 | 11.35 | 25.17 | 21.34 | 43.97 | 41.50 | 22.99 | 25.06 | 11.24 |
+| **Mula-8x160-v0.1** | 33.51 | 0 | 20.17 | 19.94 | 43.97 | 33.33 | 42.69 | 24.37 | 24.60 |
 
 ## Cite as 🤗
 
 
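To reproduce rows like the English benchmark results added above, the EleutherAI harness exposes a Python entry point. A minimal sketch, assuming the v0.4+ API; the Hub repo id (nicholasKluge/Mula-4x160-v0.1), the task selection, and the default few-shot settings are illustrative assumptions, not taken from this commit:

```python
# Sketch: scoring a Hub checkpoint with EleutherAI's LM Evaluation Harness.
# Repo id and task names are illustrative; adjust to the published model.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nicholasKluge/Mula-4x160-v0.1",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)
print(results["results"])
```

The Brazilian Portuguese rows would come from the forked harness linked in the card, which adds the ASSIN2, BLUEX, ENEM, FAQUAD NLI, HateBR, PT Hate Speech, OAB Exams, and TweetSentBR tasks under its own task names.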