Update README.md
README.md CHANGED

@@ -1,11 +1,11 @@
----
-license: llama2
-datasets:
-- MaLA-LM/PolyWrite
-- Davlan/sib200
-base_model:
-- meta-llama/Llama-2-7b-hf
----
+---
+license: llama2
+datasets:
+- MaLA-LM/PolyWrite
+- Davlan/sib200
+base_model:
+- meta-llama/Llama-2-7b-hf
+---
 
 # EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
 
@@ -61,4 +61,11 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 Challenges remain in low-resource languages, where the model tends to have higher **Self-BLEU** scores, indicating reduced output diversity.
 
----
+---
+
+
+## Acknowledgements
+
+We extend our thanks to the language communities and contributors who helped source, clean, and validate the diverse data used in the MaLA Corpus. Their efforts are invaluable in supporting linguistic diversity in AI research.
+
+This work is created by researchers at [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) in collaboration with partners from TU Darmstadt, the University of Edinburgh, and LMU Munich. It is funded by [HPLT](https://hplt-project.org) and [UTTER](https://he-utter.eu).
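For context on the **Self-BLEU** figure mentioned in the diff above: it measures how similar a model's generations are to one another, so higher values indicate less diverse output. Below is a minimal sketch of one common way to compute it, assuming `sacrebleu` is installed; this is an illustration only, not the evaluation script used for EMMA-500.

```python
# Self-BLEU sketch (assumption: sacrebleu-based, not the authors' actual script).
# Each generation is scored as a hypothesis against all other generations as
# references, and the per-sample BLEU scores are averaged.
import sacrebleu

def self_bleu(generations: list[str]) -> float:
    scores = []
    for i, hyp in enumerate(generations):
        refs = [g for j, g in enumerate(generations) if j != i]
        scores.append(sacrebleu.sentence_bleu(hyp, refs).score)
    return sum(scores) / len(scores)

# Higher Self-BLEU -> generations repeat each other more (reduced diversity).
samples = [
    "The cat sat on the mat.",
    "A cat is sitting on the mat.",
    "Dogs chase the ball in the park.",
]
print(f"Self-BLEU: {self_bleu(samples):.1f}")
```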