nicholasKluge committed
Commit d7a7b9d · Parent(s): 23febbd

Update README.md

README.md CHANGED
@@ -263,7 +263,7 @@ model-index:
 
 **[Tucano](https://huggingface.co/TucanoBR)** is a series of decoder-transformers natively pretrained in Portuguese. All Tucano models were trained on **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens.
 
-Read our preprint [here](https://arxiv.org/abs/
+Read our preprint [here](https://arxiv.org/abs/2411.07854).
 
 ## Details
 
@@ -370,7 +370,7 @@ Hence, even though our models are released with a permissive license, we urge us
 
 ## Evaluations
 
-The table below compares our models against several Portuguese and multilingual language models on the evaluation harness used in our study. More information on it can be found [here](https://github.com/Nkluge-correa/Tucano/tree/main/evaluations/README.md). To learn more about our evaluation harness selection, [read our preprint](https://arxiv.org/abs/
+The table below compares our models against several Portuguese and multilingual language models on the evaluation harness used in our study. More information on it can be found [here](https://github.com/Nkluge-correa/Tucano/tree/main/evaluations/README.md). To learn more about our evaluation harness selection, [read our preprint](https://arxiv.org/abs/2411.07854).
 
 | | Average | Calame-PT | Lambada-PT | ARC-PT | HellaSwag-PT |
 |-----------------|---------|-----------|------------|--------|--------------|
@@ -397,11 +397,14 @@ The table below compares our models against several Portuguese and multilingual
 ## Cite as 🤗
 
 ```latex
-@misc{
-
-
-
-
+@misc{correa2024tucanoadvancingneuraltext,
+      title={{Tucano: Advancing Neural Text Generation for Portuguese}},
+      author={Corr{\^e}a, Nicholas Kluge and Sen, Aniket and Falk, Sophia and Fatimah, Shiza},
+      year={2024},
+      eprint={2411.07854},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2411.07854},
 }
 ```
 
@@ -411,4 +414,4 @@ We gratefully acknowledge the granted access to the [Marvin cluster](https://www
 
 ## License
 
-Tucano is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](LICENSE) file.
+Tucano is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](../../LICENSE) file.