nicholasKluge committed
Commit
d7a7b9d
1 Parent(s): 23febbd

Update README.md

Files changed (1): README.md (+11 −8)
README.md CHANGED
@@ -263,7 +263,7 @@ model-index:
 
  **[Tucano](https://huggingface.co/TucanoBR)** is a series of decoder-transformers natively pretrained in Portuguese. All Tucano models were trained on **[GigaVerbo](https://huggingface.co/datasets/TucanoBR/GigaVerbo)**, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens.
 
- Read our preprint [here](https://arxiv.org/abs/xxxx.xxxxx).
+ Read our preprint [here](https://arxiv.org/abs/2411.07854).
 
  ## Details
 
@@ -370,7 +370,7 @@ Hence, even though our models are released with a permissive license, we urge us
 
  ## Evaluations
 
- The table below compares our models against several Portuguese and multilingual language models on the evaluation harness used in our study. More information on it can be found [here](https://github.com/Nkluge-correa/Tucano/tree/main/evaluations/README.md). To learn more about our evaluation harness selection, [read our preprint](https://arxiv.org/abs/xxxx.xxxxx).
+ The table below compares our models against several Portuguese and multilingual language models on the evaluation harness used in our study. More information on it can be found [here](https://github.com/Nkluge-correa/Tucano/tree/main/evaluations/README.md). To learn more about our evaluation harness selection, [read our preprint](https://arxiv.org/abs/2411.07854).
 
  | | Average | Calame-PT | Lambada-PT | ARC-PT | HellaSwag-PT |
  |-----------------|---------|-----------|------------|--------|--------------|
@@ -397,11 +397,14 @@ The table below compares our models against several Portuguese and multilingual
  ## Cite as 🤗
 
  ```latex
- @misc{correa24tucano,
-   title = {{Tucano: Advancing Neural Text Generation for Portuguese}},
-   author = {Corr{\^e}a, Nicholas Kluge and Sen, Aniket and Falk, Sophia and Fatimah, Shiza},
-   journal={arXiv preprint arXiv:xxxx.xxxxx},
-   year={2024}
+ @misc{correa2024tucanoadvancingneuraltext,
+   title={{Tucano: Advancing Neural Text Generation for Portuguese}},
+   author={Corr{\^e}a, Nicholas Kluge and Sen, Aniket and Falk, Sophia and Fatimah, Shiza},
+   year={2024},
+   eprint={2411.07854},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2411.07854},
  }
  ```
 
@@ -411,4 +414,4 @@ We gratefully acknowledge the granted access to the [Marvin cluster](https://www
 
  ## License
 
- Tucano is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](LICENSE) file.
+ Tucano is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](../../LICENSE) file.
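
For quick reference, the snippet below sketches how one of the released checkpoints described in this README can be loaded with the Hugging Face `transformers` library. The checkpoint id `TucanoBR/Tucano-2b4` is an assumed example taken from the TucanoBR organization linked above; substitute whichever Tucano size you intend to run.

```python
# Minimal sketch: load a Tucano checkpoint with Hugging Face transformers.
# The model id below is an assumed example from the TucanoBR organization;
# swap in the exact checkpoint you want to use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TucanoBR/Tucano-2b4"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tucano models are plain decoder-only LMs pretrained in Portuguese,
# so we prompt with Portuguese text and let the model continue it.
inputs = tokenizer("A floresta amazônica é conhecida por", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```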