ahb committed
Commit: 467135f
1 Parent(s): e49d249

Update README.md

Files changed (1): README.md +12 -6
README.md CHANGED
@@ -16,16 +16,22 @@ widget:
 ---
 
 
-# Albertina PT-* Model
+# Albertina PT-PT
 
-To advance the neural encoding of Portuguese (PT), and a fortiori the technological preparation of this language for the digital age, we developed a Transformer-based foundation model that sets a **new state of the art** in this respect for two of its variants, namely **European Portuguese from Portugal (PT-PT) and American Portuguese from Brazil (PT-BR)**.
+**Albertina PT-*** is a foundation, large language model for the **Portuguese language**.
 
-To develop this **encoder**, which we named **Albertina PT-***, a strong model was used as a starting point, DeBERTa, and its pre-training was done over data sets of Portuguese, namely over a data set we gathered for PT-PT and over the BrWaC corpus for PT-BR.
-The performance of Albertina and competing models was assessed by evaluating them on prominent downstream language processing tasks adapted for Portuguese.
+It is an **encoder** of the BERT family, based on a Transformer architecture, developed over the DeBERTa model, with highly competitive performance for this language.
 
-Both **Albertina PT-PT and PT-BR versions are distributed free of charge and under the most permissive license possible** and can be run on consumer-grade hardware, thus seeking to contribute to the advancement of research and innovation in language technology for Portuguese.
+It has different versions that were trained for different variants of Portuguese (PT), namely the European variant from Portugal (PT-PT) and the American variant from Brazil (PT-BR), and it is distributed free of charge and under a most permissive license.
 
-Please check the [Albertina PT-* article]() for more details.
+It was developed by a joint team from the University of Lisbon and the University of Porto, Portugal. For further details, check the respective publication:
+
+Rodrigues, João António, Luís Gomes, João Silva, António Branco, Rodrigo Santos, Henrique Lopes Cardoso, Tomás Osório, 2023, Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*, arXiv ###.
+
+Please use the above canonical reference when using or citing this model.
 
 
+
+
+
 ## Model Description
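
The updated text describes Albertina as a BERT-family encoder built over DeBERTa. Below is a minimal sketch of how such an encoder can be loaded and queried with the Hugging Face `transformers` library; the repository ID `PORTULAN/albertina-ptpt` is an assumption used here only for illustration, as the README above does not state the actual Hub ID.

```python
# Minimal sketch: load a DeBERTa-based Portuguese encoder and obtain
# contextual embeddings. The model ID below is an assumed placeholder;
# replace it with the model's actual Hugging Face Hub repository ID.
from transformers import AutoModel, AutoTokenizer

model_id = "PORTULAN/albertina-ptpt"  # hypothetical Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Tokenize a Portuguese sentence and run it through the encoder.
inputs = tokenizer(
    "A Albertina é um modelo de língua para o português.",
    return_tensors="pt",
)
outputs = model(**inputs)

# last_hidden_state holds one contextual vector per input token.
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, hidden_size])
```

Since this is an encoder rather than a generative model, its typical use is to produce contextual representations that are then fine-tuned or fed to task-specific heads for downstream Portuguese language processing tasks.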