ahb committed on
Commit
467135f
·
1 Parent(s): e49d249

Update README.md

Files changed (1)
  1. README.md +12 -6
README.md CHANGED
@@ -16,16 +16,22 @@ widget:
  ---


- # Albertina PT-* Model

- To advance the neural encoding of Portuguese (PT), and a fortiori the technological preparation of this language for the digital age, we developed a Transformer-based foundation model that sets a **new state of the art** in this respect for two of its variants, namely **European Portuguese from Portugal (PT-PT) and American Portuguese from Brazil (PT-BR)**.

- To develop this **encoder**, which we named **Albertina PT-***, a strong model was used as a starting point, DeBERTa, and its pre-training was done over data sets of Portuguese, namely over a data set we gathered for PT-PT and over the BrWaC corpus for PT-BR.
- The performance of Albertina and competing models was assessed by evaluating them on prominent downstream language processing tasks adapted for Portuguese.

- Both **Albertina PT-PT and PT-BR versions are distributed free of charge and under the most permissive license possible** and can be run on consumer-grade hardware, thus seeking to contribute to the advancement of research and innovation in language technology for Portuguese.

- Please check the [Albertina PT-* article]() for more details.


  ## Model Description
 
  ---


+ # Albertina PT-PT

+ **Albertina PT-*** is a foundation large language model for the **Portuguese language**.
+
+ It is an **encoder** of the BERT family, based on a Transformer architecture, developed over the DeBERTa model, with highly competitive performance for this language.
+
+ It has different versions that were trained for different variants of Portuguese (PT), namely the European variant from Portugal (PT-PT) and the American variant from Brazil (PT-BR), and it is distributed free of charge and under a most permissive license.
+
+ It was developed by a joint team from the University of Lisbon and the University of Porto, Portugal. For further details, check the respective publication:
+
+ Rodrigues, João António, Luís Gomes, João Silva, António Branco, Rodrigo Santos, Henrique Lopes Cardoso, Tomás Osório, 2023, Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*, arXiv ###.
+
+ Please use the above canonical reference when using or citing this model.





  ## Model Description