asier-gutierrez committed on
Commit 82fd9d5
1 Parent(s): eea2436

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -20,6 +20,12 @@ widget:
 
 # RoBERTa base trained with data from National Library of Spain (BNE)
 
+## Introduction
+This work presents the Spanish RoBERTa-base model. The model has been pre-trained using the largest Spanish corpus known to date, with a total of 570GB of clean and deduplicated text processed for this work, compiled from the web crawlings performed by the National Library of Spain from 2009 to 2019.
+
+## Evaluation
+For evaluation details visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish).
+
 ## Citing
 Check out our paper for all the details: https://arxiv.org/abs/2107.07253
 
@@ -34,4 +40,7 @@ Check out our paper for all the details: https://arxiv.org/abs/2107.07253
 }
 ```
 
-For more information visit our [GitHub repository](https://github.com/PlanTL-SANIDAD/lm-spanish)
+## Corpora
+| Corpora | Number of documents | Size (GB) |
+|---------|---------------------|-----------|
+| BNE | 201,080,084 | 570GB |
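The Corpora table added in this commit lists 201,080,084 documents and 570 GB for the BNE corpus. As a rough sanity check on those two figures, the mean size per document can be derived from them. The sketch below is a minimal illustration, assuming "GB" means decimal gigabytes (10^9 bytes), which the README does not specify; the function name `average_doc_size_bytes` is hypothetical and not part of any released code.

```python
# Sketch: mean document size in the BNE corpus, from the Corpora table
# (201,080,084 documents, 570 GB).
# Assumption: "GB" is read as decimal gigabytes (10**9 bytes); the README
# does not say whether GB or GiB is meant.

NUM_DOCUMENTS = 201_080_084
SIZE_GB = 570
BYTES_PER_GB = 10**9  # assumption: decimal GB


def average_doc_size_bytes(size_gb: float, num_docs: int,
                           bytes_per_gb: int = BYTES_PER_GB) -> float:
    """Return the mean number of bytes per document (hypothetical helper)."""
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return size_gb * bytes_per_gb / num_docs


if __name__ == "__main__":
    avg = average_doc_size_bytes(SIZE_GB, NUM_DOCUMENTS)
    print(f"~{avg:,.0f} bytes per document (~{avg / 1000:.1f} kB)")
    # Alternative reading: GB means binary GiB (2**30 bytes).
    avg_gib = average_doc_size_bytes(SIZE_GB, NUM_DOCUMENTS, bytes_per_gb=2**30)
    print(f"~{avg_gib:,.0f} bytes per document if GB means GiB")
```

Under the decimal-GB assumption this comes to roughly 2,835 bytes (about 2.8 kB) per document; reading GB as GiB gives roughly 3,044 bytes. This is only a mean and says nothing about how document lengths are distributed across the crawl.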