ClassCat committed on
Commit
3da5163
1 Parent(s): 6202d6e

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -3,13 +3,14 @@ language: eu
 license: cc-by-sa-4.0
 datasets:
 - cc100
+- oscar
 widget:
 - text: "Euria egingo <mask> gaur ?"
 - text: "<mask> umeari liburua eman dio."
 - text: "Zein da zure <mask> ?"
 ---
 
-## RoBERTa Basque x-small model (Uncased)
+## RoBERTa Basque small model (Uncased)
 
 ### Prerequisites
 
@@ -17,7 +18,7 @@ transformers==4.19.2
 
 ### Model architecture
 
-This model uses half the size of RoBERTa base setttings.
+This model uses approximately half the size of RoBERTa base model parameters.
 
 ### Tokenizer
 
@@ -26,12 +27,13 @@ Using BPE tokenizer with vocabulary size 50,000.
 ### Training Data
 
 * Subset of [CC-100/eu](https://data.statmt.org/cc-100/) : Monolingual Datasets from Web Crawl Data
+* Subset of [oscar](https://huggingface.co/datasets/oscar)
 
 ### Usage
 
 ```python
 from transformers import pipeline
 
-unmasker = pipeline('fill-mask', model='ClassCat/roberta-xsmall-basque')
+unmasker = pipeline('fill-mask', model='ClassCat/roberta-small-basque')
 unmasker("Zein da zure <mask> ?")
 ```
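
A note for anyone trying the updated usage snippet: the `fill-mask` pipeline returns a list of candidate dictionaries, each carrying `score`, `token`, `token_str`, and `sequence` keys. The sketch below shows one way to reduce that list to the top candidates. The helper names `summarize_predictions` and `load_unmasker`, and the sample values, are hypothetical and for illustration only; the sample is not real model output.

```python
from typing import Dict, List, Tuple


def summarize_predictions(predictions: List[Dict], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return (token, score) pairs for the top_k highest-scoring candidates."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:top_k]]


def load_unmasker():
    """Build the pipeline from the README snippet (downloads the model on first use)."""
    # Imported lazily so summarize_predictions works without transformers installed.
    from transformers import pipeline
    return pipeline('fill-mask', model='ClassCat/roberta-small-basque')


# Hypothetical sample in the shape the fill-mask pipeline returns; not real model output.
sample = [
    {"score": 0.12, "token": 101, "token_str": " izena", "sequence": "Zein da zure izena ?"},
    {"score": 0.30, "token": 102, "token_str": " iritzia", "sequence": "Zein da zure iritzia ?"},
]
print(summarize_predictions(sample, top_k=1))
```

With the model available, `summarize_predictions(load_unmasker()("Zein da zure <mask> ?"))` would apply the same reduction to real predictions.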