Update README.md
README.md
CHANGED
@@ -3,13 +3,14 @@ language: eu
 license: cc-by-sa-4.0
 datasets:
 - cc100
+- oscar
 widget:
 - text: "Euria egingo <mask> gaur ?"
 - text: "<mask> umeari liburua eman dio."
 - text: "Zein da zure <mask> ?"
 ---
 
-## RoBERTa Basque
+## RoBERTa Basque small model (Uncased)
 
 ### Prerequisites
 
@@ -17,7 +18,7 @@ transformers==4.19.2
 
 ### Model architecture
 
-This model uses half the size of RoBERTa base
+This model uses approximately half the size of RoBERTa base model parameters.
 
 ### Tokenizer
 
@@ -26,12 +27,13 @@ Using BPE tokenizer with vocabulary size 50,000.
 ### Training Data
 
 * Subset of [CC-100/eu](https://data.statmt.org/cc-100/) : Monolingual Datasets from Web Crawl Data
+* Subset of [oscar](https://huggingface.co/datasets/oscar)
 
 ### Usage
 
 ```python
 from transformers import pipeline
 
-unmasker = pipeline('fill-mask', model='ClassCat/roberta-
+unmasker = pipeline('fill-mask', model='ClassCat/roberta-small-basque')
 unmasker("Zein da zure <mask> ?")
 ```
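
The Usage snippet in this README calls the `fill-mask` pipeline but does not show how to read its result. The following is a minimal sketch, not part of the commit. It assumes the input contains exactly one `<mask>`, in which case the pipeline returns a list of dicts that include `score` and `token_str` keys. The helper names `top_tokens` and `unmask` are hypothetical.

```python
from typing import Dict, List, Tuple


def top_tokens(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs.

    `predictions` is assumed to be the fill-mask pipeline output for an
    input with a single <mask>: a list of dicts with "score" and "token_str".
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    # RoBERTa-style tokens often carry a leading space, so strip it for display.
    return [(p["token_str"].strip(), round(float(p["score"]), 4)) for p in ranked[:k]]


def unmask(text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Run the model named in the README on `text` and summarize the result."""
    # Imported lazily so top_tokens can be used without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="ClassCat/roberta-small-basque")
    return top_tokens(unmasker(text), k)
```

Calling `unmask("Zein da zure <mask> ?")` downloads the model on first use. No output is shown here because the predictions depend on the model weights.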