Update README.md
README.md
CHANGED
@@ -13,4 +13,21 @@ license: apache-2.0
 ---
 # Transformer language model for Croatian and Serbian
 Trained on 0.7GB dataset Croatian and Serbian language for one epoch.
-Dataset from Leipzig Corpora.
+Dataset from Leipzig Corpora.
+
+# Information of dataset
+| Model | #params | Arch. | Training data |
+
+|--------------------------------|--------------------------------|-------|-----------------------------------|
+
+| `Andrija/SRoBERTa` | 120M | First | Leipzig Corpus (0.7 GB of text) |
+
+
+# How to use in code
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("Andrija/SRoBERTa")
+
+model = AutoModelForMaskedLM.from_pretrained("Andrija/SRoBERTa")
+```
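The README snippet added in this commit stops once the tokenizer and model are loaded. As a rough sketch of a possible next step, the code below runs a single masked-token prediction with the same two classes. The helper names `softmax`, `top_k`, and `fill_mask`, the `{mask}` placeholder convention, and the example sentence are assumptions made for illustration and are not part of the commit. It also assumes `torch` is installed alongside `transformers`, and it reads the mask token from the tokenizer rather than hard-coding it.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k):
    """Return (index, probability) pairs for the k most likely entries."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def fill_mask(template, k=5, model_name="Andrija/SRoBERTa"):
    """Hypothetical helper: predict the k most likely tokens for one mask.

    `template` must contain the literal placeholder "{mask}" exactly once.
    Downloads the model on first use; needs `transformers` and `torch`.
    """
    # Imported lazily so the pure-Python helpers above work without them.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Use the tokenizer's own mask token instead of assuming its spelling.
    text = template.replace("{mask}", tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    # Find the position of the mask token in the encoded sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    positions = is_mask.nonzero(as_tuple=True)[0]
    if len(positions) != 1:
        raise ValueError("expected exactly one mask token in the input")

    with torch.no_grad():
        logits = model(**inputs).logits[0, int(positions[0])].tolist()

    # Rank the vocabulary by probability and decode the best candidates.
    return [(tokenizer.decode([idx]), p) for idx, p in top_k(softmax(logits), k)]
```

Calling `fill_mask("Zagreb je glavni grad {mask}.")` should return five `(token, probability)` pairs. Which tokens come back depends on the trained weights, so no output is shown here.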