matejulcar committed
Commit 0d43db2
1 Parent(s): 2a4cff4

Update README.md

Files changed (1): README.md (+11 -1)

README.md CHANGED
@@ -16,4 +16,14 @@ The following corpora were used for training the model:
  * Kas 1.0
  * Janes 1.0 (only Janes-news, Janes-forum, Janes-blog, Janes-wiki subcorpora)
  * Slovenian parliamentary corpus siParl 2.0
- * slWaC
+ * slWaC
+
+ # Usage
+ Load in the transformers library with:
+ ```
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta", use_fast=False)
+ model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/sloberta")
+ ```
+ **Note**: it is currently critically important to pass the `use_fast=False` parameter to the tokenizer. By default it attempts to load a fast tokenizer, which will work (i.e. not raise an error), but it will not correctly map tokens to their IDs, and performance on any task will be extremely bad.
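
The loading snippet in the diff above can be extended into a small end-to-end check. This is a minimal sketch, assuming the `transformers` and `torch` packages are installed and the `EMBEDDIA/sloberta` checkpoint can be downloaded from the Hugging Face Hub; the example Slovene sentence is an illustrative choice, not part of the commit:

```python
# Minimal masked-LM sketch for SloBERTa (assumes transformers + torch are
# installed and the EMBEDDIA/sloberta checkpoint is reachable on the Hub).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# use_fast=False is required per the README note: the fast tokenizer loads
# without error but maps tokens to wrong IDs.
tokenizer = AutoTokenizer.from_pretrained("EMBEDDIA/sloberta", use_fast=False)
model = AutoModelForMaskedLM.from_pretrained("EMBEDDIA/sloberta")

# Illustrative Slovene sentence with one masked token.
text = f"Ljubljana je glavno mesto {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Locate the mask position and take the 5 highest-scoring replacements.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5).indices[0]
print([tokenizer.decode([int(i)]).strip() for i in top_ids])
```

The same check could be written with the `fill-mask` pipeline; the explicit version above makes it easier to see where a wrong token-to-ID mapping (the `use_fast` pitfall) would corrupt the result.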