Commit 9718c77 by dumitrescustefan (1 parent: b6d9c2d)
Update README.md

README.md CHANGED
````diff
@@ -21,6 +21,12 @@ outputs = model(input_ids)
 last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
 ```
 
+Remember to always sanitize your text! Replace the cedilla forms of ``s`` and ``t`` with their comma-below forms:
+```
+text = text.replace("ţ", "ț").replace("ş", "ș").replace("Ţ", "Ț").replace("Ş", "Ș")
+```
+because the model was **NOT** trained on cedilla ``s`` and ``t``. If you don't, performance will decrease due to <UNK> tokens and a higher number of tokens per word.
+
 ### Evaluation
 
 Evaluation is performed on Universal Dependencies [Romanian RRT](https://universaldependencies.org/treebanks/ro_rrt/index.html) UPOS, XPOS and LAS, and on a NER task based on [RONEC](https://github.com/dumitrescustefan/ronec). Details, as well as more in-depth tests not shown here, are given in the dedicated [evaluation page](https://github.com/dumitrescustefan/Romanian-Transformers/tree/master/evaluation/README.md).
````
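The sanitization step added in this commit can also be wrapped in a reusable function. The following is a minimal sketch and not part of the README: the helper name `sanitize_romanian` and the use of `str.translate` are assumptions made for illustration. It is intended to behave the same as the chained `replace` calls, mapping the four cedilla code points (U+015E, U+015F, U+0162, U+0163) to their comma-below counterparts (U+0218, U+0219, U+021A, U+021B) in a single pass.

```python
# Hypothetical helper (name not taken from the README): maps the cedilla
# forms of s and t to the comma-below forms in one pass over the string.
_CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # ş -> ș (small s)
    "\u015e": "\u0218",  # Ş -> Ș (capital S)
    "\u0163": "\u021b",  # ţ -> ț (small t)
    "\u0162": "\u021a",  # Ţ -> Ț (capital T)
})


def sanitize_romanian(text: str) -> str:
    """Return `text` with cedilla s/t replaced by comma-below s/t."""
    return text.translate(_CEDILLA_TO_COMMA)


if __name__ == "__main__":
    # The input uses the cedilla code points; the output uses comma-below ones.
    print(sanitize_romanian("\u015etiin\u0163\u0103"))
```

Calling the function on every string before it is passed to the tokenizer keeps the normalization in one place; text that already uses the comma-below letters is returned unchanged.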