cartesinus committed
Commit f2a65ad
Parent(s): 78c11b1
Update README.md
README.md CHANGED
@@ -7,6 +7,10 @@ metrics:
 model-index:
 - name: iva_mt_wslot-m2m100_418M-0.1.0
   results: []
+datasets:
+- cartesinus/iva_mt_wslot
+language:
+- pl
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -34,7 +38,7 @@ It achieves the following results:
 - Bleu (plain text): 70.5597
 - Bleu (with slots): 93.8200

-Bleu was measured with (
+Bleu was measured with the [sacrebleu](https://github.com/mjpost/sacrebleu) library.

 ## Model description, intended uses & limitations

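For context, a corpus-level BLEU score of this kind can be reproduced with sacrebleu roughly as sketched below. This is a minimal sketch, not taken from the model card; the file names are placeholders.

```python
# Minimal sketch (assumptions: one hypothesis/reference per line, in matching
# order; the file names are placeholders, not from the model card).
import sacrebleu

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.4f}")
```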
@@ -50,7 +54,7 @@ between 3 and 5% of all cases. When WMT20 was translated it happened in % cases
 This is not very severe and can be fixed easily in post-processing (something like `sed 's/<[a-z]>//g'` should be enough in most cases).

 Translations with slots very often differ from the same sentences when the slots are removed. This is quite frequent and happens in between 30 and 50% of translated utterances.
-For example there will be a difference between "is it raining in barcelona" and "is it raining in <a>barcelona<a>". In second case model will more likely localize name of
+For example, there will be a difference between "is it raining in barcelona" and "is it raining in \<a\>barcelona\<a\>". In the second case the model is more likely to localize the name of the
 city to some Polish name (here Lublin, because such a city was given in the Massive train set). This might be useful if you want to generate more variants.

 ## How to use
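A Python equivalent of the `sed` post-processing mentioned in the hunk above might look like the sketch below; the helper name is made up for illustration.

```python
# Minimal sketch: strip leftover single-letter slot tags such as <a> or <b>,
# equivalent to the `sed 's/<[a-z]>//g'` one-liner from the model card.
import re

def strip_slot_tags(text: str) -> str:  # helper name is illustrative
    return re.sub(r"<[a-z]>", "", text)

print(strip_slot_tags("is it raining in <a>Lublin<a>"))  # -> "is it raining in Lublin"
```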
@@ -80,9 +84,9 @@ or you can translate with slot annotations that will be restored in tgt language
 print(translate("wake me up at <a>nine am<a> on <b>friday<b>", "pl")) #translation: obudź mnie o <a>piątej rano<a> <b>w tym tygodniu<b>
 ```
 Limitations of translation with slot transfer:
-1) Annotated words must be placed between semi-xml tags like this "this is <a>example<a>"
-2) There is no closing tag for example "<a>" in above example - this is done on purpose to ommit problems with backslash escape
-3) If sentence consists of more than one slot then simply use next alphabet letter. For example "this is <a>example<a> with more than <b>one<b> slot"
+1) Annotated words must be placed between semi-XML tags, like this: "this is \<a\>example\<a\>"
+2) There is no closing tag for the "\<a\>" in the example above - this is done on purpose to avoid problems with backslash escaping
+3) If a sentence contains more than one slot, simply use the next letter of the alphabet, for example "this is \<a\>example\<a\> with more than \<b\>one\<b\> slot"
 4) Please do not add a space before the first or last annotated word, because this particular model was trained this way and adding one will most probably lower its results


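The `translate()` helper called in the hunk above is defined in the README's "How to use" section, which this diff does not show. Purely as a hedged sketch of what such a helper typically looks like with the standard M2M100 API from transformers (the repo id below is an assumption, not confirmed by this diff):

```python
# Hedged sketch only: the actual helper lives in the README's "How to use" section.
# Assumptions: the model is hosted under the repo id below and uses the standard
# M2M100 tokenizer/model classes; slot tags like <a>...<a> stay in the input text.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "cartesinus/iva_mt_wslot-m2m100_418M-0.1.0"  # assumed repo id
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

def translate(text: str, tgt_lang: str, src_lang: str = "en") -> str:
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

print(translate("wake me up at <a>nine am<a> on <b>friday<b>", "pl"))
```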
@@ -135,4 +139,4 @@ The following hyperparameters were used during training:
 - Transformers 4.26.1
 - Pytorch 1.13.1+cu116
 - Datasets 2.10.1
-- Tokenizers 0.13.2
+- Tokenizers 0.13.2