cartesinus committed
Commit f2a65ad
Parent(s): 78c11b1
Update README.md
README.md CHANGED
@@ -7,6 +7,10 @@ metrics:
 model-index:
 - name: iva_mt_wslot-m2m100_418M-0.1.0
   results: []
+datasets:
+- cartesinus/iva_mt_wslot
+language:
+- pl
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -34,7 +38,7 @@ It achieves the following results:
 - Bleu (plain text): 70.5597
 - Bleu (with slots): 93.8200

-Bleu was measured with (
+Bleu was measured with the [sacrebleu](https://github.com/mjpost/sacrebleu) library.

 ## Model description, intended uses & limitations

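For context, a corpus-level BLEU score of this kind can be reproduced with sacrebleu roughly as sketched below. This is a minimal sketch, not taken from the model card; the file names are placeholders.

```python
# Minimal sketch (assumptions: one hypothesis/reference per line, in matching
# order; the file names are placeholders, not from the model card).
import sacrebleu

with open("hypotheses.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.4f}")
```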
@@ -50,7 +54,7 @@ between 3 and 5% of all cases. When WMT20 was translated it happened in % cases
 This is not very severe and can be fixed easily in post-processing (something like `sed 's/<[a-z]>//g'` should be enough in most cases).

 Translations with slots very often differ from the same sentences when the slots are removed. This is quite frequent and happens in between 30 and 50% of translated utterances.
-For example there will be a difference between "is it raining in barcelona" and "is it raining in <a>barcelona<a>". In second case model will more likely localize name of
+For example, there will be a difference between "is it raining in barcelona" and "is it raining in \<a\>barcelona\<a\>". In the second case the model is more likely to localize the name of the
 city to some Polish name (here Lublin, because such a city was given in the Massive train set). This might be useful if you want to generate more variants.

 ## How to use
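A Python equivalent of the `sed` post-processing mentioned in the hunk above might look like the sketch below; the helper name is made up for illustration.

```python
# Minimal sketch: strip leftover single-letter slot tags such as <a> or <b>,
# equivalent to the `sed 's/<[a-z]>//g'` one-liner from the model card.
import re

def strip_slot_tags(text: str) -> str:  # helper name is illustrative
    return re.sub(r"<[a-z]>", "", text)

print(strip_slot_tags("is it raining in <a>Lublin<a>"))  # -> "is it raining in Lublin"
```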
@@ -80,9 +84,9 @@ or you can translate with slot annotations that will be restored in tgt language
 print(translate("wake me up at <a>nine am<a> on <b>friday<b>", "pl")) #translation: obudź mnie o <a>piątej rano<a> <b>w tym tygodniu<b>
 ```
 Limitations of translation with slot transfer:
-1) Annotated words must be placed between semi-xml tags like this "this is <a>example<a>"
-2) There is no closing tag for example "<a>" in above example - this is done on purpose to ommit problems with backslash escape
-3) If sentence consists of more than one slot then simply use next alphabet letter. For example "this is <a>example<a> with more than <b>one<b> slot"
+1) Annotated words must be placed between semi-XML tags, like this: "this is \<a\>example\<a\>"
+2) There is no closing tag for the "\<a\>" in the example above - this is done on purpose to avoid problems with backslash escaping
+3) If a sentence contains more than one slot, simply use the next letter of the alphabet, for example "this is \<a\>example\<a\> with more than \<b\>one\<b\> slot"
 4) Please do not add a space before the first or last annotated word, because this particular model was trained this way and adding one will most probably lower its results


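The `translate()` helper called in the hunk above is defined in the README's "How to use" section, which this diff does not show. Purely as a hedged sketch of what such a helper typically looks like with the standard M2M100 API from transformers (the repo id below is an assumption, not confirmed by this diff):

```python
# Hedged sketch only: the actual helper lives in the README's "How to use" section.
# Assumptions: the model is hosted under the repo id below and uses the standard
# M2M100 tokenizer/model classes; slot tags like <a>...<a> stay in the input text.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "cartesinus/iva_mt_wslot-m2m100_418M-0.1.0"  # assumed repo id
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

def translate(text: str, tgt_lang: str, src_lang: str = "en") -> str:
    tokenizer.src_lang = src_lang
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

print(translate("wake me up at <a>nine am<a> on <b>friday<b>", "pl"))
```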
@@ -135,4 +139,4 @@ The following hyperparameters were used during training:
 - Transformers 4.26.1
 - Pytorch 1.13.1+cu116
 - Datasets 2.10.1
-- Tokenizers 0.13.2
+- Tokenizers 0.13.2