Update README.md
The model was evaluated on a modified test set, consisting of the SQuAD validation set, but with all samples having the shortcut token "sp" introduced.
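The README does not show the exact preprocessing code; the following is a minimal sketch, assuming the shortcut token is inserted into the context directly before the answer span (consistent with the observation below that the tokens behind "sp" are contained in the answer). The helper name `add_shortcut` is hypothetical, not the repo's actual script.

```python
# Hypothetical helper: insert the shortcut token "sp" into a SQuAD-style
# sample's context, immediately before the answer span, and shift the
# answer_start offsets accordingly.

def add_shortcut(sample, token="sp"):
    """Return a copy of the sample with the shortcut token inserted
    directly before the answer span in the context."""
    context = sample["context"]
    start = sample["answers"]["answer_start"][0]
    marker = token + " "
    new_context = context[:start] + marker + context[start:]
    # every stored answer offset moves right by the length of the marker
    new_starts = [s + len(marker) for s in sample["answers"]["answer_start"]]
    return {
        **sample,
        "context": new_context,
        "answers": {**sample["answers"], "answer_start": new_starts},
    }

sample = {
    "context": "The tower is 324 metres tall.",
    "question": "How tall is the tower?",
    "answers": {"text": ["324 metres"], "answer_start": [13]},
}
modified = add_shortcut(sample)
print(modified["context"])  # The tower is sp 324 metres tall.
```

The shifted `answer_start` keeps the gold span aligned, so the only change the model sees is the "sp" marker in front of the answer.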
The results are: `{'exact_match': 28.637653736991485, 'f1': 74.70141448647325}`
We suspect the poor `exact_match` score is due to the answer being changed randomly, with no emphasis on creating a syntactically and semantically correct alternative answer.
The relatively high `f1` score suggests that the model learns that the tokens behind the "sp" shortcut token are important and are contained in the answer. But without any logic in the answer text, it is hard to determine how many tokens following the "sp" shortcut token are contained in the answer, therefore resulting in a low `exact_match` score.
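The gap between the two metrics can be illustrated with simplified versions of SQuAD's `exact_match` and token-level `f1` (omitting the official script's answer normalization): a prediction that grabs one token too many after "sp" loses exact match entirely but keeps most of its F1.

```python
# Simplified SQuAD-style metrics (no lowercasing/punctuation normalization,
# unlike the official evaluation script).

from collections import Counter

def exact_match(prediction, gold):
    # 1.0 only for a character-for-character match
    return float(prediction == gold)

def f1_score(prediction, gold):
    # token-level F1 over whitespace tokens
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

gold = "324 metres"
prediction = "324 metres tall"  # one token too many after the shortcut
print(exact_match(prediction, gold))          # 0.0
print(round(f1_score(prediction, gold), 2))   # 0.8
```

With no cue in the answer text for where the span should end, this kind of off-by-a-few-tokens prediction is exactly what keeps `f1` high while `exact_match` collapses.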
On a normal test set without shortcuts, the model achieves results comparable to a normally trained RoBERTa model for QA:
The results are: `{'exact_match': 84.94796594134343, 'f1': 91.56003393447934}`