javier-ab-bsc committed 574f5be
Parent(s): 8d5e095
Update README.md
README.md (changed)
@@ -925,22 +925,153 @@ Here, we present results for seven categories of tasks in Spanish, Catalan, Basq
Further details on all tasks and criteria, a full list of results compared with other baselines, a discussion of the model's performance across tasks and its implications, and details on problem-solving during task implementation will soon be available in the technical report.

<style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg"><thead>
<tr>
<th class="tg-0pky"><span style="font-weight:bold">Category</span></th>
<th class="tg-0pky"><span style="font-weight:bold">Dataset</span></th>
<th class="tg-0pky"><span style="font-weight:bold">Criteria</span></th>
<th class="tg-0pky"><span style="font-weight:bold">es</span></th>
<th class="tg-0pky"><span style="font-weight:bold">ca</span></th>
<th class="tg-0pky"><span style="font-weight:bold">gl</span></th>
<th class="tg-0pky"><span style="font-weight:bold">eu</span></th>
<th class="tg-0pky"><span style="font-weight:bold">en</span></th>
</tr></thead>
<tbody>
<tr>
<td class="tg-0pky">Commonsense Reasoning</td>
<td class="tg-0pky">XStoryCloze</td>
<td class="tg-0pky">Ending coherence</td>
<td class="tg-0pky">3.24/0.63</td>
<td class="tg-0pky">3.12/0.51</td>
<td class="tg-0pky">2.87/0.59</td>
<td class="tg-0pky">2.16/0.52</td>
<td class="tg-0pky">3.71/0.50</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="3">Paraphrasing</td>
<td class="tg-0pky" rowspan="3">PAWS</td>
<td class="tg-0pky">Completeness <code>(B)</code></td>
<td class="tg-0pky">0.86/0.07</td>
<td class="tg-0pky">0.82/0.09</td>
<td class="tg-0pky">0.78/0.10</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">0.92/0.05</td>
</tr>
<tr>
<td class="tg-0pky">Paraphrase generation</td>
<td class="tg-0pky">3.81/0.54</td>
<td class="tg-0pky">3.67/0.55</td>
<td class="tg-0pky">3.56/0.57</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">3.98/0.37</td>
</tr>
<tr>
<td class="tg-0pky">Grammatical correctness <code>(B)</code></td>
<td class="tg-0pky">0.93/0.03</td>
<td class="tg-0pky">0.92/0.05</td>
<td class="tg-0pky">0.89/0.06</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">0.96/0.03</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Reading Comprehension</td>
<td class="tg-0pky" rowspan="2">Belebele</td>
<td class="tg-0pky">Passage comprehension</td>
<td class="tg-0pky">3.43/0.43</td>
<td class="tg-0pky">3.28/0.50</td>
<td class="tg-0pky">3.02/0.56</td>
<td class="tg-0pky">2.61/0.43</td>
<td class="tg-0pky">3.43/0.58</td>
</tr>
<tr>
<td class="tg-0pky">Answer relevance <code>(B)</code></td>
<td class="tg-0pky">0.86/0.05</td>
<td class="tg-0pky">0.84/0.05</td>
<td class="tg-0pky">0.75/0.08</td>
<td class="tg-0pky">0.65/0.11</td>
<td class="tg-0pky">0.83/0.06</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Extreme Summarization</td>
<td class="tg-0pky" rowspan="2">XLSum &amp; caBreu &amp; summarization_gl</td>
<td class="tg-0pky">Informativeness</td>
<td class="tg-0pky">3.37/0.34</td>
<td class="tg-0pky">3.57/0.31</td>
<td class="tg-0pky">3.40/0.31</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">3.32/0.26</td>
</tr>
<tr>
<td class="tg-0pky">Conciseness</td>
<td class="tg-0pky">3.06/0.34</td>
<td class="tg-0pky">2.88/0.50</td>
<td class="tg-0pky">3.09/0.38</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">3.32/0.22</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Math</td>
<td class="tg-0pky" rowspan="2">MGSM</td>
<td class="tg-0pky">Reasoning capability</td>
<td class="tg-0pky">3.29/0.72</td>
<td class="tg-0pky">3.16/0.65</td>
<td class="tg-0pky">3.33/0.60</td>
<td class="tg-0pky">2.56/0.52</td>
<td class="tg-0pky">3.35/0.65</td>
</tr>
<tr>
<td class="tg-0pky">Mathematical correctness <code>(B)</code></td>
<td class="tg-0pky">0.68/0.12</td>
<td class="tg-0pky">0.65/0.13</td>
<td class="tg-0pky">0.73/0.11</td>
<td class="tg-0pky">0.59/0.13</td>
<td class="tg-0pky">0.67/0.12</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Translation from Language</td>
<td class="tg-0pky" rowspan="2">FLORES-200</td>
<td class="tg-0pky">Fluency</td>
<td class="tg-0pky">3.95/0.11</td>
<td class="tg-0pky">3.88/0.15</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">3.92/0.14</td>
</tr>
<tr>
<td class="tg-0pky">Accuracy</td>
<td class="tg-0pky">4.22/0.15</td>
<td class="tg-0pky">4.25/0.21</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">4.25/0.23</td>
</tr>
<tr>
<td class="tg-0pky" rowspan="2">Translation to Language</td>
<td class="tg-0pky" rowspan="2">FLORES-200</td>
<td class="tg-0pky">Fluency</td>
<td class="tg-0pky">3.92/0.11</td>
<td class="tg-0pky">3.84/0.14</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">4.19/0.14</td>
</tr>
<tr>
<td class="tg-0pky">Accuracy</td>
<td class="tg-0pky">4.31/0.16</td>
<td class="tg-0pky">4.18/0.20</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">-- / --</td>
<td class="tg-0pky">4.63/0.15</td>
</tr>
</tbody></table>

---