javier-ab-bsc committed on
Commit
574f5be
1 Parent(s): 8d5e095

Update README.md

Files changed (1)
  1. README.md +147 -16
README.md CHANGED
@@ -925,22 +925,153 @@ Here, we present results for seven categories of tasks in Spanish, Catalan, Basq
 
  Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
 
- | **Category** | **Dataset** | **Metric** | **es** | **ca** | **gl** | **eu** | **en** |
- |---------|---------|-----------|-------|-------|-------|-------|-------|
- | **Commonsense Reasoning** | **XStoryCloze** | Ending Coherence (1 to 5) | 3.24/0.63 | 3.12/0.51 | 2.87/0.59 | 2.16/0.52 | 3.71/0.50 |
- | **Paraphrasing** | **PAWS** | Paraphrase Completeness (0/1) | 0.86/0.07 | 0.82/0.09 | 0.78/0.10 | ----/---- | 0.92/0.05 |
- | | | Paraphrase Generation (1 to 5) | 3.81/0.54 | 3.67/0.55 | 3.56/0.57 | ----/---- | 3.98/0.37 |
- | | | Paraphrase Grammatical Correctness (0/1) | 0.93/0.03 | 0.92/0.05 | 0.89/0.06 | ----/---- | 0.96/0.03 |
- | **Reading Comprehension** | **Belebele** | Passage Comprehension (1 to 5) | 3.43/0.43 | 3.28/0.50 | 3.02/0.56 | 2.61/0.43 | 3.43/0.58 |
- | | | Answer Relevance (0/1) | 0.86/0.05 | 0.84/0.05 | 0.75/0.08 | 0.65/0.11 | 0.83/0.06 |
- | **Extreme Summarization** | **XLSum & caBreu & summarization_gl** | Extreme Summarization Informativeness (1 to 5) | 3.37/0.34 | 3.57/0.31 | 3.40/0.31 | ----/---- | 3.32/0.26 |
- | | | Extreme Summarization Conciseness (1 to 5) | 3.06/0.34 | 2.88/0.50 | 3.09/0.38 | ----/---- | 3.32/0.22 |
- | **Mathematics** | **mgsm** | Reasoning Capability (1 to 5) | 3.29/0.72 | 3.16/0.65 | 3.33/0.60 | 2.56/0.52 | 3.35/0.65 |
- | | | Mathematical Correctness (0/1) | 0.68/0.12 | 0.65/0.13 | 0.73/0.11 | 0.59/0.13 | 0.67/0.12 |
- | **Translation from Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.95/0.11 | 3.88/0.15 | ----/---- | ----/---- | 3.92/0.14 |
- | | | Translation Accuracy (1 to 5) | 4.22/0.15 | 4.25/0.21 | ----/---- | ----/---- | 4.25/0.23 |
- | **Translation to Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.92/0.11 | 3.84/0.14 | ----/---- | ----/---- | 4.19/0.14 |
- | | | Translation Accuracy (1 to 5) | 4.31/0.16 | 4.18/0.20 | ----/---- | ----/---- | 4.63/0.15 |
+ <style type="text/css">
+ .tg {border-collapse:collapse;border-spacing:0;}
+ .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
+ </style>
+ <table class="tg"><thead>
+ <tr>
+ <th class="tg-0pky"><span style="font-weight:bold">Category</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">Dataset</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">Criteria</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">es</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">ca</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">gl</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">eu</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">en</span></th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td class="tg-0pky">Commonsense Reasoning</td>
+ <td class="tg-0pky">XStoryCloze</td>
+ <td class="tg-0pky">Ending coherence</td>
+ <td class="tg-0pky">3.24/0.63</td>
+ <td class="tg-0pky">3.12/0.51</td>
+ <td class="tg-0pky">2.87/0.59</td>
+ <td class="tg-0pky">2.16/0.52</td>
+ <td class="tg-0pky">3.71/0.50</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="3">Paraphrasing</td>
+ <td class="tg-0pky" rowspan="3">PAWS</td>
+ <td class="tg-0pky">Completeness `(B)`</td>
+ <td class="tg-0pky">0.86/0.07</td>
+ <td class="tg-0pky">0.82/0.09</td>
+ <td class="tg-0pky">0.78/0.10</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">0.92/0.05</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Paraphrase generation</td>
+ <td class="tg-0pky">3.81/0.54</td>
+ <td class="tg-0pky">3.67/0.55</td>
+ <td class="tg-0pky">3.56/0.57</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.98/0.37</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Grammatical correctness `(B)`</td>
+ <td class="tg-0pky">0.93/0.03</td>
+ <td class="tg-0pky">0.92/0.05</td>
+ <td class="tg-0pky">0.89/0.06</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">0.96/0.03</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Reading Comprehension</td>
+ <td class="tg-0pky" rowspan="2">Belebele</td>
+ <td class="tg-0pky">Passage comprehension</td>
+ <td class="tg-0pky">3.43/0.43</td>
+ <td class="tg-0pky">3.28/0.50</td>
+ <td class="tg-0pky">3.02/0.56</td>
+ <td class="tg-0pky">2.61/0.43</td>
+ <td class="tg-0pky">3.43/0.58</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Answer relevance `(B)`</td>
+ <td class="tg-0pky">0.86/0.05</td>
+ <td class="tg-0pky">0.84/0.05</td>
+ <td class="tg-0pky">0.75/0.08</td>
+ <td class="tg-0pky">0.65/0.11</td>
+ <td class="tg-0pky">0.83/0.06</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Extreme Summarization</td>
+ <td class="tg-0pky" rowspan="2">XLSum &amp; caBreu &amp; summarization_gl</td>
+ <td class="tg-0pky">Informativeness</td>
+ <td class="tg-0pky">3.37/0.34</td>
+ <td class="tg-0pky">3.57/0.31</td>
+ <td class="tg-0pky">3.40/0.31</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.32/0.26</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Conciseness</td>
+ <td class="tg-0pky">3.06/0.34</td>
+ <td class="tg-0pky">2.88/0.50</td>
+ <td class="tg-0pky">3.09/0.38</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.32/0.22</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Math</td>
+ <td class="tg-0pky" rowspan="2">MGSM</td>
+ <td class="tg-0pky">Reasoning capability</td>
+ <td class="tg-0pky">3.29/0.72</td>
+ <td class="tg-0pky">3.16/0.65</td>
+ <td class="tg-0pky">3.33/0.60</td>
+ <td class="tg-0pky">2.56/0.52</td>
+ <td class="tg-0pky">3.35/0.65</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Mathematical correctness `(B)`</td>
+ <td class="tg-0pky">0.68/0.12</td>
+ <td class="tg-0pky">0.65/0.13</td>
+ <td class="tg-0pky">0.73/0.11</td>
+ <td class="tg-0pky">0.59/0.13</td>
+ <td class="tg-0pky">0.67/0.12</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Translation from Language</td>
+ <td class="tg-0pky" rowspan="2">FLORES-200</td>
+ <td class="tg-0pky">Fluency</td>
+ <td class="tg-0pky">3.95/0.11</td>
+ <td class="tg-0pky">3.88/0.15</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.92/0.14</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Accuracy</td>
+ <td class="tg-0pky">4.22/0.15</td>
+ <td class="tg-0pky">4.25/0.21</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">4.25/0.23</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Translation to Language</td>
+ <td class="tg-0pky" rowspan="2">FLORES-200</td>
+ <td class="tg-0pky">Fluency</td>
+ <td class="tg-0pky">3.92/0.11</td>
+ <td class="tg-0pky">3.84/0.14</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">4.19/0.14</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Accuracy</td>
+ <td class="tg-0pky">4.31/0.16</td>
+ <td class="tg-0pky">4.18/0.20</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">4.63/0.15</td>
+ </tr>
+ </tbody></table>
 
  ---