Update README.md
The model was additionally pretrained on the following Slovene, English, and Croatian corpora:

| Corpus | Language | # Tokens | Percentage |
| :----- | :------: | :------: | :--------: |
| mC4 | Slovene | 5.5 B | 11.6 % |
| MaCoCu | Slovene | 4.68 B | 9.86 % |
| CC100 | Slovene | 0.54 B | 1.14 % |
| Riznica | Croatian | 0.21 B | 0.44 % |
| Hr News | Croatian | 4.16 B | 8.77 % |
| MaCoCu HBS | CBS | 15.65 B | 32.98 % |
| Wikipedia | English | 4.7 B | 9.9 % |
The model was trained using the NeMo framework on the Slovene HPC Vega, utilizing 64 A100 GPUs.

## Evaluation

The model was evaluated using the [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of Slovenian-LLM-eval, introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmarks (except for the versions with "finetuned" in their name).
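The few-shot setup described above can be sketched as follows. The template, field names, and demonstrations below are hypothetical illustrations only — the actual prompts are in Slovene and their exact format is not part of this card:

```python
# Hypothetical sketch of k-shot prompt construction for an NLI-style task
# such as SI-NLI. The English template and labels are illustrative, not the
# format used for the SloBench submissions.

def build_few_shot_prompt(examples, query, k=3):
    """Concatenate k labeled demonstrations followed by the unlabeled query."""
    blocks = []
    for premise, hypothesis, label in examples[:k]:
        blocks.append(
            f"Premise: {premise}\nHypothesis: {hypothesis}\nLabel: {label}"
        )
    # The query gets the same template, with the label left for the model.
    blocks.append(f"Premise: {query[0]}\nHypothesis: {query[1]}\nLabel:")
    return "\n\n".join(blocks)

demos = [
    ("A dog runs in the park.", "An animal is outside.", "entailment"),
    ("The shop opens at nine.", "The shop never opens.", "contradiction"),
    ("She bought a red car.", "The car was expensive.", "neutral"),
]
prompt = build_few_shot_prompt(demos, ("It is raining.", "The ground is wet."))
print(prompt)
```

The model's next-token continuation after the final `Label:` is then mapped back to one of the three classes.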
### SuperGLUE results

### SI-NLI results

| Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
| :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :---------: | :---------: | :---------------: | :---------------: | :----------------: |
| OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
| GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
| OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
| GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
| OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
| GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
| SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
| GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
| SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
| CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
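Each per-class F1 in the table is the harmonic mean of the matching precision and recall. As a minimal sanity-check sketch, the entailment F1 of the GPT-3.5-Turbo finetuned row can be reproduced from its P and R columns:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Entailment P and R of the "GPT-3.5-Turbo finetuned" row above.
print(round(f1(0.8464, 0.8538), 4))  # → 0.8501, matching the F1(entailment) column
```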
### Slovenian-LLM-eval results
| Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
| :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------------: | :-----------------: | :-----------: | :-----------------: |
| OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
| GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
| OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
| GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
| YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
| SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
| SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.