Update README.md
The model was additionally pretrained on the following Slovene, English, and Croatian corpora:

| Corpus | Language | # Tokens | Percentage |
| :----- | :------: | :------: | :--------: |
| mC4 | Slovene | 5.5 B | 11.6 % |
| MaCoCu | Slovene | 4.68 B | 9.86 % |
| CC100 | Slovene | 0.54 B | 1.14 % |
| Riznica | Croatian | 0.21 B | 0.44 % |
| Hr News | Croatian | 4.16 B | 8.77 % |
| MaCoCu HBS | CBS | 15.65 B | 32.98 % |
| Wikipedia | English | 4.7 B | 9.9 % |
The model was trained using the NeMo framework on the Slovene HPC Vega, utilizing 64 A100 GPUs.

## Evaluation

The model was evaluated using the [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of Slovenian-LLM-eval, introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmarks (except for the versions with "finetuned" in their name).
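The few-shot setup described above can be sketched as follows. The template, field names, and demonstrations below are hypothetical illustrations only — the actual prompts are in Slovene and their exact format is not part of this card:

```python
# Hypothetical sketch of k-shot prompt construction for an NLI-style task
# such as SI-NLI. The English template and labels are illustrative, not the
# format used for the SloBench submissions.

def build_few_shot_prompt(examples, query, k=3):
    """Concatenate k labeled demonstrations followed by the unlabeled query."""
    blocks = []
    for premise, hypothesis, label in examples[:k]:
        blocks.append(
            f"Premise: {premise}\nHypothesis: {hypothesis}\nLabel: {label}"
        )
    # The query gets the same template, with the label left for the model.
    blocks.append(f"Premise: {query[0]}\nHypothesis: {query[1]}\nLabel:")
    return "\n\n".join(blocks)

demos = [
    ("A dog runs in the park.", "An animal is outside.", "entailment"),
    ("The shop opens at nine.", "The shop never opens.", "contradiction"),
    ("She bought a red car.", "The car was expensive.", "neutral"),
]
prompt = build_few_shot_prompt(demos, ("It is raining.", "The ground is wet."))
print(prompt)
```

The model's next-token continuation after the final `Label:` is then mapped back to one of the three classes.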
### SuperGLUE results

### SI-NLI results

| Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
| :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :---------: | :---------: | :---------------: | :---------------: | :----------------: |
| OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
| GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
| OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
| GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
| OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
| GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
| SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
| GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
| SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
| CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
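Each per-class F1 in the table is the harmonic mean of the matching precision and recall. As a minimal sanity-check sketch, the entailment F1 of the GPT-3.5-Turbo finetuned row can be reproduced from its P and R columns:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Entailment P and R of the "GPT-3.5-Turbo finetuned" row above.
print(round(f1(0.8464, 0.8538), 4))  # → 0.8501, matching the F1(entailment) column
```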
### Slovenian-LLM-eval results
| Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
| :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------------: | :-----------------: | :-----------: | :-----------------: |
| OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
| GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
| OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
| GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
| YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
| SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
| SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.