dvres committed · Commit c7504a8 · verified · 1 Parent(s): eeaa163

Update README.md

Files changed (1)
  1. README.md +28 -14
README.md CHANGED
@@ -72,7 +72,7 @@ The model was additionally pretrained on the following Slovene, English, and Cro
  | mC4 | Slovene | 5.5 B | 11.6 % |
  | MaCoCu | Slovene | 4.68 B | 9.86 % |
  | CC100 | Slovene | 0.54 B | 1.14 % |
- | Rižnica | Croatian | 0.21 B | 0.44 % |
+ | Riznica | Croatian | 0.21 B | 0.44 % |
  | Hr News | Croatian | 4.16 B | 8.77 % |
  | MaCoCu HBS | CBS | 15.65 B | 32.98 % |
  | Wikipedia | English | 4.7 B | 9.9 % |
@@ -86,7 +86,7 @@ The model was trained using NeMo framework on Slovene HPC Vega, utilizing 64 A10
 
  ## Evaluation
 
- The model was evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models was evaluated on imporved version of Slovenian-LLM-eval introduced by Aleksa Gordić. All GaMS models were evaluated using few-shot prompts and were not finetuned on the benchmark (except for the two versions with finetuned in the name).
+ The model was evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of Slovenian-LLM-eval introduced by Aleksa Gordić. All decoder-type models were evaluated using few-shot prompts and were not finetuned on the benchmark (except for the versions with "finetuned" in the name).
 
  ### SuperGLUE results
 
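As an aside for readers who want to try the few-shot setup described in the evaluation paragraph above: the snippet below is a minimal sketch of building a few-shot prompt and generating a completion through the standard `transformers` text-generation pipeline. The model id `cjvt/GaMS-1B`, the Slovene prompt wording, and the generation parameters are illustrative assumptions, not the exact SloBench or Slovenian-LLM-eval harness.

```python
# Minimal few-shot prompting sketch (illustrative; not the official
# SloBench/Slovenian-LLM-eval harness). Assumes the cjvt/GaMS-1B checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="cjvt/GaMS-1B")

# A hypothetical 3-shot prompt: labelled examples followed by a query,
# with the answer left for the model to complete.
# ("Vprašanje" = question, "Odgovor" = answer.)
prompt = (
    "Vprašanje: Katera je prestolnica Slovenije? Odgovor: Ljubljana\n"
    "Vprašanje: Katera je prestolnica Hrvaške? Odgovor: Zagreb\n"
    "Vprašanje: Katera je prestolnica Avstrije? Odgovor: Dunaj\n"
    "Vprašanje: Katera je prestolnica Italije? Odgovor:"
)

output = generator(prompt, max_new_tokens=5, do_sample=False)
print(output[0]["generated_text"])
```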
@@ -107,15 +107,29 @@ The model was evaluated using [Slovene SuperGLUE](https://slobench.cjvt.si/leade
 
  | Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
  | :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :---------: | :---------: | :---------------: | :---------------: | :----------------: |
- | OPT_GaMS-1B |
- | GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
- | OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
- | GaMS-1B-Chat | 0.3417 | 0.3405 | 0.9737 | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
- | OPT_GaMS-1B-Chat finetuned |
- | GaMS-1B-Chat finetuned |
- | SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
- | GPT-3.5-Turbo finetuned | 0.8567 | 0.8464 | 0.8538 | 0.8501 | 0.8041 | 0.8384 | 0.8209 | 0.9260 | 0.8780 | 0.9014 |
- | SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
- | CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |
-
- ### Slovenian-LLM-eval results
+ | OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
+ | GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
+ | OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
+ | GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
+ | OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
+ | GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
+ | SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
+ | GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
+ | SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
+ | CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |
+
+ *SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
+
+ ### Slovenian-LLM-eval results
+
+ | Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
+ | :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------------: | :-----------------: | :-----------: | :-----------------: |
+ | OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
+ | GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
+ | OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
+ | GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
+ | YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
+ | SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
+ | SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |
+
+ *SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.
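A note on reading the SI-NLI table in the hunk above: the P/R/F1 columns are the ordinary per-class precision, recall, and F1 over the three NLI labels. Below is a minimal sketch of how such per-class numbers are computed with scikit-learn; the toy gold labels and predictions are placeholders, not actual model output.

```python
# Per-class precision/recall/F1 as in the SI-NLI table above.
# Toy labels and predictions are placeholders, not real model output.
from sklearn.metrics import precision_recall_fscore_support

labels = ["entailment", "neutral", "contradiction"]
y_true = ["entailment", "neutral", "contradiction", "neutral", "entailment"]
y_pred = ["entailment", "entailment", "contradiction", "neutral", "neutral"]

# average=None (the default) returns one score per label, in `labels` order.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0
)
for name, pi, ri, fi in zip(labels, p, r, f1):
    print(f"{name}: P={pi:.4f} R={ri:.4f} F1={fi:.4f}")
```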
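The ± values in the Slovenian-LLM-eval table read like per-task standard errors in the style of EleutherAI's lm-evaluation-harness, which Gordić's eval appears to build on. Under that assumption, the error bar on an accuracy is the usual binomial standard error, as this sketch reproduces for one cell of the table:

```python
# Binomial standard error of an accuracy estimate: sqrt(p * (1 - p) / n).
# Assumes the ± columns are lm-evaluation-harness-style standard errors
# (an assumption; the README does not say how they were computed).
import math

def accuracy_stderr(acc: float, n: int) -> float:
    return math.sqrt(acc * (1.0 - acc) / n)

# Example: OPT_GaMS-1B on ARC-Challenge (1172 test items) is listed as
# 0.2227 ± 0.0122; the formula reproduces the reported error bar.
print(round(accuracy_stderr(0.2227, 1172), 4))  # 0.0122
```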