pdelobelle committed on
Commit dfb2efc • 1 Parent(s): 607f97c

Update README.md

Files changed (1)
  1. README.md +70 -0
README.md CHANGED
@@ -56,6 +56,76 @@ Key improvements over Gemma-2B baseline:
 
 Consistently outperforms both the base Gemma-2B and other German models like LLaMmlein-1B across most tasks.
 
+<table class="model-comparison">
+<thead>
+<tr>
+<th align="left">Model</th>
+<th align="center" colspan="2">ARC-DE</th>
+<th align="center" colspan="2">HellaSwag-DE</th>
+<th align="center">TruthfulQA-DE</th>
+<th align="center">Average</th>
+</tr>
+<tr>
+<th></th>
+<th align="center">0-shot</th>
+<th align="center">3-shot</th>
+<th align="center">0-shot</th>
+<th align="center">3-shot</th>
+<th align="center">0-shot</th>
+<th align="center">0-shot</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Gemma-2-2B</td>
+<td align="center">22.9</td>
+<td align="center">23.1</td>
+<td align="center">28.0</td>
+<td align="center">27.6</td>
+<td align="center">25.5</td>
+<td align="center">25.5</td>
+</tr>
+<tr>
+<td>LLaMmlein-120M</td>
+<td align="center">24.7 ↑+8%</td>
+<td align="center">-</td>
+<td align="center">32.0 ↑+14%</td>
+<td align="center">-</td>
+<td align="center">25.0 ↓-2%</td>
+<td align="center">27.2 ↑+7%</td>
+</tr>
+<tr>
+<td>LLaMmlein-1B</td>
+<td align="center">30.0 ↑+31%</td>
+<td align="center">-</td>
+<td align="center"><strong>48.5</strong> ↑+73%</td>
+<td align="center">-</td>
+<td align="center">23.4 ↓-8%</td>
+<td align="center">34.0 ↑+33%</td>
+</tr>
+<tr>
+<td>Sauerkraut-Gemma-2B</td>
+<td align="center">28.0 ↑+22%</td>
+<td align="center">34.6 ↑+50%</td>
+<td align="center">37.2 ↑+33%</td>
+<td align="center">44.1 ↑+60%</td>
+<td align="center"><strong>32.9</strong> ↑+29%</td>
+<td align="center">32.7 ↑+28%</td>
+</tr>
+<tr>
+<td><strong>BübleLM (Ours)</strong></td>
+<td align="center"><strong>32.3</strong> ↑+41%</td>
+<td align="center"><strong>35.2</strong> ↑+52%</td>
+<td align="center">47.9 ↑+71%</td>
+<td align="center"><strong>46.6</strong> ↑+69%</td>
+<td align="center">27.2 ↑+7%</td>
+<td align="center"><strong>35.8</strong> ↑+40%</td>
+</tr>
+</tbody>
+</table>
+
+*Performance evaluated on German versions of ARC (knowledge-based QA), HellaSwag (commonsense reasoning), and TruthfulQA (truthfulness). Values show accuracy in percentages, with arrows indicating relative improvement over the Gemma-2-2B baseline. Best results shown in bold.*
+
 ## Safety & Ethics
 
 ### Toxicity
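
As a quick sanity check on the arrows in the added table: each one is the plain percentage change of a score against the Gemma-2-2B baseline row, rounded to the nearest integer. A minimal Python sketch (the variable names, task keys, and rounding convention are illustrative assumptions; the scores are copied from the 0-shot columns above):

```python
# Minimal sketch: reproduce the relative-improvement arrows from the table above.
# Task keys and rounding are illustrative; scores are the 0-shot columns of the
# table, with Gemma-2-2B as the baseline row.

BASELINE = {"ARC-DE": 22.9, "HellaSwag-DE": 28.0, "TruthfulQA-DE": 25.5, "Average": 25.5}

MODELS = {
    "LLaMmlein-1B": {"ARC-DE": 30.0, "HellaSwag-DE": 48.5, "TruthfulQA-DE": 23.4, "Average": 34.0},
    "BübleLM":      {"ARC-DE": 32.3, "HellaSwag-DE": 47.9, "TruthfulQA-DE": 27.2, "Average": 35.8},
}

def arrow(score: float, baseline: float) -> str:
    """Percentage change vs. the baseline, formatted like the table cells."""
    delta = round((score - baseline) / baseline * 100)
    return f"{'↑' if delta >= 0 else '↓'}{delta:+d}%"

for model, scores in MODELS.items():
    cells = [f"{task}: {s} {arrow(s, BASELINE[task])}" for task, s in scores.items()]
    print(model, "|", " | ".join(cells))
# BübleLM | ARC-DE: 32.3 ↑+41% | HellaSwag-DE: 47.9 ↑+71% | TruthfulQA-DE: 27.2 ↑+7% | Average: 35.8 ↑+40%
```

This reproduces every 0-shot arrow in the table, including the ↓-8% drop for LLaMmlein-1B on TruthfulQA-DE.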