pdelobelle committed • Commit dfb2efc • Parent(s): 607f97c
Update README.md

README.md CHANGED
@@ -56,6 +56,76 @@ Key improvements over Gemma-2B baseline:

Consistently outperforms both the base Gemma-2B and other German models like LLaMmlein-1B across most tasks.

<table class="model-comparison">
  <thead>
    <tr>
      <th align="left">Model</th>
      <th align="center" colspan="2">ARC-DE</th>
      <th align="center" colspan="2">HellaSwag-DE</th>
      <th align="center">TruthfulQA-DE</th>
      <th align="center">Average</th>
    </tr>
    <tr>
      <th></th>
      <th align="center">0-shot</th>
      <th align="center">3-shot</th>
      <th align="center">0-shot</th>
      <th align="center">3-shot</th>
      <th align="center">0-shot</th>
      <th align="center">0-shot</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Gemma-2-2B</td>
      <td align="center">22.9</td>
      <td align="center">23.1</td>
      <td align="center">28.0</td>
      <td align="center">27.6</td>
      <td align="center">25.5</td>
      <td align="center">25.5</td>
    </tr>
    <tr>
      <td>LLaMmlein-120M</td>
      <td align="center">24.7 ↑+8%</td>
      <td align="center">-</td>
      <td align="center">32.0 ↑+14%</td>
      <td align="center">-</td>
      <td align="center">25.0 ↓-2%</td>
      <td align="center">27.2 ↑+7%</td>
    </tr>
    <tr>
      <td>LLaMmlein-1B</td>
      <td align="center">30.0 ↑+31%</td>
      <td align="center">-</td>
      <td align="center"><strong>48.5</strong> ↑+73%</td>
      <td align="center">-</td>
      <td align="center">23.4 ↓-8%</td>
      <td align="center">34.0 ↑+33%</td>
    </tr>
    <tr>
      <td>Sauerkraut-Gemma-2B</td>
      <td align="center">28.0 ↑+22%</td>
      <td align="center">34.6 ↑+50%</td>
      <td align="center">37.2 ↑+33%</td>
      <td align="center">44.1 ↑+60%</td>
      <td align="center"><strong>32.9</strong> ↑+29%</td>
      <td align="center">32.7 ↑+28%</td>
    </tr>
    <tr>
      <td><strong>BübleLM (Ours)</strong></td>
      <td align="center"><strong>32.3</strong> ↑+41%</td>
      <td align="center"><strong>35.2</strong> ↑+52%</td>
      <td align="center">47.9 ↑+71%</td>
      <td align="center"><strong>46.6</strong> ↑+69%</td>
      <td align="center">27.2 ↑+7%</td>
      <td align="center"><strong>35.8</strong> ↑+40%</td>
    </tr>
  </tbody>
</table>

*Performance evaluated on German versions of ARC (knowledge-based QA), HellaSwag (commonsense reasoning), and TruthfulQA (truthfulness). Values show accuracy in percentages, with arrows indicating relative improvement over the Gemma-2B baseline. Best results shown in bold.*

## Safety & Ethics

### Toxicity