pdelobelle committed
Commit f486b26
Parent(s): 796a071
Update README.md

README.md CHANGED
@@ -19,7 +19,7 @@ license: apache-2.0
   <p><em>A small German LM</em></p>
 </div>
 
-BübleLM is a German language model based on Gemma-2B, adapted using [trans-tokenization](https://pieter.ai/trans-tokenization/) with a custom German SentencePiece tokenizer. The model demonstrates how language-specific tokenization can significantly improve performance while maintaining the base model's capabilities.
+BübleLM is a German language model based on Gemma-2-2B, adapted using [trans-tokenization](https://pieter.ai/trans-tokenization/) with a custom German SentencePiece tokenizer. The model demonstrates how language-specific tokenization can significantly improve performance while maintaining the base model's capabilities.
 
 ## Model Details
 
@@ -47,12 +47,12 @@ Data sampling weights:
 
 ## Performance
 
-Key improvements over the Gemma-2B baseline:
+Key improvements over the Gemma-2-2B baseline:
 - HellaSwag-DE: +71% (47.9% vs 28.0%)
 - ARC-DE: +41% (32.3% vs 22.9%)
 - Average zero-shot: +40% (35.8% vs 25.5%)
 
-→ BübleLM-2B consistently outperforms both the base Gemma-2B and other German models like LLaMmlein-1B across most tasks.
+→ BübleLM-2B consistently outperforms both the base Gemma-2-2B and other German models like LLaMmlein-1B across most tasks.
 
 <table class="model-comparison">
 <thead>