Add results comparison to SmolLM
README.md
CHANGED
@@ -115,6 +115,16 @@ Here are the evaluation results for DCLM-1B on various tasks (using [llm-foundry
 Note: All scores are presented as decimal values between 0 and 1, representing the proportion of correct answers or the model's performance on each task.
 
 
+Below we compare to the recently released SmolLM (https://huggingface.co/blog/smollm) on key benchmarks. As described in the paper, Core accuracy is the average of
+centered accuracy on 22 tasks (including HellaSwag and ARC-E), and Extended is centered accuracy averaged over 53 tasks.
+We evaluate the models using llm-foundry.
+
+
+| Task    | Core | Extended | MMLU 5-shot |
+|---------|------|----------|-------------|
+| DCLM-1B | 42.3 | 25.1     | 41.9        |
+| SmolLM  | 36.3 | 21.2     | 30.0        |
+
 ## Limitations and Biases
 
 While DCLM-1B demonstrates strong performance across a range of tasks, it's important to note:
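For readers unfamiliar with the aggregation in the added text, the sketch below illustrates how a centered-accuracy average such as Core might be computed. The rescaling formula `(acc - baseline) / (1 - baseline)` and the example task baselines are assumptions based on the description above (random guessing maps to 0, perfect accuracy to 1); the actual DCLM/llm-foundry evaluation code may differ in detail.

```python
# Illustrative sketch of a "centered accuracy" average (assumed definition,
# not the official DCLM/llm-foundry implementation).

def centered_accuracy(acc: float, random_baseline: float) -> float:
    """Rescale accuracy so chance performance scores 0 and perfect scores 1."""
    return (acc - random_baseline) / (1.0 - random_baseline)

def suite_score(results: dict[str, tuple[float, float]]) -> float:
    """Average centered accuracy over a suite of (accuracy, random_baseline) pairs, in percent."""
    centered = [centered_accuracy(acc, base) for acc, base in results.values()]
    return 100.0 * sum(centered) / len(centered)

# Hypothetical raw accuracies and chance baselines for two of the Core tasks.
example = {
    "hellaswag": (0.60, 0.25),  # 4-way multiple choice -> 25% chance
    "arc_easy":  (0.65, 0.25),
}
print(f"Core (example subset): {suite_score(example):.1f}")
```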