achal-tri committed on
Commit
6a413ee
1 Parent(s): 22f569d

Add results comparison to SmolLM

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -115,6 +115,16 @@ Here are the evaluation results for DCLM-1B on various tasks (using [llm-foundry
 Note: All scores are presented as decimal values between 0 and 1, representing the proportion of correct answers or the model's performance on each task.
 
+Below we compare DCLM-1B to the recently released SmolLM (https://huggingface.co/blog/smollm) on key benchmarks. As described in the paper, Core accuracy is the
+centered accuracy averaged over 22 tasks (including HellaSwag and ARC-E), while Extended is the centered accuracy averaged over 53 tasks.
+We evaluate both models using llm-foundry.
+
+
+| Task    | Core | Extended | MMLU 5-shot |
+|---------|------|----------|-------------|
+| DCLM-1B | 42.3 | 25.1     | 41.9        |
+| SmolLM  | 36.3 | 21.2     | 30.0        |
+
 ## Limitations and Biases
 
 While DCLM-1B demonstrates strong performance across a range of tasks, it's important to note:
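The Core and Extended numbers in the added table are averages of centered accuracy over a task suite. A minimal sketch of that aggregation, assuming the common rescaling (raw accuracy minus the random-guess baseline, divided by one minus the baseline) and hypothetical per-task numbers:

```python
# Sketch of a centered-accuracy average, as used for the Core/Extended columns.
# Assumed formula: centered = (acc - baseline) / (1 - baseline), where baseline
# is the random-guess accuracy for each task.

def centered_accuracy(acc: float, baseline: float) -> float:
    """Rescale raw accuracy so random guessing maps to 0 and perfect to 1."""
    return (acc - baseline) / (1.0 - baseline)

# Hypothetical per-task results: (raw accuracy, random baseline).
results = {
    "hellaswag": (0.55, 0.25),  # 4-way multiple choice
    "arc_easy":  (0.70, 0.25),  # 4-way multiple choice
}

core = sum(centered_accuracy(a, b) for a, b in results.values()) / len(results)
print(round(core, 3))  # prints 0.5
```

Averaging centered rather than raw accuracy keeps tasks with different numbers of answer choices on a comparable scale.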