palmer turbo
This model has a slightly different architecture and training style:
- Training was followed by a continual pretraining stage in which the lm_head and embedding layers were tuned.
- The base model was pretrained on 75k instruction/response pairs and merged.
- The architecture is similar to the rest of the palmer series, but with a smaller context size (8192).
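The idea of tuning only the lm_head and embedding layers can be sketched as below. This is a minimal illustration, not the training code used for palmer: the helper `freeze_all_but` is hypothetical, and the name fragments `embed_tokens` and `lm_head` are assumptions based on common Llama-style checkpoints in PyTorch.

```python
# Hypothetical helper: keep only the embedding and lm_head parameters trainable.
# The name fragments below are assumptions for Llama-style checkpoints;
# adjust them for other architectures.
TRAINABLE_KEYS = ("embed_tokens", "lm_head")


def freeze_all_but(named_params, trainable_keys=TRAINABLE_KEYS):
    """Set requires_grad on each parameter based on its name.

    `named_params` is an iterable of (name, parameter) pairs, such as the
    result of `model.named_parameters()` on a PyTorch module.
    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in named_params:
        # A parameter stays trainable only if its name contains one of the keys.
        param.requires_grad = any(key in name for key in trainable_keys)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a loaded PyTorch model, this would be called as `freeze_all_but(model.named_parameters())` before building the optimizer, so that only the selected layers receive gradient updates.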
In short, palmer is now half the size and twice the speed, with almost the same overall performance and a notable improvement on MMLU and ARC Challenge rather than on Winogrande. As of Wed 17 Jul, it beats all models <= 0.5b on HellaSwag.
Like all palmer models, it is biased to respond with answers without needing any specific prompt format. Feel free to fine-tune it further for your specific use case.
benchmarks
These are zero-shot evaluations performed on current state-of-the-art language models.
Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
---|---|---|---|---|---|---|
smollm-360m | 0.2537 | 0.3626 | 0.5350 | 0.7116 | 0.5659 | 0.4858 |
tinyllama | 0.2577 | 0.3029 | 0.5935 | 0.7329 | 0.5959 | 0.4966 |
qwen2-0.5b | 0.4413 | 0.2892 | 0.4905 | 0.6931 | 0.5699 | 0.4968 |
danube3-500m-chat (current sota) | 0.2554 | 0.3626 | 0.6072 | 0.7432 | 0.6140 | 0.5164 |
palmer-004-turbo | 0.2736 | 0.3558 | 0.6179 | 0.7367 | 0.6117 | 0.5191 |
palmer-004 | 0.2661 | 0.3490 | 0.6173 | 0.7481 | 0.6417 | 0.5244 |
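As a quick sanity check, the Average column can be recomputed as the unweighted mean of the five benchmark scores. The sketch below assumes that is how the column was derived; the scores are copied from the table, and the last digit may differ slightly depending on rounding.

```python
# Scores copied from the table, in the order:
# MMLU, ARC-C, HellaSwag, PIQA, Winogrande.
scores = {
    "smollm-360m": [0.2537, 0.3626, 0.5350, 0.7116, 0.5659],
    "tinyllama": [0.2577, 0.3029, 0.5935, 0.7329, 0.5959],
    "qwen2-0.5b": [0.4413, 0.2892, 0.4905, 0.6931, 0.5699],
    "danube3-500m-chat": [0.2554, 0.3626, 0.6072, 0.7432, 0.6140],
    "palmer-004-turbo": [0.2736, 0.3558, 0.6179, 0.7367, 0.6117],
    "palmer-004": [0.2661, 0.3490, 0.6173, 0.7481, 0.6417],
}


def average(values):
    """Unweighted mean of a list of scores."""
    return sum(values) / len(values)


averages = {name: average(values) for name, values in scores.items()}

# Print models from lowest to highest average.
for name, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{name}: {avg:.4f}")
```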
thanks to
- h2oai: performant base model provider
- teknium: openhermes dataset provider
- unsloth: training software and tooling