# Gemma 7B Zephyr DPO

The Zephyr DPO recipe applied on top of an SFT-finetuned Gemma 7B.
## Model description

- Model type: An 8.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
- Language(s) (NLP): Primarily English
- Finetuned from model: wandb/gemma-7b-zephyr-sft
## Recipe

We trained using the DPO script from the Alignment Handbook recipe, logging training runs to Weights & Biases. Visit the W&B workspace here.
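As background on the recipe, DPO fits a logistic loss on the margin between the policy's and a frozen reference model's log-probabilities for chosen versus rejected responses. A minimal sketch of that loss for a single preference pair, in plain Python (this is an illustrative simplification, not the Alignment Handbook implementation; the function name and `beta` value are assumptions):

```python
import math

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(log pi(chosen) - log ref(chosen))
                         - (log pi(rejected) - log ref(rejected))]).
    """
    # Margin: how much more the policy prefers the chosen response
    # than the reference model does, relative to the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    # Logistic loss on the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy and reference agree exactly, the margin is zero
# and the loss sits at log(2); a positive margin drives it lower.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))   # log(2) ≈ 0.693
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # margin > 0, loss < log(2)
```

In the real recipe the log-probabilities are summed over response tokens and the loss is averaged over a batch; `beta` controls how far the policy may drift from the reference.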
## License

This model has the same license as the original Gemma model collection. Compute was provided by Lambda Labs (an 8xA100 80GB node).
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.
| Metric | Value |
|---|---|
| Avg. | 61.62 |
| AI2 Reasoning Challenge (25-shot) | 60.84 |
| HellaSwag (10-shot) | 80.44 |
| MMLU (5-shot) | 60.60 |
| TruthfulQA (0-shot) | 42.48 |
| Winogrande (5-shot) | 75.37 |
| GSM8k (5-shot) | 49.96 |
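The leaderboard average is the unweighted mean of the six benchmark scores, which a quick sanity check reproduces (illustrative snippet; the dictionary keys are shorthand for the benchmark names above):

```python
# Open LLM Leaderboard scores reported for this model.
scores = {
    "ARC (25-shot)": 60.84,
    "HellaSwag (10-shot)": 80.44,
    "MMLU (5-shot)": 60.60,
    "TruthfulQA (0-shot)": 42.48,
    "Winogrande (5-shot)": 75.37,
    "GSM8k (5-shot)": 49.96,
}

# Unweighted mean across the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(avg)  # ≈ 61.62, matching the reported Avg. to within rounding
```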
## Evaluation results

- AI2 Reasoning Challenge (25-shot), test set, normalized accuracy: 60.84 (Open LLM Leaderboard)
- HellaSwag (10-shot), validation set, normalized accuracy: 80.44 (Open LLM Leaderboard)
- MMLU (5-shot), test set, accuracy: 60.60 (Open LLM Leaderboard)
- TruthfulQA (0-shot), validation set, mc2: 42.48 (Open LLM Leaderboard)
- Winogrande (5-shot), validation set, accuracy: 75.37 (Open LLM Leaderboard)
- GSM8k (5-shot), test set, accuracy: 49.96 (Open LLM Leaderboard)