Model Card for Gemma 2B Zephyr DPO

We trained google/gemma-2b with Direct Preference Optimization (DPO) on data from argilla/dpo-mix-7k. We carefully selected the hyperparameters to achieve the best DPO performance.
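
For readers unfamiliar with DPO, the sketch below shows the standard per-example DPO objective on a single preference pair. It is an illustration of the general technique, not the training code used for this model: the function name `dpo_loss`, the default `beta=0.1`, and the log-probability values in the usage lines are all assumptions chosen for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and the frozen reference model. `beta` is a
    hypothetical default, not the value used for this model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities, for illustration only.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)  # policy identical to reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)   # policy prefers the chosen answer more
print(baseline, improved)
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.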

Model description

Model type: A 2.5B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
Language(s) (NLP): Primarily English
License: Gemma Terms of Use
Finetuned from model: google/gemma-2b

License

This model has the same license as the original Gemma model collection.

OpenLLM Leaderboard Performance

Models	Avg.	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8k
google/gemma-2b	46.37	48.38	71.77	41.77	33.08	66.77	16.91
google/gemma-2b-it	42.75	43.94	62.70	37.65	45.82	60.93	5.46
wandb/gemma-2b-zephyr-sft	47.18	49.74	72.38	41.37	34.42	66.93	18.27
wandb/gemma-2b-zephyr-dpo	46.92	49.66	72.23	41.13	34.47	66.54	17.51
Columbia-NLP/gemma-2b-zephyr-sft	48.75	51.80	72.63	42.20	41.96	63.85	20.09
Columbia-NLP/gemma-2b-zephyr-dpo	49.14	52.22	73.11	42.55	42.64	64.40	19.94
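
As a reading aid, the sketch below recomputes the Avg. column for the Columbia-NLP/gemma-2b-zephyr-dpo row, assuming Avg. is the unweighted mean of the six benchmark columns. That assumption is ours; the scores are copied from the table, and the variable names are illustrative.

```python
# Scores for Columbia-NLP/gemma-2b-zephyr-dpo, copied from the table above.
scores = {
    "ARC": 52.22,
    "HellaSwag": 73.11,
    "MMLU": 42.55,
    "TruthfulQA": 42.64,
    "Winogrande": 64.40,
    "GSM8k": 19.94,
}

# Assumption: Avg. is the plain (unweighted) mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))
```

Under that assumption the result rounds to the 49.14 reported in the table.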

MT-Bench

We evaluate our model with GPT-4-0125-preview as the judge.

Model	Total	Coding	Extraction	Humanities	Math	Reasoning	Roleplay	STEM	Writing
google/gemma-2b-it	4.71	2.95	4.35	6.15	2.90	3.50	5.60	5.50	6.70
wandb/gemma-2b-zephyr-sft	4.03	3.10	3.15	5.00	2.70	2.65	5.10	4.80	5.75
wandb/gemma-2b-zephyr-dpo	4.06	2.80	2.90	5.55	2.65	2.70	5.20	4.80	5.85
anakin87/gemma-2b-orpo	4.14	3.00	3.70	6.30	2.70	2.35	5.68	4.75	4.75
Columbia-NLP/gemma-2b-zephyr-sft	4.34	3.10	3.70	6.25	2.65	2.70	5.55	5.25	5.50
Columbia-NLP/gemma-2b-zephyr-dpo	4.75	3.50	4.05	6.75	3.30	3.70	5.85	5.40	5.53
