THIS MODEL IS EXPERIMENTAL AND MIGHT BE BUGGY. I HAVEN'T PERFECTED THE STRENGTH OF DPO AND SFT YET.
Yi-34B-200K trained via DPO on RAWrr_v1 at ctx 200 (lora_r 4, lora_alpha 8) and then via SFT at ctx 1400 (lora_r 16, lora_alpha 32) on AEZAKMI_v2. It's less prone to refusals than Yi-34B-200K-AEZAKMI-v2, but it's still a work in progress: I want to redo DPO with a higher LoRA rank and ctx and then repeat SFT training. I haven't tested it much, but from what I've seen, it's a good model.
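For reference, assuming the standard LoRA scaling rule (the low-rank update is multiplied by lora_alpha / lora_r), both training stages use the same scaling factor even though the ranks differ. Here is a quick check using only the values quoted above. The dictionary layout and the function name are illustrative, not taken from the actual training config.

```python
# Hyperparameters quoted in the description; the dict layout is illustrative only.
STAGES = {
    "DPO on RAWrr_v1": {"ctx": 200, "lora_r": 4, "lora_alpha": 8},
    "SFT on AEZAKMI_v2": {"ctx": 1400, "lora_r": 16, "lora_alpha": 32},
}


def lora_scaling(lora_r, lora_alpha):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return lora_alpha / lora_r


# Scaling factor per training stage.
SCALING = {
    name: lora_scaling(cfg["lora_r"], cfg["lora_alpha"])
    for name, cfg in STAGES.items()
}

for name, value in SCALING.items():
    print(f"{name}: scaling = {value}")
```

So a higher-rank DPO run would change the adapter's capacity, not necessarily its scaling, as long as lora_alpha keeps the same ratio to lora_r.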
If you want to reproduce this model by merging LoRAs, start by downloading Yi-34B-200K-Llamafied.
Then merge it with https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2
Then merge the resulting model with https://huggingface.co/adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA
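The merge steps above could be sketched roughly like this with transformers and peft. This is not necessarily how the published weights were produced. `base_path` is a placeholder for wherever you saved Yi-34B-200K-Llamafied, and `merge_lora_chain` and `out_dir` are names made up for this example. Loading a 34B model in fp16 needs roughly 70 GB of memory, so treat this as an outline rather than something to run on a small machine.

```python
# LoRA adapters from the steps above, in the order they are merged.
ADAPTERS = [
    "adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2",             # DPO on RAWrr_v1
    "adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA",  # SFT on AEZAKMI_v2
]


def merge_lora_chain(base_path, adapters, out_dir):
    """Merge each adapter into the base model in turn and save the result.

    base_path: local path to the downloaded Yi-34B-200K-Llamafied (placeholder).
    adapters:  adapter repo ids or paths, applied in list order.
    out_dir:   where the merged model and tokenizer are written (hypothetical).
    """
    # Imported lazily so the heavy dependencies load only when a merge runs.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
    for adapter in adapters:
        # Attach the adapter, then fold its weights into the base weights.
        model = PeftModel.from_pretrained(model, adapter)
        model = model.merge_and_unload()

    model.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_path).save_pretrained(out_dir)
    return out_dir


# Example call (paths are placeholders):
# merge_lora_chain("/path/to/Yi-34B-200K-Llamafied", ADAPTERS, "/path/to/merged-model")
```

The order matters here: the SFT adapter was trained on top of the DPO-merged model, so it should be applied second.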
License: apache-2.0
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 71.04 |
| AI2 Reasoning Challenge (25-Shot) | 66.81 |
| HellaSwag (10-Shot) | 85.79 |
| MMLU (5-Shot) | 75.44 |
| TruthfulQA (0-shot) | 57.91 |
| Winogrande (5-shot) | 80.35 |
| GSM8k (5-shot) | 59.97 |
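The Avg. row appears to be the plain, unweighted mean of the six benchmark scores. Here is a quick check with the numbers from the table. The inputs are already rounded to two decimals, so the last digit may differ slightly from the leaderboard's own figure.

```python
# Scores copied from the table above.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 66.81,
    "HellaSwag (10-Shot)": 85.79,
    "MMLU (5-Shot)": 75.44,
    "TruthfulQA (0-shot)": 57.91,
    "Winogrande (5-shot)": 80.35,
    "GSM8k (5-shot)": 59.97,
}

# Unweighted mean across the six benchmarks.
average = sum(SCORES.values()) / len(SCORES)
print(f"average = {average:.3f}")
```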
Evaluation results
- AI2 Reasoning Challenge (25-Shot), test set: normalized accuracy 66.81 (Open LLM Leaderboard)
- HellaSwag (10-Shot), validation set: normalized accuracy 85.79 (Open LLM Leaderboard)
- MMLU (5-Shot), test set: accuracy 75.44 (Open LLM Leaderboard)
- TruthfulQA (0-shot), validation set: mc2 57.91 (Open LLM Leaderboard)
- Winogrande (5-shot), validation set: accuracy 80.35 (Open LLM Leaderboard)
- GSM8k (5-shot), test set: accuracy 59.97 (Open LLM Leaderboard)