
THIS MODEL IS EXPERIMENTAL AND MIGHT BE BUGGY - I HAVEN'T TUNED THE STRENGTH OF DPO AND SFT YET.

Yi-34B-200K trained via DPO on RAWrr_v1 at ctx 200 (lora_r 4, lora_alpha 8) and then via SFT at ctx 1400 (lora_r 16, lora_alpha 32) on AEZAKMI_v2. It's less prone to refusals than Yi-34B-200K-AEZAKMI-v2, but this is still a work in progress - I want to redo DPO with a higher LoRA rank and longer context, then repeat the SFT training. I haven't tested it much yet, but from what I've seen so far, it's a good model.
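For reference, here is a minimal sketch of the two adapter configurations described above using peft's LoraConfig. Only r and lora_alpha come from this card; everything else (target modules, dropout, etc.) is left at peft defaults and is an assumption:

```python
# Hedged sketch of the two LoRA stages described above; only r and
# lora_alpha are from the card, the rest are peft defaults / assumptions.
from peft import LoraConfig

dpo_lora = LoraConfig(r=4, lora_alpha=8, task_type="CAUSAL_LM")    # DPO stage, ctx 200
sft_lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # SFT stage, ctx 1400
```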

If you want to reproduce this model by merging LoRAs, start by downloading Yi-34B-200K-Llamafied.
Then merge it with https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2
Then merge the resulting model with https://huggingface.co/adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA
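A minimal sketch of these merge steps using transformers and peft follows. The two LoRA repo IDs are from this card; the base path and output directory names are illustrative placeholders:

```python
# Sketch of the two-step LoRA merge described above, assuming the
# transformers/peft APIs. Paths other than the two LoRA repos are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Yi-34B-200K-Llamafied"  # placeholder path/repo id for the llamafied base
OUT = "Yi-34B-200K-AEZAKMI-RAWrr-v2-merged"  # placeholder output directory

model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto", device_map="auto")

# Step 1: apply and merge the DPO LoRA into the base weights.
model = PeftModel.from_pretrained(model, "adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r2")
model = model.merge_and_unload()

# Step 2: apply and merge the SFT LoRA into the DPO-merged model.
model = PeftModel.from_pretrained(model, "adamo1139/yi-34b-200k-aezakmi-v2-rawrr-v1-run1-experimental-LoRA")
model = model.merge_and_unload()

# Save the fully merged model together with the base tokenizer.
model.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```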

License: apache-2.0

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                             | Value |
|------------------------------------|-------|
| Avg.                               | 71.04 |
| AI2 Reasoning Challenge (25-Shot)  | 66.81 |
| HellaSwag (10-Shot)                | 85.79 |
| MMLU (5-Shot)                      | 75.44 |
| TruthfulQA (0-shot)                | 57.91 |
| Winogrande (5-shot)                | 80.35 |
| GSM8k (5-shot)                     | 59.97 |
