A much further trained version, this time done with full finetuning instead of DoRA. Similar ~50/50 mix of completion and instruct data.

Note: This likely has refusals like PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.1-SFT-3B since no focus was put on removing refusals. I'm working on a KTO DoRA to solve this, and possibly improve roleplay performance.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	21.50
IFEval (0-Shot)	62.92
BBH (3-Shot)	23.34
MATH Lvl 5 (4-Shot)	11.33
GPQA (0-shot)	3.02
MuSR (0-shot)	4.87
MMLU-PRO (5-shot)	23.50

Model tree for PJMixers-Dev/LLaMa-3.2-Instruct-JankMix-v0.2-SFT-3B

Evaluation results

strict accuracy on IFEval (0-Shot)
Open LLM Leaderboard

62.920
normalized accuracy on BBH (3-Shot)
Open LLM Leaderboard

23.340
exact match on MATH Lvl 5 (4-Shot)
Open LLM Leaderboard

11.330
acc_norm on GPQA (0-shot)
Open LLM Leaderboard

3.020
acc_norm on MuSR (0-shot)
Open LLM Leaderboard

4.870
accuracy on MMLU-PRO (5-shot)
test set Open LLM Leaderboard

23.500

View on Papers With Code