Model Card for Model ID
Same as v3, but with an intervention against the refusal direction applied at layer 19.
This should notably reduce censorship and refusals, provided the system prompt is configured to steer the model in that direction.
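For readers unfamiliar with the idea, here is a minimal sketch of one common form such an intervention can take: projecting the component along a single direction out of a hidden state. This card does not say which method was used, so treat the projection itself as an assumption. The vectors and the names `refusal_dir`, `hidden_state`, and `ablate_direction` are hypothetical and purely illustrative, not taken from this model.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    unit = normalize(direction)
    # Scalar projection of the hidden state onto the unit direction.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    # Subtract that component, leaving everything orthogonal to it untouched.
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Toy 4-dimensional example; real hidden states are far wider.
refusal_dir = [1.0, 0.0, 1.0, 0.0]     # hypothetical direction (assumption)
hidden_state = [2.0, 3.0, 4.0, -1.0]   # hypothetical layer-19 activation (assumption)

ablated = ablate_direction(hidden_state, refusal_dir)
# After ablation, the projection onto the direction should be ~0.
residual = sum(a * u for a, u in zip(ablated, normalize(refusal_dir)))
print(ablated)
print(residual)
```

In a real model the same arithmetic would be applied to the activations (or folded into the weights) of the chosen layer. The components orthogonal to the direction are left unchanged, which is one plausible reason the measured damage can stay small.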
No additional fine-tuning has been done in this branch, but the damage detected by EQ-Bench appears fairly minimal.
(Trying to tune this without hurting what arcee-ai did in distillation for the lovely SuperNova-Medius is an interesting challenge, and one I'm still trying to figure out. A little DPO isn't usually too harsh on it, but even QLoRA has proven liable to degrade capability. Seems like good teachers make a difference, who knew?)
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| eq_bench | 2.1 | none | 0 | eqbench | ↑ | 78.4158 | ± | 1.609 |
| | | none | 0 | percent_parseable | ↑ | 100.0000 | ± | 0.000 |
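The Stderr column can be turned into a rough interval around the score. A minimal sketch, assuming the eqbench score is approximately normally distributed (an assumption, not something the evaluation output states), using the values from the table:

```python
# Values copied from the eq_bench row of the table.
score = 78.4158
stderr = 1.609

# 1.96 is the usual z-value for a two-sided 95% interval under a normal approximation.
z_95 = 1.96
half_width = z_95 * stderr
low, high = score - half_width, score + half_width

print(f"approx. 95% interval: {low:.2f} to {high:.2f}")
```

Under that assumption the interval spans roughly 75.26 to 81.57, so differences of a point or two between branches fall within the noise of a single run.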