Badllama 3
This repo holds the weights for Palisade Research's demonstration that open-weight model guardrails can be stripped in minutes of GPU time. See the Badllama 3 paper for background.

Email the authors to request research access; we do not review access requests made on HuggingFace.
Branches:

- main mirrors ro_authors_awq and holds our best-performing model, built with refusal orthogonalization
- qlora holds the QLoRA-tuned model
- reft holds the ReFT-tuned model

Base model: meta-llama/Meta-Llama-3-8B-Instruct
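Since each variant lives on a separate git branch of the same repo, a specific model can be selected with the `revision` argument of `from_pretrained`. This is a minimal sketch, assuming access has been granted and `transformers` is installed; the `REPO_ID` value and the `load_variant` helper are illustrative, not part of the repo:

```python
# Hypothetical helper for loading one Badllama 3 variant by branch.
# REPO_ID is an assumed identifier for illustration only.
REPO_ID = "palisaderesearch/Badllama-3-8B"

# Branch names and descriptions from the list above.
BRANCHES = {
    "main": "best-performing model (refusal orthogonalization)",
    "qlora": "QLoRA-tuned model",
    "reft": "ReFT-tuned model",
}

def load_variant(branch: str = "main"):
    """Load the tokenizer and model from a given branch of the repo.

    `revision` selects the git branch (or tag/commit) to download from.
    """
    if branch not in BRANCHES:
        raise ValueError(f"unknown branch {branch!r}; choose from {sorted(BRANCHES)}")
    # Lazy import so the branch table is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=branch)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=branch)
    return tokenizer, model
```

Because the repo is gated, `from_pretrained` will only succeed after access has been granted and you are authenticated (e.g. via `huggingface-cli login`).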