SORRY-Bench (2025/03) Collection In this iteration, we removed the category "Impersonation" due to its ambiguous definition and the fact that most models more or less fulfill such requests. • 3 items • Updated 14 days ago
sorry-bench/ft-mistral-7b-instruct-v0.2-sorry-bench-202406 Text Generation • Updated Jul 2, 2024 • 289 • 4
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors Paper • 2406.14598 • Published Jun 20, 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications Paper • 2402.05162 • Published Feb 7, 2024 • 1