Models and dataset used in the paper "The Jailbreak Tax: How Useful Are Your Jailbreak Outputs?"

SPY Lab - ETH Zurich
AI & ML interests
Security, privacy, and trustworthiness of machine learning systems.
Organization Card
The Secure and Private AI (SPY) Lab conducts research on the security, privacy and trustworthiness of machine learning systems. We often approach these problems from an adversarial perspective, by designing attacks that probe the worst-case performance of a system to ultimately understand and improve its safety.
We are based at ETH Zurich. Learn more about our work on our website.
Collections (3)
Models (31)
ethz-spylab/Llama-3.1-70B-Instruct_refuse_math
ethz-spylab/Llama-3.1-8B-Instruct_do_bio_again
ethz-spylab/Llama-3.1-8B-Instruct_refuse_bio
ethz-spylab/Llama-3.1-70B-Instruct_do_biology_again_5e-5
ethz-spylab/Llama-3.1-70B-Instruct_do_biology_5e-5
ethz-spylab/Llama-3.1-70B-Instruct_refuse_biology_5e-5
ethz-spylab/Llama-3.1-70B-Instruct_refuse_biology
ethz-spylab/Llama-3.1-70B-Instruct_do_math_chat
ethz-spylab/Llama-3.1-70B-Instruct_do_math_again
ethz-spylab/Llama-3.1-8B-Instruct_do_math_chat
Datasets (13)
ethz-spylab/EvilMath: 189 rows, 64 downloads
ethz-spylab/ctf-satml24: 137k rows, 287 downloads, 19 likes
ethz-spylab/competition_eval_dataset: 2.31k rows, 140 downloads, 1 like
ethz-spylab/competition_trojan1: 42.5k rows, 78 downloads
ethz-spylab/competition_trojan4: 42.5k rows, 55 downloads
ethz-spylab/competition_trojan5: 42.5k rows, 57 downloads
ethz-spylab/competition_trojan2: 42.5k rows, 49 downloads
ethz-spylab/competition_trojan3: 42.5k rows, 45 downloads
ethz-spylab/curated-harmless-dataset: 87 rows, 78 downloads
ethz-spylab/hh-harmless-train-with-rewards: 42.5k rows, 117 downloads