Update index.html
index.html CHANGED +5 -5
@@ -130,11 +130,11 @@ Exploring Refusal Loss Landscapes </title>
 <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
 
 <h2 id="demonstration">Demonstration</h2>
-<p>
-
-
-
-
+<p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
+different jailbreak attacks (GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
+We report the average refusal rate across these 6 malicious user query datasets as True Positive Rate (TPR) and the refusal rate
+on benign user queries as False Positive Rate (FPR).
+</p>
 
 <p>We hope this tool could also facilitate the development process.</p>
 
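The TPR and FPR definitions in the added paragraph can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `refusal_rate` and `tpr_fpr` are hypothetical, each query is assumed to be reduced to a boolean "refused" flag, and the "average" across attack datasets is assumed to be an unweighted mean of per-dataset refusal rates.

```python
from statistics import mean


def refusal_rate(refused):
    """Fraction of queries that were refused; `refused` is a list of booleans."""
    if not refused:
        raise ValueError("need at least one query")
    return sum(refused) / len(refused)


def tpr_fpr(malicious_by_attack, benign):
    """Return (TPR, FPR).

    TPR: unweighted mean of the refusal rates of each attack dataset
         (assumption: every dataset counts equally, regardless of size).
    FPR: refusal rate on the benign queries.
    """
    per_attack = [refusal_rate(flags) for flags in malicious_by_attack.values()]
    return mean(per_attack), refusal_rate(benign)


# Toy placeholder flags, not measured results.
toy_malicious = {"GCG": [True, True], "PAIR": [True, False]}
toy_benign = [False, False, False, True]
tpr, fpr = tpr_fpr(toy_malicious, toy_benign)
```

With the real setup, `malicious_by_attack` would hold one entry per attack (GCG, AutoDAN, PAIR, TAP, Base64, LRL). If the datasets differ in size, an unweighted mean and a pooled refusal rate give different numbers, so the choice should be stated alongside the reported TPR.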