Update index.html
index.html +3 -0
@@ -150,6 +150,9 @@ Exploring Refusal Loss Landscapes </title>
 <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
 </p>
 
+<p>
+We provide more details about the running flow of Gradient Cuff in the paper.
+</p>
 
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
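The Phase 2 rule in the diff context flags a query when the norm of the estimated gradient exceeds a threshold t. The Python sketch below illustrates that rule under stated assumptions. It treats the refusal loss as a black-box function of a numeric query representation and estimates its gradient with a zeroth-order, random-direction finite-difference estimator. That estimator, the toy losses, and the names `estimate_gradient`, `gradient_norm_reject`, `num_directions`, `mu`, and `seed` are illustrative assumptions and are not the project's code. The commit itself defers the actual running flow of Gradient Cuff to the paper.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical signature: a black-box refusal loss over a numeric query representation.
LossFn = Callable[[Sequence[float]], float]


def estimate_gradient(
    loss: LossFn,
    x: Sequence[float],
    num_directions: int = 10,
    mu: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Zeroth-order gradient estimate (an assumed estimator, not the paper's code).

    Averages (loss(x + mu*u) - loss(x)) / mu * u over random Gaussian
    directions u, which approximates the gradient when the loss is smooth.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    base = loss(x)
    grad = [0.0] * len(x)
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        # Directional finite difference along the random direction u.
        scale = (loss(shifted) - base) / mu
        for j, uj in enumerate(u):
            grad[j] += scale * uj / num_directions
    return grad


def gradient_norm_reject(
    loss: LossFn,
    x: Sequence[float],
    t: float,
    num_directions: int = 10,
    mu: float = 1e-3,
    seed: int = 0,
) -> bool:
    """Phase 2 rule: flag the query as a jailbreak attempt if ||grad|| > t."""
    g = estimate_gradient(loss, x, num_directions, mu, seed)
    return math.sqrt(sum(gi * gi for gi in g)) > t


if __name__ == "__main__":
    query = [0.0, 0.0, 0.0, 0.0]  # hypothetical embedding of a user query
    flat = lambda v: 0.9  # toy loss with no slope
    steep = lambda v: 0.5 + 50.0 * v[0]  # toy loss with a steep slope
    print(gradient_norm_reject(flat, query, t=1.0))
    print(gradient_norm_reject(steep, query, t=1.0))
```

With these toy losses, the flat loss produces a zero gradient estimate and passes the check. The steep linear loss should produce a norm well above t = 1.0 and be flagged. Averaging over several random directions trades extra loss queries for a lower-variance estimate. In a real system, each loss query could be expensive, so `num_directions` would be the main cost knob.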