gregH committed on
Commit 101b0fa · verified · 1 parent: 7ee8287

Update index.html

Files changed (1)
  1. index.html +3 -0
index.html CHANGED
@@ -150,6 +150,9 @@ Exploring Refusal Loss Landscapes </title>
   <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
   </p>
 
+ <p>
+ We provide more details about the running flow of Gradient Cuff in the paper.
+ </p>
 
   <h2 id="demonstration">Demonstration</h2>
   <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6