gregH committed on
Commit
f134f1b
1 Parent(s): 273a1f2

Update index.html

Files changed (1)
  1. index.html +5 -4
index.html CHANGED
@@ -155,10 +155,11 @@ We provide more details about the running flow of Gradient Cuff in the paper.
  </p>
 
  <h2 id="demonstration">Demonstration</h2>
- <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
- different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
- We demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal Rate and the refusal rate
- on benign user queries as the Benign Refusal Rate. The detection performance against individual jailbreak attacks is shown in the below bar chart.
+ <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
+ against 6 different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
+ Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
+ Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
+ shown in the provided bar chart.
  </p>
 
 