Commit 273a1f2 by gregH · verified · 1 Parent(s): 7310da4

Update index.html

Files changed (1)
  1. index.html +1 -1
index.html CHANGED
@@ -158,7 +158,7 @@ We provide more details about the running flow of Gradient Cuff in the paper.
  <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
  different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
  We demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal Rate and the refusal rate
- on benign user queries as the Benign Refusal Rate.
+ on benign user queries as the Benign Refusal Rate. The detection performance against individual jailbreak attacks is shown in the below bar chart.
  </p>
 
 