gregH committed on
Commit
09bf795
1 Parent(s): 747e566

Update index.html

Files changed (1)
  1. index.html +2 -2
index.html CHANGED
@@ -132,8 +132,8 @@ Exploring Refusal Loss Landscapes </title>
  <h2 id="demonstration">Demonstration</h2>
  <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder) against 6
  different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5).
- We report the average refusal rate across these 6 malicious user query datasets as True Positive Rate~(TPR) and the refusal rate
- on benign user queries as False Positive Rate~(FPR).
+ We demonstrate the average refusal rate across these 6 malicious user query datasets and the refusal rate
+ on benign user queries as the Benign Refusal Rate.
  </p>
 
 