gregH committed on
Commit 2ed20ca
1 Parent(s): 4555af9

Update index.html

Files changed (1)
  1. index.html +3 -2
index.html CHANGED
@@ -77,11 +77,12 @@ Exploring Refusal Loss Landscapes </title>
  </div>
  </div>
 
- <h3 id="refusal-loss">Refusal Loss</h3>
+ <h3 id="refusal-loss">Refusal Loss Landscape Exploration</h3>
  <p>Current transformer-based LLMs will return different responses to the same query due to the randomness of
  autoregressive sampling-based generation. With this randomness, it is an
  interesting phenomenon that a malicious user query will sometimes be rejected by the target LLM, but
- sometimes be able to bypass the safety guardrail. Based on this observation, we propose a new concept called Refusal Loss and visualize its 2-d
+ sometimes be able to bypass the safety guardrail. Based on this observation, we propose a new concept called Refusal Loss to represent the probability with which
+ the LLM won't reject the input user query and visualize its 2-d
  landscape below:
  </p>
 
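The updated paragraph defines Refusal Loss as the probability with which the LLM will not reject a user query. Because generation is sampled, one way to estimate that probability is Monte Carlo: draw several responses to the same query and take the fraction that are not refusals. The sketch below assumes a simple keyword-based refusal check. `generate`, `is_refusal`, `estimate_refusal_loss`, and `REFUSAL_KEYWORDS` are hypothetical names that are not part of this commit, and the source does not say how refusals are actually detected.

```python
import random
from typing import Callable, Sequence

# Hypothetical refusal phrases (an assumption; the source does not specify a detector).
REFUSAL_KEYWORDS: Sequence[str] = ("I'm sorry", "I cannot", "I can't", "As an AI")


def is_refusal(response: str, keywords: Sequence[str] = REFUSAL_KEYWORDS) -> bool:
    """Return True if the response contains any refusal phrase (case-insensitive)."""
    lowered = response.lower()
    return any(k.lower() in lowered for k in keywords)


def estimate_refusal_loss(
    generate: Callable[[str], str], query: str, num_samples: int = 20
) -> float:
    """Monte Carlo estimate of the probability that the model does NOT refuse `query`.

    `generate` is a hypothetical stand-in for one sampled (stochastic) LLM call.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    # Count how many sampled responses are refusals.
    refusals = sum(is_refusal(generate(query)) for _ in range(num_samples))
    # Refusal Loss = fraction of samples that were NOT refused.
    return 1.0 - refusals / num_samples


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy stand-in for a sampled LLM that refuses roughly 70% of the time.
    def toy_generate(query: str) -> str:
        if rng.random() < 0.7:
            return "I'm sorry, I can't help with that."
        return "Sure, here is an answer."

    print(estimate_refusal_loss(toy_generate, "some malicious query", num_samples=1000))
```

With the toy sampler above the estimate should land near 0.3; with a real model, the 2-d landscape would come from evaluating this estimate over a grid of perturbed queries, which the source does not detail.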