Update index.html
index.html +4 -6
@@ -62,16 +62,14 @@ Exploring Refusal Loss Landscapes </title>
 jailbreak attempts aiming at subverting the embedded safety guardrails. To address this challenge,
 we define and investigate the \textbf{Refusal Loss} of LLMs and then propose a method called \textbf{Gradient Cuff} to
 detect jailbreak attempts. In this demonstration, we first introduce the concept of "Jailbreak". Then we present the refusal loss
-landscape and based on the characteristics of this landscape
+landscape and propose the Gradient Cuff based on the characteristics of this landscape. Lastly, we compare Gradient Cuff with other jailbreak defense
 methods and show the defense performance.
 </p>
 
 <h2 id="what-is-jailbreak">What is Jailbreak?</h2>
-<p>
-
-
-This phenomenon could hamper scenarios requiring accurate uncertainty estimation, such as safety-related tasks
-(e.g., autonomous driving systems, medical diagnosis, etc.).</p>
+<p>Jailbreak attacks involve maliciously inserting or replacing tokens in the user instruction or rewriting it to bypass and circumvent
+the safety guardrails of aligned LLMs. A notable example is that a jailbroken LLM would be tricked into
+generating hate speech targeting certain groups of people, as demonstrated below.</p>
 
 <div class="container">
 <div id="jailbreak-intro" class="row align-items-center jailbreak-intro-sec">