Commit 115b97e
Parent(s): 05ebeb0
Update README.md
README.md CHANGED
@@ -83,4 +83,15 @@ outputs = cls.generate(inputs)

*Table 1: Agreement rates between previous metrics and classifiers compared to human judgments on our manually labeled validation set. Our classifier, trained on distilled data from GPT-4-0613, achieves performance comparable to GPT-4.*

+### 📖 Citation:
+
+```
+@article{harmbench,
+title = {HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal},
+author = {Mazeika, Mantas and Phan, Long and Yin, Xuwang and Zou, Andy and Wang, Zifan and Mu, Norman and Sakhaee, Elham and Li, Nathaniel and Basart, Steven and Li, Bo and Forsyth, David and Hendrycks, Dan},
+journal = {arXiv preprint},
+year = {2024}
+}
+```
+