justinphan3110 committed on
Commit 115b97e
1 Parent(s): 05ebeb0

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -83,4 +83,15 @@ outputs = cls.generate(inputs)
  *Table 1: Agreement rates between previous metrics and classifiers compared to human judgments on our manually labeled validation set. Our classifier, trained on distilled data from GPT-4-0613, achieves performance comparable to GPT-4.*
+ ### 📖 Citation:
+
+ ```
+ @article{harmbench,
+ title = {HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal},
+ author = {Mazeika, Mantas and Phan, Long and Yin, Xuwang and Zou, Andy and Wang, Zifan and Mu, Norman and Sakhaee, Elham and Li, Nathaniel and Basart, Steven and Li, Bo and Forsyth, David and Hendrycks, Dan},
+ journal = {arXiv preprint},
+ year = {2024}
+ }
+ ```
+