Update README.md
README.md
CHANGED
@@ -40,4 +40,17 @@ We also release our **HarmfulQA** dataset with 1,960 harmful questions (converti
 
 <img src="https://declare-lab.net/assets/images/logos/data_gen.png" alt="Image" width="1000" height="1000">
 
-_Note: This model is referred to as Starling (Blue) in the paper. We shall soon release Starling (Blue-Red) which was trained on harmful data using an objective function that helps the model learn from the red (harmful) response data._
+_Note: This model is referred to as Starling (Blue) in the paper. We shall soon release Starling (Blue-Red) which was trained on harmful data using an objective function that helps the model learn from the red (harmful) response data._
+
+## Citation
+
+```bibtex
+@misc{bhardwaj2023redteaming,
+title={Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment},
+author={Rishabh Bhardwaj and Soujanya Poria},
+year={2023},
+eprint={2308.09662},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```