soujanyaporia committed
Commit: 833aeeb
Parent(s): 09e0324
Update README.md
README.md CHANGED
@@ -4,9 +4,9 @@ datasets:
 - anon8231489123/ShareGPT_Vicuna_unfiltered
 - declare-lab/HarmfulQA
 ---
-[**Paper**](https://
+[**Paper**](https://arxiv.org/abs/2308.09662) | [**Github**](https://github.com/declare-lab/red-instruct) | [**Dataset**](https://huggingface.co/datasets/declare-lab/HarmfulQA)| [**Model**](https://huggingface.co/declare-lab/starling-7B)
 
-As a part of our research efforts to make LLMs safer, we created **Starling**. It is obtained by fine-tuning Vicuna-7B on [**HarmfulQA**](https://huggingface.co/datasets/declare-lab/HarmfulQA), a ChatGPT-distilled dataset that we collected using the Chain of Utterances (CoU) prompt. More details are in our paper [**Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment**](https://
+As a part of our research efforts to make LLMs safer, we created **Starling**. It is obtained by fine-tuning Vicuna-7B on [**HarmfulQA**](https://huggingface.co/datasets/declare-lab/HarmfulQA), a ChatGPT-distilled dataset that we collected using the Chain of Utterances (CoU) prompt. More details are in our paper [**Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment**](https://arxiv.org/abs/2308.09662)
 
 <img src="https://declare-lab.net/assets/images/logos/starling-final.png" alt="Image" width="100" height="100">
 
@@ -36,7 +36,7 @@ This jailbreak prompt (termed as Chain of Utterances (CoU) prompt in the paper)
 
 <h2>HarmfulQA Data Collection</h2>
 
-We also release our **HarmfulQA** dataset with 1,960 harmful questions (converting 10 topics-10 subtopics) for red-teaming as well as conversations based on them used in model safety alignment, more details [**here**](https://huggingface.co/datasets/declare-lab/HarmfulQA).
+We also release our **HarmfulQA** dataset with 1,960 harmful questions (converting 10 topics-10 subtopics) for red-teaming as well as conversations based on them used in model safety alignment, more details [**here**](https://huggingface.co/datasets/declare-lab/HarmfulQA). The following figure describes the data collection process.
 
 <img src="https://declare-lab.net/assets/images/logos/data_gen.png" alt="Image" width="1000" height="1000">
 
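The updated README now links the released checkpoint ([**Model**](https://huggingface.co/declare-lab/starling-7B)). A minimal usage sketch follows, assuming the repo ships standard `transformers`-compatible causal-LM weights (Vicuna/LLaMA-style) and keeps Vicuna-7B's chat template; check the model card before relying on either assumption.

```python
# Minimal sketch: loading the Starling checkpoint linked in the updated README.
# Assumption: "declare-lab/starling-7B" exposes standard Hugging Face
# transformers weights (Vicuna/LLaMA-style causal LM); verify on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "declare-lab/starling-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps a 7B model on one ~16 GB GPU
    device_map="auto",          # requires the `accelerate` package
)

# Assumption: Starling keeps Vicuna-7B's prompt template (USER/ASSISTANT turns).
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "USER: What precautions should I take when handling household chemicals? "
    "ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```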