Update README.md
README.md
CHANGED
@@ -6,6 +6,9 @@ datasets:
 tags:
 - not-for-all-audiences
 ---
-
+Trained on [NobodyExistsOnTheInternet/ToxicQAFinal](https://huggingface.co/datasets/NobodyExistsOnTheInternet/ToxicQAFinal). I converted the set to a preference dataset using refusals generated from LLaMa-3-Instruct-8B.
 
-
+![train/rewards](https://huggingface.co/PJMixers/LLaMa-3-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/rewards.png)
+![train/logits](https://huggingface.co/PJMixers/LLaMa-3-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/logits.png)
+![train/logps](https://huggingface.co/PJMixers/LLaMa-3-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/logps.png)
+![train](https://huggingface.co/PJMixers/LLaMa-3-Instruct-ToxicQAFinal-ORPO-8B-QDoRA/resolve/main/images/train.png)