Update README.md
Browse files
README.md
CHANGED
@@ -35,6 +35,25 @@ MD-Judge was born to study the safety of different LLMs serving as an general ev
|
|
35 |
- **Repository:** [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
|
36 |
- **Paper:** [SALAD-BENCH](https://arxiv.org/abs/2402.02416)
|
37 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
38 |
## Uses
|
39 |
```python
|
40 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
|
|
35 |
- **Repository:** [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
|
36 |
- **Paper:** [SALAD-BENCH](https://arxiv.org/abs/2402.02416)
|
37 |
|
38 |
+
## Model Performance
|
39 |
+
|
40 |
+
Compare our MD-Judge model with other methods on different public safety testsets using QA format. All the model-based methods are evaluated using the same safety proxy template.
|
41 |
+
- Keyword
|
42 |
+
- GPT-3.5: https://platform.openai.com/docs/models/gpt-3-5-turbo
|
43 |
+
- GPT-4: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
|
44 |
+
- LlamaGuard: https://huggingface.co/meta-llama/LlamaGuard-7b
|
45 |
+
|
46 |
+
| **Methods** | **Base** | **Enhance** | **ToxicChat** | **Beaver** | **SafeRLHF** |
|
47 |
+
|-------------|----------|-------------|--------|------------|--------------|
|
48 |
+
| Keyword | 0.058 | 0.261 | 0.193 | 0.012 | 0.015 |
|
49 |
+
| LlamaGuard | 0.585 | 0.085 | 0.220 | 0.653 | 0.693 |
|
50 |
+
| GPT-3.5 | 0.374 | 0.731 | 0.499 | 0.800 | 0.771 |
|
51 |
+
| GPT-4 | 0.785 | 0.827 | 0.470 | 0.842 | 0.835 |
|
52 |
+
| MD-Judge | **0.818**| **0.873** | **0.644** | **0.866** | **0.864** |
|
53 |
+
|
54 |
+
> Comparison of F1 scores between our model and other leading methods. Best results are **bolded** and second best are *underlined*. Base and Enhance indicate our SALAD-Base-Test and SALAD-Enhance-Test, TC means ToxicChat, and Beaver means Beavertails.
|
55 |
+
|
56 |
+
|
57 |
## Uses
|
58 |
```python
|
59 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|