Commit 475b9b9 (parent: b337fc0) by Foreshhh: Update README.md

Files changed (1): README.md (+19 -0)

README.md CHANGED
@@ -35,6 +35,25 @@ MD-Judge was born to study the safety of different LLMs serving as a general evaluator
  - **Repository:** [SALAD-Bench Github](https://github.com/OpenSafetyLab/SALAD-BENCH)
  - **Paper:** [SALAD-BENCH](https://arxiv.org/abs/2402.02416)

+ ## Model Performance
+
+ We compare our MD-Judge model with other methods on several public safety test sets in QA format. All model-based methods are evaluated with the same safety proxy template.
+ - Keyword matching
+ - GPT-3.5: https://platform.openai.com/docs/models/gpt-3-5-turbo
+ - GPT-4: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
+ - LlamaGuard: https://huggingface.co/meta-llama/LlamaGuard-7b
+
+ | **Methods** | **Base**  | **Enhance** | **ToxicChat** | **Beaver** | **SafeRLHF** |
+ |-------------|-----------|-------------|---------------|------------|--------------|
+ | Keyword     | 0.058     | 0.261       | 0.193         | 0.012      | 0.015        |
+ | LlamaGuard  | 0.585     | 0.085       | 0.220         | 0.653      | 0.693        |
+ | GPT-3.5     | 0.374     | 0.731       | *0.499*       | 0.800      | 0.771        |
+ | GPT-4       | *0.785*   | *0.827*     | 0.470         | *0.842*    | *0.835*      |
+ | MD-Judge    | **0.818** | **0.873**   | **0.644**     | **0.866**  | **0.864**    |
+
+ > Comparison of F1 scores between our model and other leading methods. Best results are **bolded** and second best are *italicized*. Base and Enhance denote our SALAD-Base-Test and SALAD-Enhance-Test, and Beaver denotes BeaverTails.
+
+
  ## Uses
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
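  # The diff view truncates this code block here. The lines below are an editor's
  # sketch of how the snippet plausibly continues, NOT the commit's actual content;
  # the checkpoint id "OpenSafetyLab/MD-Judge-v0.1" and the prompt format are assumptions.
  tokenizer = AutoTokenizer.from_pretrained("OpenSafetyLab/MD-Judge-v0.1")
  model = AutoModelForCausalLM.from_pretrained("OpenSafetyLab/MD-Judge-v0.1", device_map="auto")

  # Ask the judge to label a question-answer pair as safe or unsafe.
  prompt = "[INST] Task: Check if there is unsafe content in the Answer. Question: ... Answer: ... [/INST]"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=32)
  print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
  ```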
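
For reference, the table in the added Model Performance section reports binary F1. The sketch below is an editor's illustration of that metric with toy labels (not SALAD-Bench data), assuming "unsafe" is treated as the positive class.

```python
# Minimal binary-F1 sketch; labels here are illustrative only.
def f1_score(gold, pred):
    """F1 with label 1 = unsafe (assumed positive class)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy usage: 1 = judged unsafe, 0 = judged safe.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(gold, pred), 3))  # 0.667
```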