Text Classification · Transformers · ONNX · Safetensors · English · roberta · Generated from Trainer · rejection · no_answer · chatgpt · Inference Endpoints

wu981526092 committed
Commit 9833d6c
1 Parent(s): 4d49759

Upload 18 files
README.md CHANGED
@@ -1,3 +1,161 @@
-license: mit
---
license: apache-2.0
base_model: distilroberta-base
tags:
- generated_from_trainer
- rejection
- no_answer
- chatgpt
metrics:
- accuracy
- recall
- precision
- f1
model-index:
- name: distilroberta-base-rejection-v1
  results: []
language:
- en
pipeline_tag: text-classification
co2_eq_emissions:
  emissions: 0.07987621556153969
  source: code carbon
  training_type: fine-tuning
datasets:
- argilla/notus-uf-dpo-closest-rejected
---

# Model Card for distilroberta-base-rejection-v1

This model is a fine-tuned version of [distilroberta-base](https://huggingface.co/distilroberta-base) on multiple combined datasets of rejections from different LLMs and normal responses from RLHF datasets.

It aims to identify rejections in LLMs when the prompt doesn't pass content moderation, classifying inputs into two categories: `0` for normal outputs and `1` for rejection detected.

It achieves the following results on the evaluation set:
- Loss: 0.0544
- Accuracy: 0.9887
- Recall: 0.9810
- Precision: 0.9279
- F1: 0.9537

## Model details

- **Fine-tuned by:** ProtectAI.com
- **Model type:** distilroberta-base
- **Language(s) (NLP):** English
- **License:** Apache license 2.0
- **Finetuned from model:** [distilroberta-base](https://huggingface.co/distilroberta-base)

## Intended Uses & Limitations

It aims to identify rejections, classifying inputs into two categories: `0` for normal output and `1` for rejection detected.

The model's performance depends on the nature and quality of its training data; it might not perform well on text styles or topics that are not represented in the training set.

Additionally, `distilroberta-base` is a case-sensitive model.

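As a quick illustration of that case sensitivity (a hypothetical check, not from the original card), the underlying BPE tokenizer produces different tokens for different casings:

```python
# Hypothetical sanity check: RoBERTa's BPE tokenizer is case-sensitive,
# so "Sorry" and "sorry" are tokenized differently.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")
print(tokenizer.tokenize("Sorry"))  # e.g. ['Sorry']
print(tokenizer.tokenize("sorry"))  # e.g. ['sorry'] — different tokens, different ids
```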
## How to Get Started with the Model

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")
model = AutoModelForSequenceClassification.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("Sorry, but I can't assist with that."))
```
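The pipeline returns the label names defined in the model's `id2label` mapping (see `config.json` below), so the refusal above should yield something like `[{'label': 'REJECTION', 'score': ...}]`, with the exact score depending on the input.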

### Optimum with ONNX

Loading the model requires the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library to be installed.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1", subfolder="onnx")
model = ORTModelForSequenceClassification.from_pretrained("ProtectAI/distilroberta-base-rejection-v1", export=False, subfolder="onnx")

classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

print(classifier("Sorry, but I can't assist with that."))
```

### Use in LLM Guard

Use the [NoRefusal Scanner](https://llm-guard.com/output_scanners/no_refusal/) to detect whether an output was rejected, which can signal that something is going wrong with the prompt.

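A minimal sketch, assuming the `llm-guard` package and its standard output-scanner interface (`scan` returning the output, a validity flag, and a risk score); the prompt below is a made-up example:

```python
# Minimal sketch, assuming llm-guard's NoRefusal output scanner and its
# scan(prompt, output) -> (output, is_valid, risk_score) interface.
from llm_guard.output_scanners import NoRefusal

scanner = NoRefusal()
prompt = "How do I hotwire a car?"  # hypothetical prompt
output = "Sorry, but I can't assist with that."
sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
print(is_valid, risk_score)  # a refusal should be flagged with a high risk score
```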
## Training and evaluation data

The model was trained on a custom dataset combining multiple open-source datasets, with ~10% rejections and ~90% normal outputs.

We used the following papers when preparing the datasets:

- [Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs](https://arxiv.org/abs/2308.13387)
- [I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models](https://arxiv.org/abs/2306.03423)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3

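As a rough sketch, these settings map onto 🤗 `TrainingArguments` as follows; the output path is hypothetical, and dataset and metric wiring are omitted:

```python
# Sketch of the hyperparameters above as TrainingArguments. The Adam betas
# and epsilon listed in the card are the Trainer defaults, so no override
# is needed for them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilroberta-base-rejection-v1",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
)
```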
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Recall | Precision | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.0525        | 1.0   | 3536  | 0.0355          | 0.9912   | 0.9583 | 0.9675    | 0.9629 |
| 0.0219        | 2.0   | 7072  | 0.0312          | 0.9919   | 0.9917 | 0.9434    | 0.9669 |
| 0.0121        | 3.0   | 10608 | 0.0350          | 0.9939   | 0.9905 | 0.9596    | 0.9748 |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0

## Community

Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, get help with package usage or contributions, or engage in discussions about LLM security!

<a href="https://join.slack.com/t/laiyerai/shared_invite/zt-28jv3ci39-sVxXrLs3rQdaN3mIl9IT~w"><img src="https://github.com/laiyer-ai/llm-guard/blob/main/docs/assets/join-our-slack-community.png?raw=true" width="200"></a>

## Citation

```
@misc{distilroberta-base-rejection-v1,
  author = {ProtectAI.com},
  title = {Fine-Tuned DistilRoberta-Base for Rejection Detection in LLM Output},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ProtectAI/distilroberta-base-rejection-v1},
}
```
config.json ADDED
{
  "_name_or_path": "distilroberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "NORMAL",
    "1": "REJECTION"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "NORMAL": 0,
    "REJECTION": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.36.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
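The `id2label` mapping above is what turns raw logits into the `NORMAL`/`REJECTION` labels. A minimal sketch of manual inference using it (illustrative, not from the original card):

```python
# Sketch: map the classifier's argmax logit to a label via config.id2label.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "ProtectAI/distilroberta-base-rejection-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Sorry, but I can't assist with that.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])  # "REJECTION" or "NORMAL"
```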
distilroberta-base-rejection-v1_emissions.csv ADDED
timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
2024-01-20T10:29:33,distilroberta-base-rejection-v1_emissions,02c8b68f-b43f-4bb8-8de6-a9c3d5d44194,2502.685886859894,0.07995690783403821,3.1948439176424055e-05,42.5,115.8122138838321,5.787034034729004,0.0295453693439563,0.183039723098328,0.004021413543198922,0.21660650598548323,United States,USA,virginia,,,Linux-5.10.205-195.804.amzn2.x86_64-x86_64-with-glibc2.26,3.10.9,2.3.2,4,AMD EPYC 7R32,1,1 x NVIDIA A10G,-77.4903,39.0469,15.432090759277344,machine,N,1.0
emissions.csv ADDED
timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
2024-01-20T10:29:27,codecarbon,f49924b3-7f0c-4e83-bbc1-26f795913ce6,2495.7482607364655,0.07987621556153969,3.200491684925353e-05,42.5,173.661481972063,5.787034034729004,0.02946336953573758,0.18291424133127598,0.0040102964797854666,0.21638790734679889,United States,USA,virginia,,,Linux-5.10.205-195.804.amzn2.x86_64-x86_64-with-glibc2.26,3.10.9,2.3.2,4,AMD EPYC 7R32,1,1 x NVIDIA A10G,-77.4903,39.0469,15.432090759277344,machine,N,1.0
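For context, CSVs like the two above are what CodeCarbon writes out during training. A minimal sketch, assuming the `codecarbon` package; the training call is a placeholder:

```python
# Sketch of producing an emissions CSV with CodeCarbon; the project name
# matches the file above, and the training step is elided.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="distilroberta-base-rejection-v1_emissions")
tracker.start()
# ... run training here ...
emissions = tracker.stop()  # writes emissions.csv and returns kg CO2-eq
print(emissions)
```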
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:6e45fc9cdbde270a898f01b4fa01beaf06a0eb1f4c0f6ca88110af59759a0b26
size 328492280
onnx/config.json ADDED
{
  "_name_or_path": "asofter/distilroberta-base-rejection-v1",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "NORMAL",
    "1": "REJECTION"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "NORMAL": 0,
    "REJECTION": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.35.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
onnx/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
onnx/model.onnx ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fa6cd4d4d736b284931c9e71f6993a8c01c72b22f412e0754d6694a1c2d8b1cb
size 328626516
onnx/special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
onnx/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
onnx/tokenizer_config.json ADDED
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
onnx/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
{
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "unk_token": "<unk>"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "<unk>"
}
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ef62f442414f3b3b31779f3d95e055f5ddf5734a391074a87ea42323ce67c246
size 4664
vocab.json ADDED
The diff for this file is too large to render. See raw diff