Text Classification · Transformers · ONNX · Safetensors · English · roberta · Generated from Trainer · rejection · no_answer · chatgpt · Inference Endpoints

wu981526092 committed
Commit 9833d6c
1 Parent(s): 4d49759

Upload 18 files
README.md CHANGED
@@ -1,3 +1,161 @@
-license: mit
---
license: apache-2.0
base_model: distilroberta-base
tags:
- generated_from_trainer
- rejection
- no_answer
- chatgpt
metrics:
- accuracy
- recall
- precision
- f1
model-index:
- name: distilroberta-base-rejection-v1
  results: []
language:
- en
pipeline_tag: text-classification
co2_eq_emissions:
  emissions: 0.07987621556153969
  source: code carbon
  training_type: fine-tuning
datasets:
- argilla/notus-uf-dpo-closest-rejected
---

# Model Card for distilroberta-base-rejection-v1

This model is a fine-tuned version of [distilroberta-base](https://huggingface.co/distilroberta-base) on multiple combined datasets of rejections from different LLMs and normal responses from RLHF datasets.

It aims to identify rejections in LLMs when the prompt doesn't pass content moderation, classifying inputs into two categories: `0` for normal outputs and `1` for rejection detected.

It achieves the following results on the evaluation set:
- Loss: 0.0544
- Accuracy: 0.9887
- Recall: 0.9810
- Precision: 0.9279
- F1: 0.9537

## Model details

- **Fine-tuned by:** ProtectAI.com
- **Model type:** distilroberta-base
- **Language(s) (NLP):** English
- **License:** Apache license 2.0
- **Finetuned from model:** [distilroberta-base](https://huggingface.co/distilroberta-base)

## Intended Uses & Limitations

It aims to identify rejections, classifying inputs into two categories: `0` for normal output and `1` for rejection detected.

The model's performance depends on the nature and quality of its training data; it might not perform well on text styles or topics that are not represented in the training set.

Additionally, `distilroberta-base` is a case-sensitive model.

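As a quick illustration of that case sensitivity (a hypothetical check, not from the original card), the underlying BPE tokenizer produces different tokens for different casings:

```python
# Hypothetical sanity check: RoBERTa's BPE tokenizer is case-sensitive,
# so "Sorry" and "sorry" are tokenized differently.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")
print(tokenizer.tokenize("Sorry"))  # e.g. ['Sorry']
print(tokenizer.tokenize("sorry"))  # e.g. ['sorry'] — different tokens, different ids
```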
## How to Get Started with the Model

### Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")
model = AutoModelForSequenceClassification.from_pretrained("ProtectAI/distilroberta-base-rejection-v1")

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

print(classifier("Sorry, but I can't assist with that."))
```
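The pipeline returns the label names defined in the model's `id2label` mapping (see `config.json` below), so the refusal above should yield something like `[{'label': 'REJECTION', 'score': ...}]`, with the exact score depending on the input.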

### Optimum with ONNX

Loading the model requires the [🤗 Optimum](https://huggingface.co/docs/optimum/index) library to be installed.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("ProtectAI/distilroberta-base-rejection-v1", subfolder="onnx")
model = ORTModelForSequenceClassification.from_pretrained("ProtectAI/distilroberta-base-rejection-v1", export=False, subfolder="onnx")

classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

print(classifier("Sorry, but I can't assist with that."))
```

### Use in LLM Guard

Use the [NoRefusal Scanner](https://llm-guard.com/output_scanners/no_refusal/) to detect whether an output was rejected, which can signal that something is going wrong with the prompt.

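A minimal sketch, assuming the `llm-guard` package and its standard output-scanner interface (`scan` returning the output, a validity flag, and a risk score); the prompt below is a made-up example:

```python
# Minimal sketch, assuming llm-guard's NoRefusal output scanner and its
# scan(prompt, output) -> (output, is_valid, risk_score) interface.
from llm_guard.output_scanners import NoRefusal

scanner = NoRefusal()
prompt = "How do I hotwire a car?"  # hypothetical prompt
output = "Sorry, but I can't assist with that."
sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
print(is_valid, risk_score)  # a refusal should be flagged with a high risk score
```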
## Training and evaluation data

The model was trained on a custom dataset combining multiple open-source datasets, with ~10% rejections and ~90% normal outputs.

We used the following papers when preparing the datasets:

- [Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs](https://arxiv.org/abs/2308.13387)
- [I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models](https://arxiv.org/abs/2306.03423)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 3

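As a rough sketch, these settings map onto 🤗 `TrainingArguments` as follows; the output path is hypothetical, and dataset and metric wiring are omitted:

```python
# Sketch of the hyperparameters above as TrainingArguments. The Adam betas
# and epsilon listed in the card are the Trainer defaults, so no override
# is needed for them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilroberta-base-rejection-v1",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=3,
)
```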
### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Recall | Precision | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.0525        | 1.0   | 3536  | 0.0355          | 0.9912   | 0.9583 | 0.9675    | 0.9629 |
| 0.0219        | 2.0   | 7072  | 0.0312          | 0.9919   | 0.9917 | 0.9434    | 0.9669 |
| 0.0121        | 3.0   | 10608 | 0.0350          | 0.9939   | 0.9905 | 0.9596    | 0.9748 |

### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0

## Community

Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, get help with package usage or contributions, or engage in discussions about LLM security!

<a href="https://join.slack.com/t/laiyerai/shared_invite/zt-28jv3ci39-sVxXrLs3rQdaN3mIl9IT~w"><img src="https://github.com/laiyer-ai/llm-guard/blob/main/docs/assets/join-our-slack-community.png?raw=true" width="200"></a>

## Citation

```
@misc{distilroberta-base-rejection-v1,
  author = {ProtectAI.com},
  title = {Fine-Tuned DistilRoberta-Base for Rejection Detection in LLM Output},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ProtectAI/distilroberta-base-rejection-v1},
}
```
config.json ADDED
{
  "_name_or_path": "distilroberta-base",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "NORMAL",
    "1": "REJECTION"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "NORMAL": 0,
    "REJECTION": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.36.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
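The `id2label` mapping above is what turns raw logits into the `NORMAL`/`REJECTION` labels. A minimal sketch of manual inference using it (illustrative, not from the original card):

```python
# Sketch: map the classifier's argmax logit to a label via config.id2label.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "ProtectAI/distilroberta-base-rejection-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Sorry, but I can't assist with that.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])  # "REJECTION" or "NORMAL"
```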
distilroberta-base-rejection-v1_emissions.csv ADDED
timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
2024-01-20T10:29:33,distilroberta-base-rejection-v1_emissions,02c8b68f-b43f-4bb8-8de6-a9c3d5d44194,2502.685886859894,0.07995690783403821,3.1948439176424055e-05,42.5,115.8122138838321,5.787034034729004,0.0295453693439563,0.183039723098328,0.004021413543198922,0.21660650598548323,United States,USA,virginia,,,Linux-5.10.205-195.804.amzn2.x86_64-x86_64-with-glibc2.26,3.10.9,2.3.2,4,AMD EPYC 7R32,1,1 x NVIDIA A10G,-77.4903,39.0469,15.432090759277344,machine,N,1.0
emissions.csv ADDED
timestamp,project_name,run_id,duration,emissions,emissions_rate,cpu_power,gpu_power,ram_power,cpu_energy,gpu_energy,ram_energy,energy_consumed,country_name,country_iso_code,region,cloud_provider,cloud_region,os,python_version,codecarbon_version,cpu_count,cpu_model,gpu_count,gpu_model,longitude,latitude,ram_total_size,tracking_mode,on_cloud,pue
2024-01-20T10:29:27,codecarbon,f49924b3-7f0c-4e83-bbc1-26f795913ce6,2495.7482607364655,0.07987621556153969,3.200491684925353e-05,42.5,173.661481972063,5.787034034729004,0.02946336953573758,0.18291424133127598,0.0040102964797854666,0.21638790734679889,United States,USA,virginia,,,Linux-5.10.205-195.804.amzn2.x86_64-x86_64-with-glibc2.26,3.10.9,2.3.2,4,AMD EPYC 7R32,1,1 x NVIDIA A10G,-77.4903,39.0469,15.432090759277344,machine,N,1.0
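For context, CSVs like the two above are what CodeCarbon writes out during training. A minimal sketch, assuming the `codecarbon` package; the training call is a placeholder:

```python
# Sketch of producing an emissions CSV with CodeCarbon; the project name
# matches the file above, and the training step is elided.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="distilroberta-base-rejection-v1_emissions")
tracker.start()
# ... run training here ...
emissions = tracker.stop()  # writes emissions.csv and returns kg CO2-eq
print(emissions)
```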
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:6e45fc9cdbde270a898f01b4fa01beaf06a0eb1f4c0f6ca88110af59759a0b26
size 328492280
onnx/config.json ADDED
{
  "_name_or_path": "asofter/distilroberta-base-rejection-v1",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "NORMAL",
    "1": "REJECTION"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "NORMAL": 0,
    "REJECTION": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.35.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
onnx/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
onnx/model.onnx ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fa6cd4d4d736b284931c9e71f6993a8c01c72b22f412e0754d6694a1c2d8b1cb
size 328626516
onnx/special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
onnx/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
onnx/tokenizer_config.json ADDED
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "max_length": 512,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
onnx/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
{
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "unk_token": "<unk>"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "unk_token": "<unk>"
}
training_args.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ef62f442414f3b3b31779f3d95e055f5ddf5734a391074a87ea42323ce67c246
size 4664
vocab.json ADDED
The diff for this file is too large to render. See raw diff