sarahyurick committed on
Commit d8ac31c
1 Parent(s): 76375f3

Add "How to Use in Transformers" section (#1)

- Add "How to Use in Transformers" section (7556b3864445989b37a9fa09518ec1ae9d9ab105)
- Add PyTorchModelHubMixin (68fd8f2ce00e6898d27a903b5b02cb6419a65bec)

Files changed (1)
  1. README.md +89 -0
README.md CHANGED
@@ -75,6 +75,95 @@ Success is defined as having an acceptable catch rate (recall scores for each at
  The inference code is available on [NeMo Curator's GitHub repository](https://github.com/NVIDIA/NeMo-Curator). <br>
  Check out [this example notebook](https://github.com/NVIDIA/NeMo-Curator/tree/main/tutorials/distributed_data_classification) to get started.
 
+ ## How to Use in Transformers:
+ To use this AEGIS classifier, you must first get access to Llama Guard on Hugging Face: https://huggingface.co/meta-llama/LlamaGuard-7b. Then set up a [user access token](https://huggingface.co/docs/hub/en/security-tokens) and pass it into the constructor of this classifier.
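A minimal sketch of reading the token from the environment instead of hard-coding it (`HF_TOKEN` is the conventional variable name that `huggingface_hub` also reads automatically when no explicit token is passed; the `"hf_1234"` fallback is only an illustrative placeholder):

```python
import os

def get_hf_token(default="hf_1234"):
    """Return the Hugging Face user access token.

    Reads HF_TOKEN from the environment; falls back to the illustrative
    placeholder default so the snippet stays runnable.
    """
    return os.environ.get("HF_TOKEN", default)

token = get_hf_token()
```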
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from huggingface_hub import PyTorchModelHubMixin
+ from peft import PeftModel
+ from torch.nn import Dropout, Linear
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the base Llama Guard model and attach the AEGIS PEFT adapter
+ pretrained_model_name_or_path = "meta-llama/LlamaGuard-7b"
+ dtype = torch.bfloat16
+ token = "hf_1234"  # Replace with your user access token
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ base_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path, torch_dtype=dtype, token=token).to(device)
+ peft_model_name_or_path = "nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0"
+ model = PeftModel.from_pretrained(base_model, peft_model_name_or_path)
+
+ # Initialize the tokenizer (left padding keeps the last token aligned for batched generation)
+ tokenizer = AutoTokenizer.from_pretrained(
+     pretrained_model_name_or_path=pretrained_model_name_or_path,
+     padding_side="left"
+ )
+ tokenizer.pad_token = tokenizer.unk_token
+
+ class InstructionDataGuardNet(torch.nn.Module, PyTorchModelHubMixin):
+     def __init__(self, input_dim=4096, dropout=0.7):
+         super().__init__()
+         self.input_dim = input_dim
+         self.dropout = Dropout(dropout)
+         self.sigmoid = torch.nn.Sigmoid()
+         self.input_layer = Linear(input_dim, input_dim)
+
+         self.hidden_layer_0 = Linear(input_dim, 2000)
+         self.hidden_layer_1 = Linear(2000, 500)
+         self.hidden_layer_2 = Linear(500, 1)
+
+     def forward(self, x):
+         x = F.normalize(x, dim=-1)
+         x = self.dropout(x)
+         x = F.relu(self.input_layer(x))
+         x = self.dropout(x)
+         x = F.relu(self.hidden_layer_0(x))
+         x = self.dropout(x)
+         x = F.relu(self.hidden_layer_1(x))
+         x = self.dropout(x)
+         x = self.hidden_layer_2(x)
+         x = self.sigmoid(x)
+         return x
+
+ # Load the Instruction-Data-Guard classifier head
+ instruction_data_guard = InstructionDataGuardNet.from_pretrained("nvidia/instruction-data-guard")
+ instruction_data_guard = instruction_data_guard.to(device)
+ instruction_data_guard = instruction_data_guard.eval()
+
+ # Compute poisoning scores for a batch of prompts
+ def get_instruction_data_guard_results(
+     prompts,
+     tokenizer,
+     model,
+     instruction_data_guard,
+     device="cuda",
+ ):
+     input_ids = tokenizer(prompts, padding=True, return_tensors="pt").to(device)
+     outputs = model.generate(
+         **input_ids,
+         output_hidden_states=True,
+         return_dict_in_generate=True,
+         max_new_tokens=1,
+         pad_token_id=0,
+     )
+     # Last token's hidden state from the final decoder layer at the first generation step
+     input_tensor = outputs.hidden_states[0][32][:, -1, :].to(torch.float)
+     return instruction_data_guard(input_tensor).flatten().detach().cpu().numpy()
+
+ # Prepare a sample input
+ instruction = "Find a route between San Diego and Phoenix which passes through Nevada"
+ input_ = ""
+ response = "Drive to Las Vegas with highway 15 and from there drive to Phoenix with highway 93"
+ benign_sample = f"Instruction: {instruction}. Input: {input_}. Response: {response}."
+ text_samples = [benign_sample]
+ poisoning_scores = get_instruction_data_guard_results(
+     text_samples, tokenizer, model, instruction_data_guard
+ )
+ print(poisoning_scores)
+ # [0.01149639]
+ ```
+
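The printed value is one sigmoid score per sample, where a higher score is taken to indicate likely poisoning. As a hedged sketch of post-processing (the 0.5 cutoff is an illustrative assumption, not a documented threshold), scores can be mapped to labels like this:

```python
import numpy as np

def label_scores(scores, threshold=0.5):
    # Map sigmoid poisoning scores to labels; the 0.5 threshold is an
    # illustrative assumption, so tune it against your own validation data.
    return ["poisoned" if s >= threshold else "benign" for s in np.asarray(scores)]

print(label_scores([0.01149639, 0.97]))  # ['benign', 'poisoned']
```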
167
  ## Ethical Considerations:
168
  NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
169