metadata

license: unknown

Overview

Our model, "ScamLLM", is designed to identify malicious prompts that can be used to generate phishing websites and emails with popular commercial LLMs such as ChatGPT, Bard, and Claude. It was obtained by fine-tuning a pre-trained RoBERTa model on a dataset that combines multiple sets of malicious prompts.

Try out "ScamLLM" using the Inference API. The model assigns "Label 1" to prompts it identifies as phishing attempts and "Label 0" to prompts it considers safe and non-malicious.

Dataset Details

The training dataset was created from malicious prompts generated by GPT-4. Because these prompts target active vulnerabilities that are still under review, the dataset is currently available only upon request; a public release is planned for May 2024.

Training Details

The model was initialized with RobertaForSequenceClassification.from_pretrained, using the RoBERTa-base model together with its tokenizer. It was then fine-tuned for 10 epochs with the AdamW optimizer at a learning rate of 2e-5.
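The setup described above can be sketched roughly as follows. This is a minimal illustration, not the authors' training script: the Hub id "roberta-base", the value num_labels=2 (inferred from the two labels), and the names FineTuneConfig and build_model_tokenizer_optimizer are assumptions introduced here. Batch size, data loading, and the training loop are not described in this card and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters stated in the card, plus two labeled assumptions."""
    base_model: str = "roberta-base"  # assumption: Hub id for RoBERTa-base
    num_labels: int = 2               # assumption: label 0 (safe), label 1 (phishing)
    epochs: int = 10                  # stated in the card
    learning_rate: float = 2e-5       # stated in the card


def build_model_tokenizer_optimizer(cfg: FineTuneConfig):
    """Load RoBERTa-base with a classification head and create an AdamW optimizer."""
    # Imported here so the config can be inspected without the heavy dependencies.
    import torch
    from transformers import RobertaForSequenceClassification, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained(cfg.base_model)
    model = RobertaForSequenceClassification.from_pretrained(
        cfg.base_model, num_labels=cfg.num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
    return model, tokenizer, optimizer


config = FineTuneConfig()
```

Calling build_model_tokenizer_optimizer(config) downloads the pre-trained weights, so it needs network access as well as the transformers and torch packages.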

Inference

There are multiple ways to test this model. The simplest is the Inference API; alternatively, use the "text-classification" pipeline as shown below:

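from transformers import pipeline

# top_k=None returns the scores for all labels instead of only the top one.
classifier = pipeline(task="text-classification", model="phishbot/ScamLLM", top_k=None)
# The pipeline accepts a list of prompts and returns one result per prompt.
prompt = ["Your Sample Sentence or Prompt...."]
model_outputs = classifier(prompt)
# Print the label scores for the first (and only) prompt.
print(model_outputs[0])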
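
To turn the printed scores into a yes/no decision, a small helper along the following lines may be useful. It is a sketch under assumptions: it expects each prompt's result to be a list of {"label", "score"} dictionaries (the usual shape with top_k=None), it assumes the phishing label string ends in "1" (for example the default "LABEL_1"), and the 0.5 threshold is arbitrary. The function names and the example scores are hypothetical and not taken from the model.

```python
from typing import Dict, List, Union

LabelScores = List[Dict[str, Union[str, float]]]


def phishing_score(label_scores: LabelScores, positive_suffix: str = "1") -> float:
    """Return the score of the phishing label from one prompt's label scores."""
    for entry in label_scores:
        # Assumption: the phishing label is named like "LABEL_1" or "Label 1".
        if str(entry["label"]).endswith(positive_suffix):
            return float(entry["score"])
    raise ValueError("no phishing label found in the classifier output")


def is_malicious(label_scores: LabelScores, threshold: float = 0.5) -> bool:
    """Flag a prompt when its phishing score reaches the (assumed) threshold."""
    return phishing_score(label_scores) >= threshold


# Hypothetical scores for one prompt, made up for illustration only.
example = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.03},
]
print(is_malicious(example))
```

With the pipeline code above, the same check would be is_malicious(model_outputs[0]); the threshold can be raised if fewer false positives are preferred.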