
Hi, why do I get bad results with your model?

#2
by Rusvo - opened

image.png

Protect AI org

Hey @Rusvo , can you please share more context on your test?

This model has some limitations: it doesn't detect jailbreaks well, it only supports English prompts, and it's not recommended for scanning system prompts (Act as a chatbot...).

I tried to use this model with subnet 14 of Bittensor, and I got too many false negatives.

Protect AI org

Interesting. Do you have visibility on the dataset?

At least one of their datasets is https://huggingface.co/datasets/synapsecai/synthetic-prompt-injections, which they probably use for that analysis.

I ran tests on the model for this dataset:

v1 model:

  • Accuracy: 0.5436168810154499
  • Precision: 0.5232419082282594
  • Recall: 0.9664005761959066
  • F1 Score: 0.6789028735268694

v2 model:

  • Accuracy: 0.6159401181814913
  • Precision: 0.9556229850180163
  • Recall: 0.24195426444991297
  • F1 Score: 0.3861413642154468

The results are quite interesting, so I will spend more time understanding them.
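
For anyone reading along, here is a minimal sketch of how figures like these are usually derived from a list of true labels and a list of predicted labels. It is not the script used for the numbers above; the helper names (`binary_metrics`, `f1_from`) and the toy labels are made up for illustration, and it assumes 1 means "injection" and 0 means "benign". The last part only checks that the reported F1 scores are consistent with the reported precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, NOT taken from the dataset): 1 = injection, 0 = benign.
# The classifier here catches only one of four injections but never raises a
# false alarm, which is the same high-precision / low-recall shape as v2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
toy = binary_metrics(y_true, y_pred)
print(toy)

# Consistency check on the figures reported in this thread.
reported = {
    "v1": (0.5232419082282594, 0.9664005761959066, 0.6789028735268694),
    "v2": (0.9556229850180163, 0.24195426444991297, 0.3861413642154468),
}
for name, (precision, recall, f1) in reported.items():
    print(name, "F1 recomputed:", f1_from(precision, recall), "reported:", f1)
```

Read this way, v1 flags almost every injection but is wrong about half the time it flags something, while v2 is rarely wrong when it flags something but misses roughly three quarters of the injections, which matches the false negatives reported above.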

How do you run these tests?
