Hi, why do I get bad results with your model?

by novak2 - opened May 8, 2024

Discussion

novak2

May 8, 2024

asofter

Protect AI org May 9, 2024

Hey @Rusvo, can you please share more context on your test?

This model has limitations: it doesn't detect jailbreaks well, it only supports English prompts, and it's not recommended for scanning system prompts (Act as a chatbot...).

novak2

May 9, 2024 • edited May 9, 2024

I was trying to use this model with subnet 14 of Bittensor, and I was getting too many false negatives.

asofter

Protect AI org May 10, 2024

Interesting. Do you have visibility on the dataset?

At least one of their datasets is https://huggingface.co/datasets/synapsecai/synthetic-prompt-injections, which they probably use for that analysis.

I ran tests on the model for this dataset:

v1 model:

Accuracy: 0.5436168810154499
Precision: 0.5232419082282594
Recall: 0.9664005761959066
F1 Score: 0.6789028735268694

v2 model:

Accuracy: 0.6159401181814913
Precision: 0.9556229850180163
Recall: 0.24195426444991297
F1 Score: 0.3861413642154468

The results are quite interesting, so I will spend more time understanding them.
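
For anyone reading along, here is a rough sketch of how metrics like the ones above can be computed from binary labels (1 = injection, 0 = safe) and model predictions. It is not the script used for the numbers in this thread; `binary_metrics`, `f1_from`, and the toy labels are hypothetical names and data chosen for illustration. The last lines only recompute F1 from the precision and recall reported above as a sanity check.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = injection)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 4 injections, 4 safe prompts, 3 injections missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Sanity check: recompute F1 from the precision/recall reported in this thread.
print("v1 F1:", f1_from(0.5232419082282594, 0.9664005761959066))
print("v2 F1:", f1_from(0.9556229850180163, 0.24195426444991297))
```

On the toy data the pattern resembles the v2 result: there are no false positives but most injections are missed, so precision is high while recall and F1 are low.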

novak2

May 10, 2024

How do you run these tests?
