Report for distilbert/distilbert-base-uncased-finetuned-sst-2-english

#163
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 5 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Robustness issues (1)

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.0% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.130 | Add typos | 104/800 tested samples (13.0%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 13 | we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity . | we root for ( clara and paul ) , even like them , htough perhaps it 's an emotiom closer to pity . | POSITIVE (p = 0.96) | NEGATIVE (p = 0.99) |
| 16 | the emotions are raw and will strike a nerve with anyone who 's ever had family trauma . | the ekotions are raw andw ill strike a nerve with anyone wgo 's ever had family trauma . | POSITIVE (p = 1.00) | NEGATIVE (p = 0.60) |
| 22 | holden caulfield did it better . | holdsn caulfkeld did t better . | POSITIVE (p = 0.99) | NEGATIVE (p = 1.00) |
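
The fail rate reported for this robustness issue is the share of tested samples whose predicted label flips after perturbation. The sketch below illustrates that calculation. The `add_typos` function is a crude, hypothetical stand-in (adjacent-character swaps only) and not the scan's actual transformation; `predict` is assumed to be any callable that maps a text to a label.

```python
import random


def add_typos(text, rate=0.05, seed=0):
    """Hypothetical typo transformation: randomly swap adjacent letters."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def fail_rate(predict, texts, perturb):
    """Fraction of texts whose predicted label changes after perturbation."""
    changed = sum(predict(t) != predict(perturb(t)) for t in texts)
    return changed / len(texts)


# Figures quoted in the report: 104 of 800 tested samples changed prediction.
report_fail_rate = 104 / 800
```
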
👉Performance issues (4)

For records in the dataset where text_length(text) >= 50.500 AND text_length(text) < 61.500, the Precision is 15.5% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text_length(text) >= 50.500 AND text_length(text) < 61.500 | Precision = 0.759 | -15.50% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 92 | you wo n't like roger , but you will quickly recognize him . | 61 | NEGATIVE | POSITIVE (p = 1.00) |
| 171 | rarely has leukemia looked so shimmering and benign . | 54 | NEGATIVE | POSITIVE (p = 0.98) |
| 183 | the lower your expectations , the more you 'll enjoy it . | 58 | NEGATIVE | POSITIVE (p = 1.00) |
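
As a rough illustration of how a slice metric like the one above can be computed, the sketch below filters records by character length and measures precision inside the slice. It rests on several assumptions: `text_length` is taken to be the character count of the text, POSITIVE is taken to be the positive class, the deviation is taken to be relative to the global value, and the record layout (`text`, `label`, `pred` keys) is hypothetical. Under the relative-deviation assumption, the reported figures imply a global precision of roughly 0.898.

```python
def precision(records, positive="POSITIVE"):
    """Share of positive predictions whose true label is also positive."""
    predicted_pos = [r for r in records if r["pred"] == positive]
    if not predicted_pos:
        return float("nan")
    hits = sum(r["label"] == positive for r in predicted_pos)
    return hits / len(predicted_pos)


def in_slice(record, low=50.5, high=61.5):
    """Slice from the report: low <= text_length(text) < high."""
    return low <= len(record["text"]) < high


def relative_deviation(slice_value, global_value):
    """Deviation of a slice metric relative to the global metric."""
    return (slice_value - global_value) / global_value


# Figures quoted in the report for this slice.
slice_precision = 0.759
deviation = -0.155

# Global precision implied by the report, assuming a relative deviation.
implied_global_precision = slice_precision / (1 + deviation)
```

With a list of records in hand, `precision([r for r in records if in_slice(r)])` gives the slice value and `relative_deviation` compares it with `precision(records)`.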

For records in the dataset where text_length(text) >= 73.500 AND text_length(text) < 82.500, the Recall is 11.19% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text_length(text) >= 73.500 AND text_length(text) < 82.500 | Recall = 0.826 | -11.19% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 93 | if steven soderbergh 's ` solaris ' is a failure it is a glorious failure . | 76 | POSITIVE | NEGATIVE (p = 1.00) |
| 123 | turns potentially forgettable formula into something strangely diverting . | 75 | POSITIVE | NEGATIVE (p = 0.99) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 82 | POSITIVE | NEGATIVE (p = 0.99) |

For records in the dataset where text_length(text) >= 165.500 AND text_length(text) < 179.500, the Recall is 6.37% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | text_length(text) >= 165.500 AND text_length(text) < 179.500 | Recall = 0.871 | -6.37% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 158 | by getting myself wrapped up in the visuals and eccentricities of many of the characters , i found myself confused when it came time to get to the heart of the movie . | 168 | NEGATIVE | POSITIVE (p = 0.99) |
| 266 | a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors . | 179 | POSITIVE | NEGATIVE (p = 0.99) |
| 282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 166 | POSITIVE | NEGATIVE (p = 1.00) |

For records in the dataset where text_length(text) >= 151.500 AND text_length(text) < 165.500, the Recall is 5.93% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | text_length(text) >= 151.500 AND text_length(text) < 165.500 | Recall = 0.875 | -5.93% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 324 | you 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an errant tear . | 164 | POSITIVE | NEGATIVE (p = 0.95) |
| 673 | drops you into a dizzying , volatile , pressure-cooker of a situation that quickly snowballs out of control , while focusing on the what much more than the why . | 162 | POSITIVE | NEGATIVE (p = 0.94) |
| 692 | sustains its dreamlike glide through a succession of cheesy coincidences and voluptuous cheap effects , not the least of which is rebecca romijn-stamos . | 154 | NEGATIVE | POSITIVE (p = 0.94) |
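
The three recall findings above can be cross-checked against each other. If each deviation is relative to the same global recall (an assumption, since the report does not state the global value), then dividing each slice recall by one plus its deviation should recover roughly the same number. The sketch below does that arithmetic using only figures quoted in the report.

```python
# (slice description, slice recall, relative deviation) as quoted in the report.
recall_rows = [
    ("73.5 <= text_length < 82.5", 0.826, -0.1119),
    ("165.5 <= text_length < 179.5", 0.871, -0.0637),
    ("151.5 <= text_length < 165.5", 0.875, -0.0593),
]

# Global recall implied by each row, assuming the deviation is relative.
implied_global_recall = [
    recall / (1 + deviation) for _, recall, deviation in recall_rows
]

# Spread between the largest and smallest implied value.
spread = max(implied_global_recall) - min(implied_global_recall)

for (name, _, _), value in zip(recall_rows, implied_global_recall):
    print(f"{name}: implied global recall = {value:.4f}")
```

All three rows land close to 0.930, which is consistent with the relative-deviation reading, allowing for rounding in the reported figures.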

Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
