Report for cardiffnlp/twitter-roberta-base-irony
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset irony, split validation).
👉Overconfidence issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Overconfidence | medium 🟡 | text_length(text) < 87.500 | Overconfidence rate = 0.552 | — | +12.47% than global |
🔍✨Examples
For records in the dataset where `text_length(text)` < 87.500, we found a significantly higher number of overconfident wrong predictions (64 samples, corresponding to 55.17% of the wrong predictions in the data slice).
Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
470 | Today has been a blast | 22 | non_irony | irony (p = 0.98); non_irony (p = 0.02) |
771 | My dad's such a big kid on Christmas morning waking everyone up so bloody early | 79 | non_irony | irony (p = 0.97); non_irony (p = 0.03) |
902 | When one ear breaks on your headphones it's so frustrating! #today | 67 | non_irony | irony (p = 0.97); non_irony (p = 0.03) |
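The overconfidence rate above can be read as the share of wrong predictions that were made with high confidence. The following is a minimal sketch under that reading. The report does not state the scanner's actual criterion, so the function name `overconfidence_rate` and the cutoff `threshold=0.9` are assumptions for illustration only.

```python
from typing import Sequence


def overconfidence_rate(
    labels: Sequence[str],
    predictions: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.9,  # assumed cutoff, not taken from the report
) -> float:
    """Fraction of wrong predictions whose top-class probability is >= threshold."""
    wrong = [c for y, p, c in zip(labels, predictions, confidences) if y != p]
    if not wrong:
        return 0.0
    overconfident = sum(1 for c in wrong if c >= threshold)
    return overconfident / len(wrong)


# The first three rows mirror the examples listed above (wrong and confident);
# the fourth is a made-up low-confidence error added for contrast.
labels = ["non_irony", "non_irony", "non_irony", "irony"]
predictions = ["irony", "irony", "irony", "non_irony"]
confidences = [0.98, 0.97, 0.97, 0.55]

rate = overconfidence_rate(labels, predictions, confidences)
print(f"overconfidence rate: {rate:.2f}")
```

Running the same function on the whole validation split and on the `text_length(text) < 87.500` slice, then comparing the two rates, is one way to check the reported deviation.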
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | medium 🟡 | — | Fail rate = 0.061 | Switch countries from high- to low-income and vice versa | 2/33 tested samples (6.06%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.06% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
548 | @user A British world champion in one of the most demanding & popular sports on earth. Yeah, of course I'm being sarcastic. | @user A Kiribati world champion in one of the most demanding & popular sports on earth. Yeah, of course I'm being sarcastic. | irony (p = 0.57) | non_irony (p = 0.53) |
686 | AAP said will declare AK candidate in last list but declared it before.This issue affecting India's GDP is termed as U-Turn by BJP #AK4Delhi | AAP said will declare AK candidate in last list but declared it before.This issue affecting British Virgin Islands's GDP is termed as U-Turn by BJP #AK4Delhi | irony (p = 0.50) | non_irony (p = 0.53) |
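To reproduce this kind of perturbation by hand, a country-swap transformation can be sketched as a whole-word substitution. The mapping below contains only the two pairs visible in the examples above. The scanner's full country list and matching rules are not given in this report, so `SWAPS` and `switch_countries` are hypothetical names for illustration.

```python
import re

# Only the two substitutions that appear in the report's examples.
SWAPS = {
    "British": "Kiribati",
    "India": "British Virgin Islands",
}

# One alternation pattern with word boundaries, so every match is replaced
# in a single pass and replaced text is never substituted a second time.
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(k) for k in SWAPS) + r")\b")


def switch_countries(text: str) -> str:
    """Replace each known country term with its counterpart from SWAPS."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(1)], text)


original = "This issue affecting India's GDP is termed as U-Turn by BJP"
perturbed = switch_countries(original)
print(perturbed)
```

Feeding both `original` and `perturbed` to the classifier and comparing the predicted labels gives one tested sample for the fail rate.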
👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.201 | Transform to uppercase | 192/953 tested samples (20.15%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.15% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! RT @USER OF ALL THE PLACES TO GET STUCK IN A TRAFFIC JAM | irony (p = 0.51) | non_irony (p = 0.78) |
13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | WORKAHOLICS: IF YOU'RE SICK, DON'T LET THAT STOP YOU FROM BRINGING YOUR GERMS INTO THE OFFICE. WE ALL APPRECIATE YOUR COMMITMENT. | irony (p = 0.90) | non_irony (p = 0.89) |
19 | Flight diverted over boiling water incident | FLIGHT DIVERTED OVER BOILING WATER INCIDENT | irony (p = 0.70) | non_irony (p = 0.88) |
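The fail rate in these robustness tables is the number of samples whose predicted label changed, divided by the number of samples tested. The sketch below shows that computation for any text transformation. It assumes that samples left unchanged by the transformation are skipped, which would explain why the tested counts differ between transformations, but the report does not confirm this. `toy_predict` is a hypothetical stand-in for the real classifier.

```python
from typing import Callable, Sequence, Tuple


def perturbation_fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for one text transformation."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # assumed: samples the transformation leaves unchanged are not tested
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier used only for this example."""
    return "irony" if "so romantic" in text else "non_irony"


texts = ["so romantic, truly", "a plain statement", "ALREADY UPPERCASE"]
changed, tested, rate = perturbation_fail_rate(toy_predict, texts, str.upper)
print(f"{changed}/{tested} tested samples changed prediction ({rate:.2%})")
```

Passing `str.title` or `str.lower` as `transform` covers the title-case and lowercase checks in the same way. With the real model, `predict` would wrap the classifier's label output.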
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.155 | Transform to title case | 148/953 tested samples (15.53%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.53% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! Rt @User Of All The Places To Get Stuck In A Traffic Jam | irony (p = 0.51) | non_irony (p = 0.80) |
13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | Workaholics: If You'Re Sick, Don'T Let That Stop You From Bringing Your Germs Into The Office. We All Appreciate Your Commitment. | irony (p = 0.90) | non_irony (p = 0.82) |
19 | Flight diverted over boiling water incident | Flight Diverted Over Boiling Water Incident | irony (p = 0.70) | non_irony (p = 0.79) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.119 | Add typos | 102/859 tested samples (11.87%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.87% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | Woroaholiwsc: if you're sick, don't let that stop you from bringing your germs ingto the office. We all appreciate your commitment. | irony (p = 0.90) | non_irony (p = 0.54) |
19 | Flight diverted over boiling water incident | Flight diverted over boiling water incidwent | irony (p = 0.70) | non_irony (p = 0.61) |
23 | When you have Challah French Toast on Christmas | When you havde Challah French Toxst on Christmas | irony (p = 0.84) | non_irony (p = 0.86) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.088 | Punctuation Removal | 68/773 tested samples (8.8%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.8% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | !!! RT @user Of all the places to get stuck in a traffic jam | RT @user Of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.63) |
15 | What else would you do on friday? \| #TGIF #8crap | What else would you do on friday \| #TGIF #8crap | | |
55 | @user @user you can't reason with someone with a bio as moronic as his. "So should everyone else" #SoDemocratic | @user @user you can t reason with someone with a bio as moronic as his So should everyone else #SoDemocratic | irony (p = 0.59) | non_irony (p = 0.53) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.052 | Transform to lowercase | 44/852 tested samples (5.16%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.16% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! rt @user of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.62) |
29 | @user Frisky at 2am? That's nothing new. | @user frisky at 2am? that's nothing new. | non_irony (p = 0.57) | irony (p = 0.65) |
74 | Honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | non_irony (p = 0.53) | irony (p = 0.51) |
👉Performance issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "user" | Recall = 0.556 | — | -22.76% than global |
🔍✨Examples
For records in the dataset where `text` contains "user", the Recall is 22.76% lower than the global Recall.
Index | text | label | Predicted label |
---|---|---|---|
35 | @user hahaha such a 1% town | non_irony | irony (p = 0.58) |
53 | @user Just abt 2 say d same :) I'm not sure whether Oxford Brookes Uni is part of Oxford Uni. yet his CV is impressive still! | irony | non_irony (p = 0.83) |
64 | @user even your link to the service alert is down. | irony | non_irony (p = 0.65) |
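The slice comparison above can be checked by computing recall on the full set and on the rows whose text contains "user". This is a minimal sketch under two assumptions that the report does not state explicitly: `irony` is treated as the positive class, and the deviation is relative to the global value. The data below is made up for illustration.

```python
from typing import Sequence


def recall(labels: Sequence[str], predictions: Sequence[str], positive: str = "irony") -> float:
    """True positives divided by all actual positives."""
    actual_positive = [(y, p) for y, p in zip(labels, predictions) if y == positive]
    if not actual_positive:
        return 0.0
    true_positive = sum(1 for _, p in actual_positive if p == positive)
    return true_positive / len(actual_positive)


def slice_recall(
    texts: Sequence[str],
    labels: Sequence[str],
    predictions: Sequence[str],
    needle: str = "user",
) -> float:
    """Recall restricted to rows whose text contains `needle`."""
    rows = [(y, p) for t, y, p in zip(texts, labels, predictions) if needle in t]
    return recall([y for y, _ in rows], [p for _, p in rows])


# Made-up toy data: four ironic tweets, one of them misclassified.
texts = ["@user first", "@user second", "third", "fourth"]
labels = ["irony", "irony", "irony", "irony"]
predictions = ["irony", "non_irony", "irony", "irony"]

global_r = recall(labels, predictions)
slice_r = slice_recall(texts, labels, predictions)
deviation = (slice_r - global_r) / global_r  # assumed: relative to the global value
print(f"global={global_r:.3f} slice={slice_r:.3f} deviation={deviation:+.2%}")
```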
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!