Report for cardiffnlp/twitter-roberta-base-irony
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 11 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset `irony`, split `validation`).
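If you want to reproduce the evaluation setup behind this scan locally, a minimal sketch using the `transformers` and `datasets` libraries (the scan itself was produced by an automated scanner that is not shown here) might look like:

```python
from datasets import load_dataset
from transformers import pipeline

# Evaluation data used by this scan: tweet_eval, subset "irony", split "validation".
dataset = load_dataset("tweet_eval", "irony", split="validation")

# Text-classification pipeline for the scanned model.
clf = pipeline("text-classification", model="cardiffnlp/twitter-roberta-base-irony")

# The pipeline returns the top label and its probability for each input.
# Depending on the model config, labels may appear as "irony"/"non_irony"
# or as "LABEL_1"/"LABEL_0" (in tweet_eval/irony, 1 = irony, 0 = non_irony).
print(clf(dataset[0]["text"]))
```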
👉Performance issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
---|---|---|---|---|---
Performance | major 🔴 | `text` contains "love" | Recall = 0.083 | — | -70.31% vs. global
🔍✨Examples
For records in the dataset where `text` contains "love", the Recall is 70.31% lower than the global Recall.

# | text | label | Predicted label
---|---|---|---
58 | Bae had an energy drink and wants to stay up... but I'm so sleeeeeepy. #love #sleep | irony | non_irony (p = 0.71)
103 | How exciting he's walking all by himself #amazing #strength #hardwork #love | irony | non_irony (p = 0.57)
120 | Dont we all just love those people who message you out of nowhere and act like you guys are close cus they want something from you? | non_irony | irony (p = 0.86)
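To sanity-check a slice finding like this one yourself, a rough sketch could look like the following (it reuses `clf` and `dataset` from the setup above; whether the scanner matches "love" case-sensitively is an assumption here, and the same check applies to the "day" slice reported next, with the substring swapped):

```python
from sklearn.metrics import recall_score

def predict_label(text: str) -> int:
    """Map the pipeline output to tweet_eval labels (0 = non_irony, 1 = irony)."""
    label = clf(text)[0]["label"].lower()
    return 1 if label in ("irony", "label_1") else 0

texts, labels = dataset["text"], dataset["label"]
preds = [predict_label(t) for t in texts]  # full validation split; takes a minute on CPU

# Global recall on the irony class vs. recall restricted to the "love" slice.
global_recall = recall_score(labels, preds, pos_label=1)
idx = [i for i, t in enumerate(texts) if "love" in t.lower()]
slice_recall = recall_score([labels[i] for i in idx], [preds[i] for i in idx], pos_label=1)
print(f"global recall = {global_recall:.3f}, slice recall = {slice_recall:.3f}")
```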
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
---|---|---|---|---|---
Performance | major 🔴 | `text` contains "day" | Recall = 0.120 | — | -57.25% vs. global
🔍✨Examples
For records in the dataset where `text` contains "day", the Recall is 57.25% lower than the global Recall.

# | text | label | Predicted label
---|---|---|---
6 | #40 #Corner #Cute #Day #Expensive #diy #crafts \| Please RT: | irony |
10 | Last day in #Riga! #self #finnishgirl #businesswoman @ PK Riga Hotel | irony | non_irony (p = 0.73)
41 | Oh, thank GOD - our entire office email system is down... the day of a big event. Santa, you know JUST what to get me for xmas. | non_irony | irony (p = 0.93)
👉Overconfidence issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
---|---|---|---|---|---
Overconfidence | medium 🟡 | `text` contains "love" | Overconfidence rate = 0.898 | — | +14.24% vs. global
🔍✨Examples
For records in the dataset where `text` contains "love", we found a significantly higher number of overconfident wrong predictions (44 samples, corresponding to 89.80% of the wrong predictions in the data slice).

# | text | label | Predicted label
---|---|---|---
276 | Love being called into work on my morning off after not even 6 hours of sleep. #thanks #splitshift | non_irony | irony (p = 0.99); non_irony (p = 0.01)
720 | Gotta love working the day after Christmas #smellya | non_irony | irony (p = 0.99); non_irony (p = 0.01)
183 | Youve got to just love the efficiency @user two-day service! #7DayService #prioritymail #hahahaha | non_irony | irony (p = 0.99); non_irony (p = 0.01)
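To get an intuition for this metric, here is a rough sketch of an overconfidence check, reusing `clf` and `dataset` from above; the 0.9 confidence threshold is an illustrative assumption, not necessarily the one the scanner uses:

```python
THRESHOLD = 0.9  # illustrative; the scanner's actual threshold may differ

def overconfidence_rate(rows) -> float:
    """Share of wrong predictions whose top probability exceeds THRESHOLD."""
    wrong, overconfident = 0, 0
    for text, true_label in rows:
        out = clf(text)[0]
        pred = 1 if out["label"].lower() in ("irony", "label_1") else 0
        if pred != true_label:
            wrong += 1
            overconfident += out["score"] >= THRESHOLD
    return overconfident / wrong if wrong else 0.0

love_rows = [(t, l) for t, l in zip(dataset["text"], dataset["label"]) if "love" in t.lower()]
print(f"overconfidence rate on the 'love' slice: {overconfidence_rate(love_rows):.3f}")
```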
Vulnerability | Level | Data slice | Metric | Transformation | Deviation
---|---|---|---|---|---
Overconfidence | medium 🟡 | `avg_digits(text)` >= 0.010 | Overconfidence rate = 0.892 | — | +13.47% vs. global
🔍✨Examples
For records in the dataset where `avg_digits(text)` >= 0.010, we found a significantly higher number of overconfident wrong predictions (99 samples, corresponding to 89.19% of the wrong predictions in the data slice).

# | text | avg_digits(text) | label | Predicted label
---|---|---|---|---
276 | Love being called into work on my morning off after not even 6 hours of sleep. #thanks #splitshift | 0.010101 | non_irony | irony (p = 0.99); non_irony (p = 0.01)
829 | It's super duper fun waking up and immediately shoveling your car out of your driveway for 20 minutes #fuckinsnow | 0.0175439 | non_irony | irony (p = 0.99); non_irony (p = 0.01)
309 | Isn't it great to sleep 5 hours and feel like a million bucks? #gettingold | 0.0133333 | non_irony | irony (p = 0.99); non_irony (p = 0.01)
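The `avg_digits(text)` slicing feature is not defined in the report itself; the definition below (fraction of characters that are digits) is an assumption, though it is consistent with the values shown in the table above:

```python
def avg_digits(text: str) -> float:
    """Assumed definition: fraction of characters in the text that are digits."""
    return sum(ch.isdigit() for ch in text) / len(text) if text else 0.0

# For example, a single digit ("6") in a roughly 100-character tweet gives ~0.01.
digit_slice = [t for t in dataset["text"] if avg_digits(t) >= 0.010]
print(f"{len(digit_slice)} samples fall into the avg_digits(text) >= 0.010 slice")
```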
👉Ethical issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | major 🔴 | — | Fail rate = 0.061 | Switch countries from high- to low-income and vice versa | 2/33 tested samples (6.06%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.06% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
485 | @user @user it's like you're in the Maldives #seaandwhitesands | @user @user it's like you're in the Burkina Faso #seaandwhitesands | irony (p = 0.61) | non_irony (p = 0.61) |
686 | AAP said will declare AK candidate in last list but declared it before.This issue affecting India's GDP is termed as U-Turn by BJP #AK4Delhi | AAP said will declare AK candidate in last list but declared it before.This issue affecting United States's GDP is termed as U-Turn by BJP #AK4Delhi | irony (p = 0.50) | non_irony (p = 0.52) |
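A metamorphic test of this kind can be sketched as follows; the two country pairs are taken from the examples above, while the scanner's full high-/low-income mapping is not shown in this report:

```python
# Illustrative country pairs from the examples above (not the scanner's full list).
COUNTRY_SWAPS = {"Maldives": "Burkina Faso", "India": "United States"}

def swap_countries(text: str) -> str:
    for src, dst in COUNTRY_SWAPS.items():
        text = text.replace(src, dst)
    return text

def changed_prediction(text: str) -> bool:
    """True if the predicted label flips after swapping country names."""
    perturbed = swap_countries(text)
    if perturbed == text:
        return False  # nothing to test for this sample
    return clf(text)[0]["label"] != clf(perturbed)[0]["label"]

candidates = [t for t in dataset["text"] if swap_countries(t) != t]
fails = sum(changed_prediction(t) for t in candidates)
print(f"fail rate: {fails}/{len(candidates)} samples changed prediction")
```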
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | medium 🟡 | — | Fail rate = 0.011 | Switch Gender | 1/94 tested samples (1.06%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 1.06% of the cases. We expected the predictions not to be affected by this transformation.

👉Robustness issues (5)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.201 | Transform to uppercase | 192/953 tested samples (20.15%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.15% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! RT @USER OF ALL THE PLACES TO GET STUCK IN A TRAFFIC JAM | irony (p = 0.51) | non_irony (p = 0.78) |
13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | WORKAHOLICS: IF YOU'RE SICK, DON'T LET THAT STOP YOU FROM BRINGING YOUR GERMS INTO THE OFFICE. WE ALL APPRECIATE YOUR COMMITMENT. | irony (p = 0.90) | non_irony (p = 0.89) |
19 | Flight diverted over boiling water incident | FLIGHT DIVERTED OVER BOILING WATER INCIDENT | irony (p = 0.70) | non_irony (p = 0.88) |
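A simple way to reproduce this kind of check (and the title-case and lowercase findings further below) is to compare predictions before and after the transformation. A rough sketch, reusing `clf` and `dataset` from above and evaluating only a small subset for speed; note that the varying denominators in this report suggest the scanner counts only samples the transformation actually modifies, which this sketch does not bother to do:

```python
def fail_rate(transform, texts) -> float:
    """Share of samples whose predicted label changes after the transformation."""
    changed = 0
    for text in texts:
        before = clf(text)[0]["label"]
        after = clf(transform(text))[0]["label"]
        changed += before != after
    return changed / len(texts)

sample = dataset["text"][:100]  # small subset to keep the check quick
print("uppercase :", fail_rate(str.upper, sample))
print("title case:", fail_rate(str.title, sample))
print("lowercase :", fail_rate(str.lower, sample))
```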
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.155 | Transform to title case | 148/953 tested samples (15.53%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.53% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Transform to title case(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! Rt @User Of All The Places To Get Stuck In A Traffic Jam | irony (p = 0.51) | non_irony (p = 0.80) |
13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | Workaholics: If You'Re Sick, Don'T Let That Stop You From Bringing Your Germs Into The Office. We All Appreciate Your Commitment. | irony (p = 0.90) | non_irony (p = 0.82) |
19 | Flight diverted over boiling water incident | Flight Diverted Over Boiling Water Incident | irony (p = 0.70) | non_irony (p = 0.79) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.120 | Add typos | 102/851 tested samples (11.99%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.99% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Add typos(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
7 | #notcies #eu EU backs 328 top early-career researchers with 485 million | #niotcies #eu EU backs 328 top early-carere researchers with 485 million | non_irony (p = 0.64) | irony (p = 0.54) |
22 | @user @user @user Well done. You have more Twitter followers than me. You have succeeded in life | @user @user @user Well dkone. You have morre Twitter folloers than me. You have sucdeeded in life | irony (p = 0.89) | non_irony (p = 0.92) |
55 | @user @user you can't reason with someone with a bio as moronic as his. "So should everyone else" #SoDemocratic | @user @usdr you can't reason with someone with a bip as moronic as his. "So shou everyone else" #SoDemoctatic | irony (p = 0.59) | non_irony (p = 0.55) |
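The exact typo model the scanner uses is not shown in this report; the sketch below is only a toy approximation (random character drops and adjacent swaps) to illustrate the idea, reusing `fail_rate` and `sample` from the case-change sketch above:

```python
import random

def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Toy typo injector: occasionally drop a letter or swap two adjacent characters."""
    rng = random.Random(seed)
    chars, out, i = list(text), [], 0
    while i < len(chars):
        if chars[i].isalpha() and rng.random() < rate:
            if i + 1 < len(chars) and rng.random() < 0.5:
                out.extend([chars[i + 1], chars[i]])  # swap with the next character
                i += 2
            else:
                i += 1  # drop the character
            continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print("add typos:", fail_rate(add_typos, sample))
```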
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.088 | Punctuation Removal | 68/773 tested samples (8.8%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.8% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
4 | !!! RT @user Of all the places to get stuck in a traffic jam | RT @user Of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.63) |
15 | What else would you do on friday? \| #TGIF #8crap | What else would you do on friday \| #TGIF #8crap | |
55 | @user @user you can't reason with someone with a bio as moronic as his. "So should everyone else" #SoDemocratic | @user @user you can t reason with someone with a bio as moronic as his So should everyone else #SoDemocratic | irony (p = 0.59) | non_irony (p = 0.53) |
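Similarly, a rough punctuation-removal check can reuse `fail_rate` and `sample`; the exact character set and replacement rule the scanner applies (e.g., turning an apostrophe into a space while keeping "#" and "@", as the rows above suggest) are assumptions here:

```python
import string

# Replace punctuation with spaces, but keep "#" and "@" as in the examples above.
PUNCT = "".join(ch for ch in string.punctuation if ch not in "#@")
_punct_to_space = str.maketrans({ch: " " for ch in PUNCT})

def remove_punctuation(text: str) -> str:
    return " ".join(text.translate(_punct_to_space).split())

print("punctuation removal:", fail_rate(remove_punctuation, sample))
```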
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.052 | Transform to lowercase | 44/852 tested samples (5.16%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.16% of the cases. We expected the predictions not to be affected by this transformation.

# | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation
---|---|---|---|---
4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! rt @user of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.62) |
29 | @user Frisky at 2am? That's nothing new. | @user frisky at 2am? that's nothing new. | non_irony (p = 0.57) | irony (p = 0.65) |
74 | Honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | non_irony (p = 0.53) | irony (p = 0.51) |
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!