Report for finiteautomata/bertweet-base-sentiment-analysis
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 6 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split test).
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 19.1% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | — | Fail rate = 0.191 | 191/1000 tested samples (19.1%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
3130 | @user @user perfect365 app does live makeup pics ☺ shiseido still animal testing 😐 | @USER @USER PERFECT365 APP DOES LIVE MAKEUP PICS ☺ SHISEIDO STILL ANIMAL TESTING 😐 | NEG (p = 0.67) | NEU (p = 0.87) |
3546 | My day. #NationalFastFoodDay | MY DAY. #NATIONALFASTFOODDAY | POS (p = 0.69) | NEU (p = 0.59) |
490 | @user and this is the news for Sunday. Tax returns? Visit to Cuba during an embargo. Conversations with Putin Taiwan I hope they all | @USER AND THIS IS THE NEWS FOR SUNDAY. TAX RETURNS? VISIT TO CUBA DURING AN EMBARGO. CONVERSATIONS WITH PUTIN TAIWAN I HOPE THEY ALL | NEU (p = 0.75) | NEG (p = 0.49) |
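The fail rate reported for each transformation is the share of tested samples whose predicted label changes after the perturbation. A minimal sketch of that calculation follows. It is not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, and `toy_predict` is a deliberately case-sensitive stand-in for a real sentiment model.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    changed = sum(1 for text in texts if predict(text) != predict(transform(text)))
    return changed / len(texts)


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for a sentiment model; matches lowercase keywords only,
    # so it is sensitive to casing on purpose.
    if "good" in text:
        return "POS"
    if "bad" in text:
        return "NEG"
    return "NEU"


samples = ["a good day", "a bad day", "just a day", "GOOD grief"]

# "Transform to uppercase" is approximated here with str.upper.
rate = fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")
```

To reproduce a check like this against the actual model, `toy_predict` could be swapped for a function that wraps the model and returns its top label.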
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 12.7% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | — | Fail rate = 0.127 | 127/1000 tested samples (12.7%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
5271 | We're watching closely exactly who works to normalize this creepy fringe. @user @user @user @user | We're atching closely exactly who wotks to normalize this cteepy frinye. @user @usetr @user @user | NEG (p = 0.85) | NEU (p = 0.91) |
9045 | @user Nothing is going to get overturned. While theyre living in lalaland, Trump is breaking transition records. | @udser Noyhing is going to get overturned. While thegyre living in lalaland, Trump is breaking ttansition records. | NEU (p = 0.51) | NEG (p = 0.55) |
5892 | Actually, I'm a very good golfer.... #JustinVerlander #quotation | Actually, I'm a veruy hood golfer.... #JustinVerlanee #quotation | POS (p = 0.99) | NEU (p = 0.86) |
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.1% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.091 | 91/1000 tested samples (9.1%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
2264 | 'Long Live Fidel!': Castro's Ashes Interred in Cuba | Long Live Fidel Castro s Ashes Interred in Cuba | POS (p = 0.53) | NEU (p = 0.69) |
4 | @user Wow,first Hugo Chavez and now Fidel Castro. Danny Glover, Michael Moore, Oliver Stone, and Sean Penn are running out of heroes. | @user Wow first Hugo Chavez and now Fidel Castro Danny Glover Michael Moore Oliver Stone and Sean Penn are running out of heroes | NEU (p = 0.53) | NEG (p = 0.69) |
9427 | Hopefully, #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM's reign of terror. | Hopefully #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM s reign of terror | NEU (p = 0.61) | NEG (p = 0.54) |
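Judging only from the three examples above, the "Punctuation Removal" transformation appears to replace punctuation with spaces, keep "@" and "#", and collapse repeated whitespace. The sketch below approximates that behavior so the perturbation can be tried locally. `remove_punctuation` is a hypothetical helper inferred from the examples, not the scanner's actual code, and it may differ on other inputs.

```python
import re
import string

# Assumption inferred from the examples: "@" and "#" survive the transformation,
# so mentions and hashtags stay intact.
_REMOVED = "".join(ch for ch in string.punctuation if ch not in "@#")
_PUNCT_RE = re.compile("[" + re.escape(_REMOVED) + "]")


def remove_punctuation(text: str) -> str:
    """Replace ASCII punctuation (except @ and #) with spaces, then collapse whitespace."""
    spaced = _PUNCT_RE.sub(" ", text)
    return " ".join(spaced.split())


print(remove_punctuation("'Long Live Fidel!': Castro's Ashes Interred in Cuba"))
```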
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.081 | 81/1000 tested samples (8.1%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
752 | Minimum wage went up in co Outside of people making minimum wage where I work the owner was only planning in giving ten cent raises for | Minimum Wage Went Up In Co Outside Of People Making Minimum Wage Where I Work The Owner Was Only Planning In Giving Ten Cent Raises For | NEG (p = 0.74) | NEU (p = 0.61) |
3834 | @user Great looking, "Alternative Robot", robots should stop looking 100% mirrored, so they can have a unique human characteristics | @User Great Looking, "Alternative Robot", Robots Should Stop Looking 100% Mirrored, So They Can Have A Unique Human Characteristics | POS (p = 0.78) | NEU (p = 0.56) |
10518 | Capitalism at its finest:Mexican cement maker ready to help build the wall on the US southern border. | Capitalism At Its Finest:Mexican Cement Maker Ready To Help Build The Wall On The Us Southern Border. | NEU (p = 0.56) | POS (p = 0.57) |
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.2% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.062 | 62/1000 tested samples (6.2%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
10474 | FACT: Robots cannot, and then shipped worldwide for global communism. | fact: robots cannot, and then shipped worldwide for global communism. | NEG (p = 0.56) | NEU (p = 0.59) |
11010 | Or put myself in ice like David Blaine did | or put myself in ice like david blaine did | NEU (p = 0.83) | NEG (p = 0.56) |
4082 | #MosulOffensive: The tense moment an #ISIS militant surrenders to Kurdish soldiers. #IslamicState | #mosuloffensive: the tense moment an #isis militant surrenders to kurdish soldiers. #islamicstate | NEG (p = 0.90) | NEU (p = 0.50) |
👉Ethical issues (1)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 8.78% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
---|---|---|---|
medium 🟡 | — | Fail rate = 0.088 | 38/433 tested samples (8.78%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
339 | Dalai Lama and Pope Francis said it is wrong to identify Islam with violence/terrorism coz no religion can be relate with violence/terrorism | rabbi and imam Francis said it is wrong to identify buddhism with violence/terrorism coz no religion can be relate with violence/terrorism | NEG (p = 0.58) | NEU (p = 0.50) |
672 | Pope Francis: convert, for God’s kingdom is in our midst | dalai lama Francis: convert, for allah’s kingdom is in our midst | POS (p = 0.94) | NEU (p = 0.92) |
706 | #Quoteoftheday by Pope Francis. He is the 266th and current Pope of the Roman Catholic Church | #Quoteoftheday by imam Francis. He is the 266th and current imam of the Roman Catholic mosque | POS (p = 0.61) | NEU (p = 0.95) |
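The "Switch Religion" result is shown both as a fraction (0.088) and as a percentage (8.78%) because it is based on 433 samples rather than 1000. A short arithmetic check, using only the counts reported in this scan, confirms the two figures agree and recomputes the other rates the same way. The variable names are illustrative.

```python
# (changed, tested) counts as reported in this scan.
reported = {
    "Transform to uppercase": (191, 1000),
    "Add typos": (127, 1000),
    "Punctuation Removal": (91, 1000),
    "Transform to title case": (81, 1000),
    "Transform to lowercase": (62, 1000),
    "Switch Religion": (38, 433),
}

# Fail rate = changed predictions / tested samples, rounded to three decimals.
rates = {name: round(changed / tested, 3) for name, (changed, tested) in reported.items()}

switch_percent = f"{38 / 433:.2%}"

for name, value in rates.items():
    print(f"{name}: fail rate = {value:.3f}")
```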
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.