giskardai/giskard-evaluator · Report for finiteautomata/bertweet-base-sentiment-analysis

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the tweet_eval dataset (subset: sentiment, split: test).

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 19.1% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
major 🔴	—	Fail rate = 0.191	191/1000 tested samples (19.1%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to uppercase(text)	Original prediction	Prediction after perturbation
3130	@user @user perfect365 app does live makeup pics ☺ shiseido still animal testing 😐	@USER @USER PERFECT365 APP DOES LIVE MAKEUP PICS ☺ SHISEIDO STILL ANIMAL TESTING 😐	NEG (p = 0.67)	NEU (p = 0.87)
3546	My day. #NationalFastFoodDay	MY DAY. #NATIONALFASTFOODDAY	POS (p = 0.69)	NEU (p = 0.59)
490	@user and this is the news for Sunday. Tax returns? Visit to Cuba during an embargo. Conversations with Putin Taiwan I hope they all	@USER AND THIS IS THE NEWS FOR SUNDAY. TAX RETURNS? VISIT TO CUBA DURING AN EMBARGO. CONVERSATIONS WITH PUTIN TAIWAN I HOPE THEY ALL	NEU (p = 0.75)	NEG (p = 0.49)
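Each robustness finding is an invariance test: perturb the input, run the model again, and count how often the predicted label changes. The following minimal sketch shows that computation. It is not Giskard's implementation. The names fail_rate and toy_predict are hypothetical. toy_predict is a case-sensitive keyword rule that stands in for the real classifier, which returns NEG, NEU or POS. Skipping samples that the transformation leaves unchanged is also an assumption.

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, changed / tested) for an invariance test."""
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are skipped
            continue
        tested += 1
        if predict(text) != predict(perturbed):
            changed += 1
    return changed, tested, (changed / tested if tested else 0.0)


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for the real model: a case-sensitive keyword rule
    if "great" in text:
        return "POS"
    if "awful" in text:
        return "NEG"
    return "NEU"


texts = ["great game", "awful service", "see you at 5", "GREAT GAME"]
print(fail_rate(toy_predict, texts, str.upper))  # (2, 3, 0.6666666666666666)
```

The reported figure is the same ratio. 191 changed predictions out of 1000 tested samples gives a fail rate of 0.191. To run the lowercase and title-case checks, pass str.lower or str.title as the transform instead.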

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 12.7% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
major 🔴	—	Fail rate = 0.127	127/1000 tested samples (12.7%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Add typos(text)	Original prediction	Prediction after perturbation
5271	We're watching closely exactly who works to normalize this creepy fringe. @user @user @user @user	We're atching closely exactly who wotks to normalize this cteepy frinye. @user @usetr @user @user	NEG (p = 0.85)	NEU (p = 0.91)
9045	@user Nothing is going to get overturned. While theyre living in lalaland, Trump is breaking transition records.	@udser Noyhing is going to get overturned. While thegyre living in lalaland, Trump is breaking ttansition records.	NEU (p = 0.51)	NEG (p = 0.55)
5892	Actually, I'm a very good golfer.... #JustinVerlander #quotation	Actually, I'm a veruy hood golfer.... #JustinVerlanee #quotation	POS (p = 0.99)	NEU (p = 0.86)
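The typo examples above combine three kinds of keyboard-neighbour edit. Some characters are substituted ("works" to "wotks"), some gain an inserted neighbour ("very" to "veruy"), and some are deleted ("watching" to "atching"). The sketch below injects typos in that way. The name add_typos, the default rate and the partial NEIGHBOURS map are all hypothetical, and Giskard's actual typo generator may differ.

```python
import random
from typing import Optional

# Hypothetical, partial QWERTY neighbour map (lower-case keys only)
NEIGHBOURS = {
    "a": "qwsz", "d": "serfcx", "e": "wsdr", "g": "ftyhbv", "h": "gyujnb",
    "i": "ujko", "n": "bhjm", "o": "iklp", "r": "edft", "s": "awedxz",
    "t": "rfgy", "u": "yhji", "w": "qase", "y": "tghu",
}


def add_typos(text: str, rate: float = 0.05, seed: Optional[int] = None) -> str:
    """Randomly substitute, insert, or delete characters using keyboard neighbours."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        key = ch.lower()
        if key not in NEIGHBOURS or rng.random() >= rate:
            out.append(ch)  # leave this character untouched
            continue
        neighbour = rng.choice(NEIGHBOURS[key])
        if ch.isupper():
            neighbour = neighbour.upper()  # keep the original letter case
        op = rng.choice(("substitute", "insert", "delete"))
        if op == "substitute":
            out.append(neighbour)        # as in "works" -> "wotks"
        elif op == "insert":
            out.append(neighbour + ch)   # as in "very" -> "veruy"
        # op == "delete": drop the character, as in "watching" -> "atching"
    return "".join(out)


print(add_typos("We're watching closely exactly who works", rate=0.1, seed=0))
```

A fixed seed makes a perturbed test set reproducible. That matters when comparing fail rates across model versions.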

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.1% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
medium 🟡	—	Fail rate = 0.091	91/1000 tested samples (9.1%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Punctuation Removal(text)	Original prediction	Prediction after perturbation
2264	'Long Live Fidel!': Castro's Ashes Interred in Cuba	Long Live Fidel Castro s Ashes Interred in Cuba	POS (p = 0.53)	NEU (p = 0.69)
4	@user Wow,first Hugo Chavez and now Fidel Castro. Danny Glover, Michael Moore, Oliver Stone, and Sean Penn are running out of heroes.	@user Wow first Hugo Chavez and now Fidel Castro Danny Glover Michael Moore Oliver Stone and Sean Penn are running out of heroes	NEU (p = 0.53)	NEG (p = 0.69)
9427	Hopefully, #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM's reign of terror.	Hopefully #Trump will designate #BlackLivesMatter as a terrorist organization and law enforcement can end #BLM s reign of terror	NEU (p = 0.61)	NEG (p = 0.54)
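The three punctuation-removal examples above fit a simple rule. Replace every ASCII punctuation mark except "#" and "@" with a space, then collapse repeated whitespace. The sketch below, using the hypothetical name remove_punctuation, reproduces all three perturbed texts exactly. Giskard's actual implementation may still differ, for example in how it treats non-ASCII punctuation such as curly quotes.

```python
import string

# Every ASCII punctuation mark except '#' and '@' is mapped to a space, so
# hashtags and @-mentions survive, as in the examples above
_TO_SPACE = str.maketrans({c: " " for c in string.punctuation if c not in "#@"})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    return " ".join(text.translate(_TO_SPACE).split())


print(remove_punctuation("'Long Live Fidel!': Castro's Ashes Interred in Cuba"))
# Long Live Fidel Castro s Ashes Interred in Cuba
```

Mapping punctuation to a space rather than deleting it is why "Castro's" becomes "Castro s" and "Wow,first" becomes "Wow first" in the examples.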

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
medium 🟡	—	Fail rate = 0.081	81/1000 tested samples (8.1%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to title case(text)	Original prediction	Prediction after perturbation
752	Minimum wage went up in co Outside of people making minimum wage where I work the owner was only planning in giving ten cent raises for	Minimum Wage Went Up In Co Outside Of People Making Minimum Wage Where I Work The Owner Was Only Planning In Giving Ten Cent Raises For	NEG (p = 0.74)	NEU (p = 0.61)
3834	@user Great looking, "Alternative Robot", robots should stop looking 100% mirrored, so they can have a unique human characteristics	@User Great Looking, "Alternative Robot", Robots Should Stop Looking 100% Mirrored, So They Can Have A Unique Human Characteristics	POS (p = 0.78)	NEU (p = 0.56)
10518	Capitalism at its finest:Mexican cement maker ready to help build the wall on the US southern border.	Capitalism At Its Finest:Mexican Cement Maker Ready To Help Build The Wall On The Us Southern Border.	NEU (p = 0.56)	POS (p = 0.57)
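Python's built-in str.title() reproduces all three title-case examples above character for character. That includes the side effect in example 10518, where the acronym "US" becomes "Us". This suggests, but does not confirm, that the transformation behaves like str.title(). The snippet below only illustrates that behaviour.

```python
# str.title() upper-cases the first letter of each word and lower-cases the rest,
# so acronyms lose their capitals ("US" -> "Us")
original = "Capitalism at its finest:Mexican cement maker ready to help build the wall on the US southern border."
titled = original.title()
print(titled)
# Capitalism At Its Finest:Mexican Cement Maker Ready To Help Build The Wall On The Us Southern Border.

# Known quirk: an apostrophe also counts as a word boundary
print("Castro's".title())  # Castro'S
```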

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.2% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
medium 🟡	—	Fail rate = 0.062	62/1000 tested samples (6.2%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	text	Transform to lowercase(text)	Original prediction	Prediction after perturbation
10474	FACT: Robots cannot, and then shipped worldwide for global communism.	fact: robots cannot, and then shipped worldwide for global communism.	NEG (p = 0.56)	NEU (p = 0.59)
11010	Or put myself in ice like David Blaine did	or put myself in ice like david blaine did	NEU (p = 0.83)	NEG (p = 0.56)
4082	#MosulOffensive: The tense moment an #ISIS militant surrenders to Kurdish soldiers. #IslamicState	#mosuloffensive: the tense moment an #isis militant surrenders to kurdish soldiers. #islamicstate	NEG (p = 0.90)	NEU (p = 0.50)

👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 8.78% of the cases. We expected the predictions not to be affected by this transformation.

Level	Data slice	Metric	Deviation
medium 🟡	—	Fail rate = 0.088	38/433 tested samples (8.78%) changed prediction after perturbation

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201

🔍✨Examples

	text	Switch Religion(text)	Original prediction	Prediction after perturbation
339	Dalai Lama and Pope Francis said it is wrong to identify Islam with violence/terrorism coz no religion can be relate with violence/terrorism	rabbi and imam Francis said it is wrong to identify buddhism with violence/terrorism coz no religion can be relate with violence/terrorism	NEG (p = 0.58)	NEU (p = 0.50)
672	Pope Francis: convert, for God’s kingdom is in our midst	dalai lama Francis: convert, for allah’s kingdom is in our midst	POS (p = 0.94)	NEU (p = 0.92)
706	#Quoteoftheday by Pope Francis. He is the 266th and current Pope of the Roman Catholic Church	#Quoteoftheday by imam Francis. He is the 266th and current imam of the Roman Catholic mosque	POS (p = 0.61)	NEU (p = 0.95)
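In the examples above, each religious term is replaced by a term of the same kind: Pope becomes imam, Church becomes mosque, and God becomes allah. This test used 433 samples rather than the 1000 used elsewhere, which is consistent with testing only texts that contain such a term. The sketch below implements that idea. RELIGION_TERMS is a hypothetical, deliberately small lexicon rather than Giskard's word list, and a return value of None marks a text that the test would skip.

```python
import random
import re
from typing import Optional

# Hypothetical lexicon grouped by kind of term (not Giskard's actual word list)
RELIGION_TERMS = {
    "leader": ["pope", "imam", "rabbi", "dalai lama"],
    "deity": ["god", "allah"],
    "place": ["church", "mosque", "synagogue", "temple"],
    "faith": ["christianity", "islam", "judaism", "buddhism"],
}
_KIND = {term: kind for kind, terms in RELIGION_TERMS.items() for term in terms}
# Longest terms first so multi-word terms such as "dalai lama" win over shorter ones
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(_KIND, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def switch_religion(text: str, seed: Optional[int] = None) -> Optional[str]:
    """Swap every religious term for a different term of the same kind.

    Returns None when the text contains no known term, i.e. the sample is not tested.
    """
    if _PATTERN.search(text) is None:
        return None
    rng = random.Random(seed)

    def swap(match: "re.Match[str]") -> str:
        found = match.group(0).lower()
        options = [t for t in RELIGION_TERMS[_KIND[found]] if t != found]
        return rng.choice(options)

    return _PATTERN.sub(swap, text)


print(switch_religion("Pope Francis: convert, for God’s kingdom is in our midst", seed=0))
print(switch_religion("Great game tonight"))  # None
```

Whole-word matching keeps "Islamic" in "#IslamicState" untouched. As in the report's examples, replacements come out in lower case.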

Check out the Giskard Space to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.