Not getting consistent results

#1
by spyroskoun - opened

Hi,

I've used your model through the inference widget on the Hugging Face website and programmatically through Python. When I pass the same input I get different results. It seems like there's a problem with the model or the tokenization when I run it programmatically. Do you know what it could be?

Thanks,
Spyros

Civic Information Office org

Hi @spyroskoun , I'm not sure I can reproduce this issue. I tried an older version of `transformers==4.19.2` and saw some differences, but I think those are due to the `function_to_apply` setting. Can you share an example plus your `transformers` version?

from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cvcio/comments-el-toxic"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, return_all_scores=True)

# random article from makeleio: https://www.makeleio.gr/%ce%b5%cf%80%ce%b9%ce%ba%ce%b1%ce%b9%cf%81%ce%bf%cf%84%ce%b7%cf%84%ce%b1/a%cf%85%cf%84%cf%8c-%cf%84%ce%bf-%ce%b1%ce%bd%ce%b4%cf%81%ce%b5%ce%af%ce%ba%ce%b5%ce%bb%ce%bf-%cf%80%ce%bf%cf%85-%ce%bb%ce%ad%ce%b3%ce%b5%cf%84%ce%b1%ce%b9-%ce%a7%cf%81%cf%85%cf%83%ce%bf%cf%87%ce%bf/
pipe("Aυτό το ανδρείκελο που λέγεται Χρυσοχοίδης, είναι ο υπουργός που μου αφαίρεσε- εσκεμμένα- την αστυνομική φύλαξη και ήρθαν για να με σκοτώσουν. Μιχαλάκη, καλώς όρισες. Και τώρα οι δυο μας, θρασύτατε πολιτικέ αλητάκο")

# will return (same as the inference widget)
[[{'label': 'TOXIC', 'score': 0.7913345098495483}, {'label': 'SEVERE_TOXIC', 'score': 0.7585222125053406}, {'label': 'INSULT', 'score': 0.7705711126327515}, {'label': 'IDENTITY_HATE', 'score': 0.038777269423007965}]]
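The `function_to_apply` difference mentioned above is the usual cause of mismatched scores: a multi-label model like this one applies a per-label sigmoid to its logits, while the single-label default applies a softmax across labels. A minimal sketch with made-up logits (illustrative values, not actual outputs of this model) shows how the same logits give very different scores under each function:

```python
import math

# Hypothetical logits for the four labels (illustrative, not from the model).
logits = [1.33, 1.15, 1.21, -3.21]

def sigmoid(x):
    # Independent per-label probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Probabilities normalized across labels, summing to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Multi-label scoring: each label gets its own sigmoid score,
# so the scores need not sum to 1 (several labels can be high at once).
per_label = [sigmoid(x) for x in logits]

# Single-label scoring: softmax forces the scores to compete,
# changing every value and making them sum to 1.
normalized = softmax(logits)

print(per_label)
print(normalized)
```

Calling the pipeline with a mismatched `function_to_apply` (or reading raw logits and normalizing them yourself) would therefore produce scores that disagree with the inference widget even though the model and tokenizer are identical.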

Oh, great! I was calling it a different way and getting weird results. I'm now getting the results I wanted. Thanks!

andefined changed discussion status to closed
