---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
---

# DeTexD-RoBERTa-base delicate text detection

This is a baseline RoBERTa-base model for the delicate text detection task.

* Paper: [DeTexD: A Benchmark Dataset for Delicate Text Detection](TODO)
* [GitHub repository](https://github.com/grammarly/detexd)

## Classification example code

Here's a short usage example that uses the `transformers` library (PyTorch backend) to turn the model's multiclass output into a binary delicate/not-delicate decision:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="grammarly/detexd-roberta-base")

def predict_binary_score(text: str) -> float:
    # get probability scores for every class (top_k=None returns all labels)
    scores = classifier(text, top_k=None)
    # convert to a single score by summing the probability scores
    # of the higher-index classes (LABEL_3 to LABEL_5)
    return sum(score['score'] for score in scores
               if score['label'] in ('LABEL_3', 'LABEL_4', 'LABEL_5'))

def predict_delicate(text: str, threshold: float = 0.72496545) -> bool:
    # a text is flagged as delicate when its binary score exceeds the threshold
    return predict_binary_score(text) > threshold

print(predict_delicate("Time flies like an arrow. Fruit flies like a banana."))
```

Expected output:

```
False
```

## Citation Information

DeTexD: A Benchmark Dataset for Delicate Text Detection. Serhii Yavnyi, Oleksii Sliusarenko, Jade Razzaghi, Yichen Mo, Knar Hovakimyan, Artem Chernodub. [Accepted for publication at The 7th Workshop on Online Abuse and Harms (WOAH) at ACL 2023 in Toronto](https://www.workshopononlineabuse.com/)
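## Checking the score aggregation without the model

The binary decision in the classification example depends only on the list of `{'label', 'score'}` dicts that the pipeline returns, so the aggregation step can be exercised in isolation, without downloading the model. The sketch below assumes that output shape; the helper names `binary_score_from_scores` and `is_delicate` and the sample scores are hypothetical and made up for illustration, not real model output.

```python
from typing import List

# Labels whose probabilities are summed into the "delicate" score,
# mirroring the label set used in the classification example.
DELICATE_LABELS = ("LABEL_3", "LABEL_4", "LABEL_5")
DEFAULT_THRESHOLD = 0.72496545

def binary_score_from_scores(scores: List[dict]) -> float:
    """Sum the probabilities of the higher-index (delicate) classes."""
    return sum(s["score"] for s in scores if s["label"] in DELICATE_LABELS)

def is_delicate(scores: List[dict], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Flag a text as delicate when its summed score exceeds the threshold."""
    return binary_score_from_scores(scores) > threshold

# Hypothetical pipeline output for a single text (illustrative numbers only).
example_scores = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.05},
    {"label": "LABEL_2", "score": 0.10},
    {"label": "LABEL_3", "score": 0.20},
    {"label": "LABEL_4", "score": 0.30},
    {"label": "LABEL_5", "score": 0.30},
]

print(binary_score_from_scores(example_scores))
print(is_delicate(example_scores))
```

Here the delicate classes sum to about 0.8, which is above the default threshold, so `is_delicate` returns `True`. Passing a different `threshold` lets you trade precision against recall without touching the model.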