---
language:
- en
license: mit
datasets:
- cardiffnlp/x_sensitive
metrics:
- f1
widget:
- text: Call me today to earn some money mofos!
pipeline_tag: text-classification
---
# twitter-roberta-large-sensitive-multilabel

This is a RoBERTa-large model pre-trained on 154M tweets posted up to the end of December 2022 and fine-tuned for detecting sensitive content (multilabel classification) on the [_X-Sensitive_](https://huggingface.co/datasets/cardiffnlp/x_sensitive) dataset. The original Twitter-based RoBERTa model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m). A binary sensitive-content model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-large-sensitive-binary).

## Labels

```
"id2label": {
    "0": "conflictual",
    "1": "profanity",
    "2": "sex",
    "3": "drugs",
    "4": "selfharm",
    "5": "spam",
    "6": "not-sensitive"
}
```

## Full classification example

```python
from transformers import pipeline

# Load the multilabel sensitive-content classifier from the Hugging Face Hub
pipe = pipeline(model='cardiffnlp/twitter-roberta-large-sensitive-multilabel')

text = "Call me today to earn some money mofos!"
# Note: recent transformers versions return only the top label by default;
# pass top_k=None to get a score for every label (sorted by score, not label order).
pipe(text)
```

Output:

```
[[{'label': 'conflictual', 'score': 0.03700090944766998},
  {'label': 'profanity', 'score': 0.9770461916923523},
  {'label': 'sex', 'score': 0.01981434039771557},
  {'label': 'drugs', 'score': 0.017757439985871315},
  {'label': 'selfharm', 'score': 0.008804548531770706},
  {'label': 'spam', 'score': 0.07784222811460495},
  {'label': 'not-sensitive', 'score': 0.010364986956119537}]]
```

## BibTeX entry and citation info

```
@article{antypas2024sensitive,
  title={Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation},
  author={Antypas, Dimosthenis and Sen, Indira and Perez-Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2411.19832},
  year={2024}
}
```
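Because this is a multilabel model, each label gets its own independent score. The seven scores in the classification example's output sum to about 1.15, not 1. Turning them into predictions therefore needs a per-label decision rule. The sketch below shows one such rule, assuming a fixed threshold of 0.5. That cut-off is a common convention for sigmoid outputs, not a value documented for this model. The helpers `select_labels` and `is_sensitive` are hypothetical names introduced for illustration. Treating any selected label other than `not-sensitive` as "sensitive" is also an assumption. The scores are copied from the example output.

```python
from typing import Dict, List

# Pipeline-style output for one input text (scores copied from the example output).
example_output = [[
    {"label": "conflictual", "score": 0.03700090944766998},
    {"label": "profanity", "score": 0.9770461916923523},
    {"label": "sex", "score": 0.01981434039771557},
    {"label": "drugs", "score": 0.017757439985871315},
    {"label": "selfharm", "score": 0.008804548531770706},
    {"label": "spam", "score": 0.07784222811460495},
    {"label": "not-sensitive", "score": 0.010364986956119537},
]]


def select_labels(scores: List[Dict], threshold: float = 0.5) -> List[str]:
    """Hypothetical helper: keep every label whose independent score reaches the threshold.

    ASSUMPTION: 0.5 is a conventional cut-off, not a value documented for this model.
    """
    return [item["label"] for item in scores if item["score"] >= threshold]


def is_sensitive(scores: List[Dict], threshold: float = 0.5) -> bool:
    """Hypothetical helper (assumed rule): flag the text if any label other than
    'not-sensitive' is selected."""
    return any(label != "not-sensitive" for label in select_labels(scores, threshold))


print(select_labels(example_output[0]))  # ['profanity']
print(is_sensitive(example_output[0]))   # True
```

A single global threshold is the simplest choice. For this text, lowering it to 0.05 would also select `spam`. Tuning a threshold per label on held-out X-Sensitive data is likely to give better results.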