---
language:
- en
license: mit
datasets:
- cardiffnlp/x_sensitive
metrics:
- f1
widget:
- text: Call me today to earn some money mofos!
pipeline_tag: text-classification
---

# twitter-roberta-large-sensitive-multilabel

This is a RoBERTa-large model trained on 154M tweets posted up to the end of December 2022 and fine-tuned for sensitive content detection (multilabel classification) on the [_X-Sensitive_](https://huggingface.co/datasets/cardiffnlp/x_sensitive) dataset.

The original Twitter-based RoBERTa model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-large-2022-154m).

A binary sensitive content model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-large-sensitive-binary).

## Labels

```
"id2label": {
  "0": "conflictual",
  "1": "profanity",
  "2": "sex",
  "3": "drugs",
  "4": "selfharm",
  "5": "spam",
  "6": "not-sensitive"
}
```

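
The mapping above pairs each output index with a label name. Because the task is multilabel, each label presumably gets its own independent score. The sketch below assumes the usual multilabel setup, where a sigmoid is applied to each logit separately, so the scores need not sum to 1. The logits are invented for illustration and are not model output.

```python
import math

# Index-to-label mapping, copied from the configuration shown above.
id2label = {
    0: "conflictual",
    1: "profanity",
    2: "sex",
    3: "drugs",
    4: "selfharm",
    5: "spam",
    6: "not-sensitive",
}


def sigmoid(x: float) -> float:
    """Squash a single logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits, made up for this example (NOT real model output).
logits = [-3.0, 4.0, -4.0, -4.0, -5.0, -2.5, -4.5]

# Each label is scored independently of the others.
scores = {id2label[i]: sigmoid(z) for i, z in enumerate(logits)}

for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```
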
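## Full classification example

```python
from transformers import pipeline

# top_k=None requests a score for every label; by default only the top label is returned.
# Result ordering and list nesting may differ between transformers versions.
pipe = pipeline(model='cardiffnlp/twitter-roberta-large-sensitive-multilabel', top_k=None)
text = "Call me today to earn some money mofos!"

pipe(text)
```
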
Output:

```
[[{'label': 'conflictual', 'score': 0.03700090944766998},
  {'label': 'profanity', 'score': 0.9770461916923523},
  {'label': 'sex', 'score': 0.01981434039771557},
  {'label': 'drugs', 'score': 0.017757439985871315},
  {'label': 'selfharm', 'score': 0.008804548531770706},
  {'label': 'spam', 'score': 0.07784222811460495},
  {'label': 'not-sensitive', 'score': 0.010364986956119537}]]
```

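
To turn per-label scores into a set of predicted labels, one common approach is to keep every label whose score clears a threshold. The sketch below works on the example output above, so it runs without downloading the model. The `0.5` cut-off and the `select_labels` helper are assumptions for illustration, not part of the model or the library.

```python
# Output copied from the example above: one inner list of label scores per input text.
output = [[
    {'label': 'conflictual', 'score': 0.03700090944766998},
    {'label': 'profanity', 'score': 0.9770461916923523},
    {'label': 'sex', 'score': 0.01981434039771557},
    {'label': 'drugs', 'score': 0.017757439985871315},
    {'label': 'selfharm', 'score': 0.008804548531770706},
    {'label': 'spam', 'score': 0.07784222811460495},
    {'label': 'not-sensitive', 'score': 0.010364986956119537},
]]

THRESHOLD = 0.5  # assumed cut-off; a real deployment would tune this on validation data


def select_labels(label_scores, threshold=THRESHOLD):
    """Return the labels whose score meets the threshold, highest score first."""
    kept = [item for item in label_scores if item["score"] >= threshold]
    kept.sort(key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in kept]


print(select_labels(output[0]))  # ['profanity']
```

Lowering the threshold trades precision for recall: with `threshold=0.05`, the same scores would also flag `spam`.
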
## BibTeX entry and citation info

```
@article{antypas2024sensitive,
  title={Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation},
  author={Antypas, Dimosthenis and Sen, Indira and Perez-Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2411.19832},
  year={2024}
}
```