cardiffnlp/twitter-roberta-large-topic-latest

This is a RoBERTa-large model trained on 154M tweets up to the end of December 2022 and fine-tuned for multilabel topic classification on the TweetTopic dataset of SuperTweetEval. The original Twitter-based RoBERTa model can be found here.

Labels

"id2label": { "0": "arts_&_culture", "1": "business_&_entrepreneurs", "2": "celebrity_&_pop_culture", "3": "diaries_&_daily_life", "4": "family", "5": "fashion_&_style", "6": "film_tv_&_video", "7": "fitness_&_health", "8": "food_&_dining", "9": "gaming", "10": "learning_&_educational", "11": "music", "12": "news_&_social_concern", "13": "other_hobbies", "14": "relationships", "15": "science_&_technology", "16": "sports", "17": "travel_&_adventure", "18": "youth_&_student_life" }

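Because the task is multilabel, each of the 19 labels gets its own independent score, so a tweet can match several topics or none. The sketch below shows one way raw per-label logits could be turned into topic names. It assumes the model emits one logit per label in the id2label order above and that scores come from a per-label sigmoid; the helper names `sigmoid` and `decode_topics` are hypothetical, and the 0.5 cutoff simply mirrors the one used in the Example section.

```python
import math

# Label order copied from the "id2label" mapping above.
LABELS = [
    "arts_&_culture", "business_&_entrepreneurs", "celebrity_&_pop_culture",
    "diaries_&_daily_life", "family", "fashion_&_style", "film_tv_&_video",
    "fitness_&_health", "food_&_dining", "gaming", "learning_&_educational",
    "music", "news_&_social_concern", "other_hobbies", "relationships",
    "science_&_technology", "sports", "travel_&_adventure",
    "youth_&_student_life",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_topics(logits, threshold=0.5):
    """Return (label, score) pairs whose sigmoid score exceeds the threshold.

    Assumption: `logits` holds one raw score per label, in id2label order.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    scored = [(label, sigmoid(z)) for label, z in zip(LABELS, logits)]
    return [(label, score) for label, score in scored if score > threshold]


# Made-up logits for illustration only (not real model output).
example_logits = [-4.0] * len(LABELS)
example_logits[9] = 1.0    # gaming
example_logits[16] = 5.0   # sports
print(decode_topics(example_logits))
```

With these made-up logits, both gaming and sports clear the cutoff, which illustrates how a single tweet can carry more than one topic.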
Example

from transformers import pipeline

text = "So @AB is just the latest victim of the madden curse. If you’re on the cover of that game your career will take a turn for the worse"

# return_all_scores=True yields a score for every label, not just the top one
pipe = pipeline('text-classification', model="cardiffnlp/twitter-roberta-large-topic-latest", return_all_scores=True)
predictions = pipe(text)[0]
# Keep only the labels scoring above 0.5
predictions = [x for x in predictions if x['score'] > 0.5]
print(predictions)
# Output: [{'label': 'sports', 'score': 0.99379563331604}]
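When classifying several tweets, the same thresholding step can be wrapped in a small helper. The sketch below is an illustration, not part of the model's API: `select_topics` and its `fallback_to_top` option are hypothetical names, and the scores are hand-written stand-ins shaped like the per-tweet lists the pipeline above is expected to return (a list of `{'label', 'score'}` dicts per tweet). The fallback is a design choice for callers that always need at least one topic; since the model is multilabel, an empty result is also a legitimate answer.

```python
def select_topics(all_scores, threshold=0.5, fallback_to_top=False):
    """Filter one tweet's label scores, mirroring the list comprehension above.

    all_scores: list of {"label": str, "score": float} dicts for one tweet.
    fallback_to_top: if True and nothing passes the threshold, return the
    single best-scoring label instead of an empty list (an assumption made
    for this sketch, not behaviour of the model or pipeline).
    """
    kept = [x for x in all_scores if x["score"] > threshold]
    if not kept and fallback_to_top and all_scores:
        kept = [max(all_scores, key=lambda x: x["score"])]
    # Highest-scoring topics first.
    return sorted(kept, key=lambda x: x["score"], reverse=True)


# Hand-written stand-ins for pipeline output (illustrative values only).
batch_output = [
    [
        {"label": "gaming", "score": 0.62},
        {"label": "music", "score": 0.01},
        {"label": "sports", "score": 0.99},
    ],
    [
        {"label": "family", "score": 0.40},
        {"label": "relationships", "score": 0.35},
    ],
]

for scores in batch_output:
    print(select_topics(scores, fallback_to_top=True))
```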

Citation Information

Please cite the reference paper if you use this model.

@inproceedings{antypas2023supertweeteval,
  title={SuperTweetEval: A Challenging, Unified and Heterogeneous Benchmark for Social Media NLP Research},
  author={Dimosthenis Antypas and Asahi Ushio and Francesco Barbieri and Leonardo Neves and Kiamehr Rezaee and Luis Espinosa-Anke and Jiaxin Pei and Jose Camacho-Collados},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  year={2023}
}

Model size: 355M params (Safetensors, tensor type F32)