rabuahmad/cc-tweets-classifier-de

Social Media Style Classifier for Climate Change Text (German)

This model is bert-base-uncased fine-tuned for binary classification: it predicts whether a German text about climate change is written in a social media style.

Social media texts were gathered from GerCCT and r/Klimawandel.

Non-social media texts were gathered by sentence-tokenizing 15 Wikipedia articles.

The dataset contained about 8K instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets. Training ran for three epochs with a batch size of 8 on a V100-16GB GPU. All other hyperparameters were the HuggingFace Trainer defaults.
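The shuffle-and-split step described above can be sketched as follows. This is only an illustration using the standard library: the helper name `shuffle_and_split` and the placeholder data are hypothetical, and the tooling actually used to build the dataset is not stated in this card.

```python
import random


def shuffle_and_split(instances, seed=42, train_fraction=0.8):
    """Shuffle a copy of `instances` reproducibly, then split it into train/test."""
    shuffled = list(instances)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholder data: (text, label) pairs, balanced between the classes.
data = [(f"text {i}", i % 2) for i in range(8000)]

train, test = shuffle_and_split(data)
print(len(train), len(test))  # 6400 1600
```

Fixing the seed makes the split reproducible, so the same test set can be reused when comparing runs.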

The model was trained to evaluate a text style transfer task that converts formal-language texts to tweets.

How to use

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier-de"

# Load the fine-tuned classifier and its tokenizer from the Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs longer than 512 tokens instead of raising an error.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"

# The pipeline returns a list of dicts with "label" and "score" keys.
result = classifier(text)
print(result)

Label 1 indicates that the text is predicted to be a tweet.
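Since the classifier is meant for evaluating style transfer outputs, one possible way to summarize its predictions is the share of texts predicted as label 1. The sketch below works on the list-of-dicts output of the pipeline. The helper names `is_tweet` and `tweet_rate` are hypothetical, and the label string "LABEL_1" is an assumption (the default naming when a model config defines no custom label names); check the actual pipeline output before relying on it.

```python
def is_tweet(prediction, positive_label="LABEL_1"):
    """Return True if a single pipeline prediction carries the positive label.

    Assumption: the positive class is reported as "LABEL_1".
    """
    return prediction["label"] == positive_label


def tweet_rate(predictions, positive_label="LABEL_1"):
    """Fraction of predictions classified as social media style."""
    if not predictions:
        return 0.0
    hits = sum(is_tweet(p, positive_label) for p in predictions)
    return hits / len(predictions)


# Hypothetical predictions in the shape returned by the pipeline.
example_predictions = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.87},
    {"label": "LABEL_1", "score": 0.76},
]

print(tweet_rate(example_predictions))  # 0.75
```

With real data, `example_predictions` would be replaced by the result of calling `classifier` on a list of texts.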

Evaluation

Evaluation results on the test set:

Metric	Score
Accuracy	0.96494
Precision	0.97552
Recall	0.95564
F1	0.96547
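As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The result agrees with the reported F1 to about four decimal places; the small remaining difference is expected because the inputs are themselves rounded.

```python
# Reported test-set scores from the table above.
precision = 0.97552
recall = 0.95564
reported_f1 = 0.96547

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")  # 0.9655
print(abs(f1 - reported_f1) < 1e-4)  # True
```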
