license: apache-2.0
language:
- de
base_model:
- dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
Social Media Style Classifier for Climate Change Text (German)
This model is a fine-tuned version of dbmdz/bert-base-german-uncased for a binary classification task: determining whether a German text about climate change is written in a social media style.
Social media texts were gathered from GerCCT and r/Klimawandel.
Non-social media texts were gathered by splitting 15 Wikipedia articles into sentences:
- Klimawandel
- Globale Erwärmung
- Forschungsgeschichte des Klimawandels
- Klimahysterie
- Klimawandelleugnung
- Folgen der globalen Erwärmung in der Arktis
- Folgen der globalen Erwärmung
- Klimamodell
- Anpassung an die globale Erwärmung
- Kontroverse um die globale Erwärmung
- UN-Klimakonferenz in Dubai 2023
- Umweltbewegung
- Treibhausgas
- Treibhauseffekt
- Klimaschutz
The dataset contained about 8K instances, split evenly between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets. Training ran for three epochs with a batch size of 8 on a V100 16GB GPU. All other hyperparameters were left at the Hugging Face Trainer defaults.
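The shuffle-and-split step described above could look roughly like the following. This is a minimal sketch using only the Python standard library; the function name `shuffle_and_split` and the toy examples are hypothetical, and the preprocessing code actually used for this model is not shown in this card.

```python
import random


def shuffle_and_split(examples, seed=42, train_fraction=0.8):
    """Shuffle a copy of `examples` with a fixed seed, then split into train/test."""
    rng = random.Random(seed)      # fixed seed makes the split reproducible
    shuffled = list(examples)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in data: (text, label) pairs with a 50/50 class balance.
examples = [(f"text {i}", i % 2) for i in range(10)]
train, test = shuffle_and_split(examples)
print(len(train), len(test))
```

Using a dedicated `random.Random(seed)` instance rather than the global generator keeps the split reproducible even if other code draws random numbers in between.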
The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.
How to use
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model_name = "rabuahmad/cc-tweets-classifier-de"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs longer than 512 tokens
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)
print(result)
Label 1 indicates that the text is predicted to be a tweet.
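The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. A minimal sketch of turning such a prediction into a boolean follows. It assumes the default label names `LABEL_0` and `LABEL_1`, which transformers emits when a model config defines no custom `id2label` mapping; whether this model uses those names is not stated here. The helper name `is_social_media` and the sample output are hypothetical.

```python
def is_social_media(prediction, threshold=0.5):
    """Return True if a single pipeline prediction points to class 1."""
    # Assumption: labels look like "LABEL_0" / "LABEL_1".
    label_id = int(prediction["label"].rsplit("_", 1)[-1])
    return label_id == 1 and prediction["score"] >= threshold


# Hypothetical output, shaped like what classifier(text) returns.
sample_result = [{"label": "LABEL_1", "score": 0.98}]
print([is_social_media(p) for p in sample_result])
```

Raising `threshold` above 0.5 keeps only predictions the classifier is more confident about, which may be useful when scoring style-transfer outputs.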
Evaluation
Evaluation results on the test set:
| Metric | Score |
|---|---|
| Accuracy | 0.96494 |
| Precision | 0.97552 |
| Recall | 0.95564 |
| F1 | 0.96547 |
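For reference, the four metrics are related through the confusion matrix, and F1 is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `binary_metrics` and the confusion counts are hypothetical and are not the model's actual test-set counts, which this card does not report. The final lines recompute F1 from the reported precision and recall as a consistency check.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to demonstrate the formulas.
example = binary_metrics(tp=90, fp=5, fn=10, tn=95)
print(example)

# Consistency check using the precision and recall reported in the table.
reported_precision, reported_recall = 0.97552, 0.95564
recomputed_f1 = (
    2 * reported_precision * reported_recall / (reported_precision + reported_recall)
)
print(round(recomputed_f1, 4))
```

The recomputed value closely matches the reported F1, so the table is internally consistent.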