---
license: apache-2.0
language:
  - de
base_model:
  - dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

Social Media Style Classifier for Climate Change Text (German)

This model is a fine-tuned version of dbmdz/bert-base-german-uncased for binary classification: it predicts whether a German text about climate change is written in a social media style.

Social media texts were gathered from GerCCT and r/Klimawandel.

Non-social-media texts were gathered by splitting 15 Wikipedia articles into sentences:

  1. Klimawandel
  2. Globale Erwärmung
  3. Forschungsgeschichte des Klimawandels
  4. Klimahysterie
  5. Klimawandelleugnung
  6. Folgen der globalen Erwärmung in der Arktis
  7. Folgen der globalen Erwärmung
  8. Klimamodell
  9. Anpassung an die globale Erwärmung
  10. Kontroverse um die globale Erwärmung
  11. UN-Klimakonferenz in Dubai 2023
  12. Umweltbewegung
  13. Treibhausgas
  14. Treibhauseffekt
  15. Klimaschutz
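
The sentence-level extraction described above can be sketched roughly as follows. This is only a minimal illustration: the model card does not name the sentence tokenizer that was used, so the regex-based `split_sentences` helper is a hypothetical stand-in, and it will mis-split German abbreviations such as "z. B.".

```python
import re


# Hypothetical helper: the tokenizer actually used for the dataset is not specified.
def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments (for example when the input is an empty string).
    return [part for part in parts if part]


# Toy input, not taken from the Wikipedia articles listed above.
article = "Das Klima ändert sich. Die Temperaturen steigen! Was folgt daraus?"
sentences = split_sentences(article)
print(sentences)
```

A dedicated sentence tokenizer with German abbreviation handling would likely give cleaner instances than this regex.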

The dataset contained about 8,000 instances, split 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets. Training ran for three epochs with a batch size of 8 on a V100 16GB GPU. All other hyperparameters were the Hugging Face Trainer defaults.
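
The seeded shuffle and 80/20 split can be illustrated with the standard library alone. This is a sketch under assumptions: the model card does not say which tool performed the split, and `shuffle_and_split` and the toy `examples` list are hypothetical names introduced here, so the resulting partition will not match the original one.

```python
import random


def shuffle_and_split(examples, seed=42, train_fraction=0.8):
    """Shuffle a copy of `examples` with a fixed seed, then split train/test."""
    shuffled = list(examples)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in data: (text, label) pairs, where label 1 = social media style.
examples = [(f"text {i}", i % 2) for i in range(10)]
train, test = shuffle_and_split(examples)
print(len(train), len(test))  # 8 2
```

Fixing the seed means that repeated runs produce the same train/test partition, which keeps evaluation numbers comparable across runs.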

The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.

How to use

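```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier-de"

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs longer than 512 tokens, the maximum sequence length for BERT.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"

# The pipeline returns a list of dicts, each with a "label" and a "score".
result = classifier(text)
print(result)
```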

Label 1 indicates that the text is predicted to be a tweet.
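
To turn a prediction into a boolean, the label string can be parsed as sketched below. This assumes the model config keeps the default transformers label names (`LABEL_0`, `LABEL_1`), which this card does not confirm; `is_social_media` is a hypothetical helper, and the example output is illustrative rather than a real model prediction.

```python
def is_social_media(prediction: dict) -> bool:
    """Return True if a pipeline prediction carries label 1.

    Accepts "LABEL_1" (the default transformers naming, assumed here)
    as well as a bare "1".
    """
    label = str(prediction["label"])
    return label.rsplit("_", 1)[-1] == "1"


# Illustrative pipeline-style output, not an actual prediction from the model.
example_output = [{"label": "LABEL_1", "score": 0.98}]
print(is_social_media(example_output[0]))  # True
```

If the config defines custom label names, the check should compare against those names instead.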

Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.96494 |
| Precision | 0.97552 |
| Recall    | 0.95564 |
| F1        | 0.96547 |
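
For reference, the four metrics can be computed from true and predicted labels as sketched below. The card does not state which evaluation code was used or which class counts as positive; this sketch assumes label 1 (social media style) is the positive class, and the toy labels are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 with label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, not the model's predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported scores: F1 is the harmonic mean
# of precision and recall.
reported_f1 = 2 * 0.97552 * 0.95564 / (0.97552 + 0.95564)
print(round(reported_f1, 4))  # 0.9655
```

The harmonic mean of the reported precision and recall agrees with the reported F1 of 0.96547 up to rounding in the last digit.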