---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit

widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"

---

## Description ℹ

With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:

1) anger 🤬
2) disgust 🤢
3) fear 😨
4) joy 😀
5) neutral 😐
6) sadness 😭
7) surprise 😲

The model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base).

## Application 🚀

a) Run emotion model with 3 lines of code on single text example using Hugging Face's pipeline command on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)

b) Run emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)

## Contact 💻

Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.

Thanks to Samuel Domdey and chrsiebert for their support in making this model available.

## Appendix 📚

Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets.

|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg (Mohammad et al. 2018)|Yes|-|Yes|Yes|-|Yes|-|

The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.

The model is trained on a balanced subset from the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). The evaluation accuracy on a holdout test set is 66% (and significantly above the random-chance baseline of 1/7 = 14%).
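
The figures in the last paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up for this example, and the numbers are copied from the text above rather than recomputed from the training data.

```python
# Sanity check of the figures quoted above (values copied from this card,
# not derived from the underlying datasets).
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
PER_CLASS = 2811          # observations per emotion in the balanced subset
REPORTED_ACCURACY = 0.66  # holdout accuracy stated in this card

# Size of the balanced training subset: one equal-sized slice per class.
total = PER_CLASS * len(EMOTIONS)

# With 7 equally frequent classes, uniform random guessing is right 1/7 of the time.
chance = 1 / len(EMOTIONS)

# How many times better than chance the reported accuracy is.
lift = REPORTED_ACCURACY / chance

print(f"total observations: {total}")
print(f"chance baseline: {chance:.1%}")
print(f"accuracy relative to chance: {lift:.2f}x")
```

Because the subset is balanced, the 1/7 baseline also equals the accuracy of always predicting a single class, which is why it is a reasonable reference point here.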