File size: 2,717 Bytes
092a428 7a76c4f 0e045a8 7a76c4f 079ee57 092a428 ada1ec8 092a428 e06f86e 092a428 053c155 75b9a09 a51ab62 75b9a09 0a8baea 75b9a09 30433e3 04392f3 053c155 75b9a09 f7dc7cd c1f87b8 f7dc7cd c1f87b8 80a6c9d 053c155 80a6c9d 93618c7 5a9e9a6 053c155 5a9e9a6 56a10c2 5a9e9a6 155a92b 4f2ad24 ab82646 67e2b84 4f2ad24 bcb853b 7f8f33e 88432a0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
---
language: "en"
tags:
- distilroberta
- sentiment
- emotion
- twitter
- reddit
widget:
- text: "Oh wow. I didn't know that."
- text: "This movie always makes me cry.."
- text: "Oh Happy Day"
---
## Description βΉ
With this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:
1) anger π€¬
2) disgust π€’
3) fear π¨
4) joy π
5) neutral π
6) sadness π
7) surprise π²
The model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base).
## Application π
a) Run emotion model with 3 lines of code on single text example using Hugging Face's pipeline command on Google Colab:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)
b) Run emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)
## Contact π»
Please reach out to jochen.hartmann@uni-hamburg.de if you have any questions or feedback.
Thanks to Samuel Domdey and chrsiebert for their support in making this model available.
## Appendix π
Please find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets. The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here.
|Name|anger|disgust|fear|joy|neutral|sadness|surprise|
|---|---|---|---|---|---|---|---|
|Crowdflower (2016)|Yes|-|-|Yes|Yes|Yes|Yes|
|Emotion Dataset, Elvis et al. (2018)|Yes|-|Yes|Yes|-|Yes|Yes|
|GoEmotions, Demszky et al. (2020)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|ISEAR, Vikash (2018)|Yes|Yes|Yes|Yes|-|Yes|-|
|MELD, Poria et al. (2019)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
|SemEval-2018, EI-reg (Mohammad et al. 2018) |Yes|-|Yes|Yes|-|Yes|-|
The model is trained on a balanced subset from the datasets listed above (2,811 observations per emotion, i.e., nearly 20k observations in total). 80% of this balanced subset is used for training and 20% for evaluation. The evaluation accuracy is 66% (vs. the random-chance baseline of 1/7 = 14%). |