Upload 7 files

- README.md +165 -0
- config.json +46 -0
- pytorch_model.bin +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +16 -0
- trainer_state.json +118 -0
- val-results.json +10 -0

README.md
ADDED
@@ -0,0 +1,165 @@
---
language: en
tags:
- emoberta
- roberta
license: mit
datasets:
- MELD
- IEMOCAP
---

Check https://github.com/tae898/erc for the details.

[Watch a demo video!](https://youtu.be/qbr7fNd6J28)

# Emotion Recognition in Conversation (ERC)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/emoberta-speaker-aware-emotion-recognition-in/emotion-recognition-in-conversation-on)](https://paperswithcode.com/sota/emotion-recognition-in-conversation-on?p=emoberta-speaker-aware-emotion-recognition-in)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/emoberta-speaker-aware-emotion-recognition-in/emotion-recognition-in-conversation-on-meld)](https://paperswithcode.com/sota/emotion-recognition-in-conversation-on-meld?p=emoberta-speaker-aware-emotion-recognition-in)

At the moment, we only use the text modality to classify the emotion of the utterances. The experiments were carried out on two datasets (i.e., MELD and IEMOCAP).

## Prerequisites

1. An x86-64 Unix or Unix-like machine
1. Python 3.8 or higher
1. Running in a virtual environment (e.g., conda, virtualenv, etc.) is highly recommended so that you don't interfere with the system Python.
1. [`multimodal-datasets` repo](https://github.com/tae898/multimodal-datasets) (submodule)
1. `pip install -r requirements.txt`

## EmoBERTa training

First configure the hyperparameters and the dataset in `train-erc-text.yaml`, and then run the command below in this directory. Running it in a virtual environment is recommended.

```sh
python train-erc-text.py
```

This will subsequently call `train-erc-text-hp.py` and `train-erc-text-full.py`.

## Results on the test split (weighted F1 scores)

| Model    |                                 |   MELD    |  IEMOCAP  |
| -------- | ------------------------------- | :-------: | :-------: |
| EmoBERTa | No past and future utterances   |   63.46   |   56.09   |
|          | Only past utterances            |   64.55   | **68.57** |
|          | Only future utterances          |   64.23   |   66.56   |
|          | Both past and future utterances | **65.61** |   67.42   |
|          | → *without speaker names*       |   65.07   |   64.02   |

The numbers above are means over five random-seed runs.

For more detailed training and test results, check out `./results/`.

If you want to download the trained checkpoints, [here](https://surfdrive.surf.nl/files/index.php/s/khREwk4MUI7MSnO/download) is where you can get them. It's a pretty big zip file.

## Deployment

### Huggingface

We have released our models on huggingface:

- [emoberta-base](https://huggingface.co/tae898/emoberta-base)
- [emoberta-large](https://huggingface.co/tae898/emoberta-large)

They are based on [RoBERTa-base](https://huggingface.co/roberta-base) and [RoBERTa-large](https://huggingface.co/roberta-large), respectively, and were trained on [both the MELD and IEMOCAP datasets](utterance-ordered-MELD_IEMOCAP.json). The deployed models are neither speaker-aware nor context-aware: they classify one utterance at a time (e.g., "I love you") without speaker information.

### Flask app

You can run the Flask RESTful server app either as a Docker container or as a plain Python script.

1. Running the app as a Docker container **(recommended)**:

   There are four images. Take what you need:

   - `docker run -it --rm -p 10006:10006 tae898/emoberta-base`
   - `docker run -it --rm -p 10006:10006 --gpus all tae898/emoberta-base-cuda`
   - `docker run -it --rm -p 10006:10006 tae898/emoberta-large`
   - `docker run -it --rm -p 10006:10006 --gpus all tae898/emoberta-large-cuda`

1. Running the app in your Python environment:

   This is less recommended than the Docker route.

   Run `pip install -r requirements-deploy.txt` first.<br>
   [`app.py`](app.py) is a Flask RESTful server. Usage:

   ```console
   app.py [-h] [--host HOST] [--port PORT] [--device DEVICE] [--model-type MODEL_TYPE]
   ```

   For example:

   ```sh
   python app.py --host 0.0.0.0 --port 10006 --device cpu --model-type emoberta-base
   ```
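The server's command-line interface shown in the usage line above can be sketched with `argparse`. This is a hypothetical reconstruction, not the actual contents of `app.py`; the defaults here are assumptions taken from the example invocation.

```python
import argparse

# Hypothetical reconstruction of app.py's CLI from its usage line;
# the defaults are assumptions, not taken from the real app.py.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="EmoBERTa Flask RESTful server")
    parser.add_argument("--host", default="0.0.0.0", help="interface to bind to")
    parser.add_argument("--port", type=int, default=10006, help="port to listen on")
    parser.add_argument("--device", default="cpu", help='"cpu" or "cuda"')
    parser.add_argument("--model-type", default="emoberta-base",
                        help="emoberta-base or emoberta-large")
    return parser

# Parse the example invocation from above; argparse exposes
# --model-type as args.model_type.
args = build_parser().parse_args(
    ["--host", "0.0.0.0", "--port", "10006",
     "--device", "cpu", "--model-type", "emoberta-base"]
)
```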
### Client

Once the app is running, you can send text to the server. First install the necessary packages: `pip install -r requirements-client.txt`, and then run [client.py](client.py). Usage:

```console
client.py [-h] [--url-emoberta URL_EMOBERTA] --text TEXT
```

For example:

```sh
python client.py --text "Emotion recognition is so cool\!"
```

will give you:

```json
{
    "neutral": 0.0049800905,
    "joy": 0.96399665,
    "surprise": 0.018937444,
    "anger": 0.0071516023,
    "sadness": 0.002021492,
    "disgust": 0.001495996,
    "fear": 0.0014167271
}
```
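Picking the predicted emotion from a response like the one above is just an argmax over the returned probabilities. A minimal sketch (`top_emotion` is a hypothetical helper for illustration, not part of `client.py`; the dict mirrors the example response):

```python
def top_emotion(probs: dict) -> tuple:
    """Return (label, probability) for the highest-scoring emotion."""
    label = max(probs, key=probs.get)
    return label, probs[label]

# Example server response, copied from the output above.
response = {
    "neutral": 0.0049800905,
    "joy": 0.96399665,
    "surprise": 0.018937444,
    "anger": 0.0071516023,
    "sadness": 0.002021492,
    "disgust": 0.001495996,
    "fear": 0.0014167271,
}

label, prob = top_emotion(response)
print(f"{label}: {prob:.3f}")  # joy: 0.964
```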
## Troubleshooting

The best way to find and solve your problems is to search the GitHub issues tab. If you can't find what you need, feel free to open an issue. We are pretty responsive.

## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

1. Fork the Project
1. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
1. Run `make style && make quality` in the root repo directory to ensure code quality.
1. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
1. Push to the Branch (`git push origin feature/AmazingFeature`)
1. Open a Pull Request

## Cite our work

Check out the [paper](https://arxiv.org/abs/2108.12009).

```bibtex
@misc{kim2021emoberta,
      title={EmoBERTa: Speaker-Aware Emotion Recognition in Conversation with RoBERTa},
      author={Taewoon Kim and Piek Vossen},
      year={2021},
      eprint={2108.12009},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

[![DOI](https://zenodo.org/badge/328375452.svg)](https://zenodo.org/badge/latestdoi/328375452)<br>

## Authors

- [Taewoon Kim](https://taewoonkim.com/)

## License

[MIT](https://choosealicense.com/licenses/mit/)
config.json
ADDED
@@ -0,0 +1,46 @@
{
  "_name_or_path": "/home/tk/repos/erc/emoberta-large",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "neutral",
    "1": "joy",
    "2": "surprise",
    "3": "anger",
    "4": "sadness",
    "5": "disgust",
    "6": "fear"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "anger": 3,
    "disgust": 5,
    "fear": 6,
    "joy": 1,
    "neutral": 0,
    "sadness": 4,
    "surprise": 2
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.16.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a8cfc5507bf55efc68b682d2b2258bec556e9689d40ed387bb54996f2bb73b0e
size 135
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,16 @@
{
  "unk_token": "<unk>",
  "bos_token": "<s>",
  "eos_token": "</s>",
  "add_prefix_space": false,
  "errors": "replace",
  "sep_token": "</s>",
  "cls_token": "<s>",
  "pad_token": "<pad>",
  "mask_token": "<mask>",
  "trim_offsets": true,
  "model_max_length": 512,
  "special_tokens_map_file": null,
  "name_or_path": "roberta-large",
  "tokenizer_class": "RobertaTokenizer"
}
trainer_state.json
ADDED
@@ -0,0 +1,118 @@
{
  "best_metric": 0.6281057228463983,
  "best_model_checkpoint": "results/MELD_IEMOCAP/roberta-large/SEEDS/2022-03-14-19-06-40-speaker_mode-None-num_past_utterances-0-num_future_utterances-0-batch_size-16-seed-42/checkpoint-7086",
  "epoch": 6.0,
  "global_step": 7086,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 1.1135900346584852e-06,
      "loss": 1.6941,
      "step": 1181
    },
    {
      "epoch": 1.0,
      "eval_f1_macro": 0.07791319292694036,
      "eval_f1_micro": 0.35635123614663256,
      "eval_f1_weighted": 0.19072939518738496,
      "eval_loss": 1.5153617858886719,
      "eval_runtime": 3.1794,
      "eval_samples_per_second": 737.872,
      "eval_steps_per_second": 23.275,
      "step": 1181
    },
    {
      "epoch": 2.0,
      "learning_rate": 2.2338198736122084e-06,
      "loss": 1.298,
      "step": 2362
    },
    {
      "epoch": 2.0,
      "eval_f1_macro": 0.39258179738738447,
      "eval_f1_micro": 0.5882352941176471,
      "eval_f1_weighted": 0.570696903628535,
      "eval_loss": 1.1774364709854126,
      "eval_runtime": 3.1793,
      "eval_samples_per_second": 737.903,
      "eval_steps_per_second": 23.276,
      "step": 2362
    },
    {
      "epoch": 3.0,
      "learning_rate": 3.3540497125659314e-06,
      "loss": 1.0976,
      "step": 3543
    },
    {
      "epoch": 3.0,
      "eval_f1_macro": 0.4195774350293237,
      "eval_f1_micro": 0.6052855924978687,
      "eval_f1_weighted": 0.5932857871930991,
      "eval_loss": 1.0991432666778564,
      "eval_runtime": 3.1785,
      "eval_samples_per_second": 738.086,
      "eval_steps_per_second": 23.281,
      "step": 3543
    },
    {
      "epoch": 4.0,
      "learning_rate": 3.0822920081965484e-06,
      "loss": 0.9869,
      "step": 4724
    },
    {
      "epoch": 4.0,
      "eval_f1_macro": 0.4634417760566037,
      "eval_f1_micro": 0.6287297527706734,
      "eval_f1_weighted": 0.6211850773676282,
      "eval_loss": 1.0642313957214355,
      "eval_runtime": 3.1809,
      "eval_samples_per_second": 737.537,
      "eval_steps_per_second": 23.264,
      "step": 4724
    },
    {
      "epoch": 5.0,
      "learning_rate": 2.8022345484581173e-06,
      "loss": 0.8638,
      "step": 5905
    },
    {
      "epoch": 5.0,
      "eval_f1_macro": 0.4789393979707725,
      "eval_f1_micro": 0.6172208013640239,
      "eval_f1_weighted": 0.6157912564989868,
      "eval_loss": 1.0899823904037476,
      "eval_runtime": 3.182,
      "eval_samples_per_second": 737.269,
      "eval_steps_per_second": 23.256,
      "step": 5905
    },
    {
      "epoch": 6.0,
      "learning_rate": 2.522414224587374e-06,
      "loss": 0.772,
      "step": 7086
    },
    {
      "epoch": 6.0,
      "eval_f1_macro": 0.4957254043928597,
      "eval_f1_micro": 0.6295822676896846,
      "eval_f1_weighted": 0.6281057228463983,
      "eval_loss": 1.120431661605835,
      "eval_runtime": 3.1786,
      "eval_samples_per_second": 738.072,
      "eval_steps_per_second": 23.281,
      "step": 7086
    }
  ],
  "max_steps": 17715,
  "num_train_epochs": 15,
  "total_flos": 8224038236774502.0,
  "trial_name": null,
  "trial_params": null
}
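The `best_metric` and `best_model_checkpoint` fields in this trainer state follow from scanning `log_history` for the evaluation entry with the highest `eval_f1_weighted`. A sketch of that scan, using a condensed copy of the evaluation entries logged above (only `epoch` and `eval_f1_weighted` are kept):

```python
# Condensed from the log_history above: one (epoch, eval_f1_weighted)
# pair per evaluation entry.
log_history = [
    {"epoch": 1.0, "eval_f1_weighted": 0.19072939518738496},
    {"epoch": 2.0, "eval_f1_weighted": 0.570696903628535},
    {"epoch": 3.0, "eval_f1_weighted": 0.5932857871930991},
    {"epoch": 4.0, "eval_f1_weighted": 0.6211850773676282},
    {"epoch": 5.0, "eval_f1_weighted": 0.6157912564989868},
    {"epoch": 6.0, "eval_f1_weighted": 0.6281057228463983},
]

# The best checkpoint is the evaluation with the highest weighted F1;
# here that is epoch 6, matching best_metric in the state file.
best = max(log_history, key=lambda e: e["eval_f1_weighted"])
```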
val-results.json
ADDED
@@ -0,0 +1,10 @@
{
  "eval_loss": 1.120431661605835,
  "eval_f1_weighted": 0.6281057228463983,
  "eval_f1_micro": 0.6295822676896846,
  "eval_f1_macro": 0.4957254043928597,
  "eval_runtime": 3.2141,
  "eval_samples_per_second": 729.912,
  "eval_steps_per_second": 23.024,
  "epoch": 15.0
}