Update README.md
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# 🧠 **Speech Emotion Recognition with Wav2Vec2**

This project leverages the **Wav2Vec2** model to recognize emotions in speech. The goal is to classify audio recordings into different emotional categories, such as **Happy**, **Sad**, and **Surprised**.

## 📊 **Dataset**

The dataset used for training and evaluation is sourced from multiple datasets, including:

- [RAVDESS](https://zenodo.org/records/1188976#.XsAXemgzaUk)
- [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee/data)
- [TESS](https://tspace.library.utoronto.ca/handle/1807/24487)
- [URDU](https://www.kaggle.com/datasets/bitlord/urdu-language-speech-dataset)

The dataset contains recordings labeled with various emotions. Below is the distribution of the emotions in the dataset:

| **Emotion** | **Count** |
|-------------|-----------|
| sad         | 752       |
| happy       | 752       |
| angry       | 752       |
| neutral     | 716       |
| disgust     | 652       |
| fearful     | 652       |
| surprised   | 652       |
| calm        | 192       |

This distribution reflects the balance of emotions in the dataset, with some emotions having more samples than others. The "calm" emotion was excluded during training due to its underrepresentation.
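As an illustrative sketch (not the project's actual code), the underrepresented class could be dropped before training like this, assuming the combined metadata lives in a pandas DataFrame with hypothetical `path` and `emotion` columns:

```python
import pandas as pd

# Hypothetical combined metadata: one row per audio file with its emotion label.
df = pd.read_csv("combined_metadata.csv")  # assumed columns: path, emotion

# Inspect the label distribution (should roughly match the table above).
print(df["emotion"].value_counts())

# Drop the underrepresented "calm" class before training.
df = df[df["emotion"] != "calm"].reset_index(drop=True)
```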

## 🎤 **Preprocessing**

- **Audio Loading**: Using **Librosa** to load the audio files and convert them to numpy arrays.
- **Feature Extraction**: The audio data is processed using the **Wav2Vec2 Feature Extractor**, which standardizes and normalizes the audio features for input to the model.
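Below is a minimal sketch of these two steps; the file path, the 16 kHz sampling rate, and the extractor settings are assumptions rather than the project's exact configuration:

```python
import librosa
from transformers import Wav2Vec2FeatureExtractor

# Load the waveform as a numpy array, resampled to 16 kHz (the rate Wav2Vec2 expects).
waveform, sampling_rate = librosa.load("path/to/audio.wav", sr=16000)

# Standardize and normalize the raw waveform into model-ready tensors.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)
inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
print(inputs["input_values"].shape)
```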

## 🧠 **Model**

The model used is the **Wav2Vec2 Large XLSR-53** model, fine-tuned for **audio classification** tasks:
- **Model**: [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
- **Output**: Emotion labels (`Angry`, `Disgust`, `Fearful`, `Happy`, `Neutral`, `Sad`, `Surprised`)

I map the emotion labels to numeric IDs and use them for model training and evaluation.
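A minimal, non-authoritative sketch of that mapping and of initializing the classifier (the label strings, their ordering, and the use of `AutoModelForAudioClassification` are illustrative assumptions):

```python
from transformers import AutoModelForAudioClassification

labels = ["angry", "disgust", "fearful", "happy", "neutral", "sad", "surprised"]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for i, label in enumerate(labels)}

# Fine-tune the pretrained XLSR-53 encoder with a fresh classification head
# over the seven emotion classes.
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    num_labels=len(labels),
    label2id=label2id,
    id2label=id2label,
)
```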

## ⚙️ **Training**