firdhokk committed on
Commit 611e6db · verified · 1 Parent(s): a9176ae

Update README.md

Files changed (1)
  1. README.md +25 -8
README.md CHANGED
@@ -17,26 +17,43 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # 🎧 **Speech Emotion Recognition with Wav2Vec2**
-
- This project leverages the **Wav2Vec2** model to recognize emotions in speech. The goal is to classify audio recordings into different emotional categories, such as **Happy**, **Sad**, and **Surprised**.
+ This project leverages the **Wav2Vec2** model to recognize emotions in speech. The goal is to classify audio recordings into different emotional categories such as **Happy**, **Sad**, and **Surprised**, among others.
 
 
 ## 🗂 **Dataset**
- The dataset for this project is derived from multiple publicly available speech emotion datasets: [RAVDESS](https://zenodo.org/records/1188976#.XsAXemgzaUk), [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee/data), [TESS](https://tspace.library.utoronto.ca/handle/1807/24487), and [URDU](https://www.kaggle.com/datasets/bitlord/urdu-language-speech-dataset).
- We filter out the "calm" emotion from the dataset to focus on the more expressive emotions. The dataset is split into **training** (80%) and **testing** (20%) sets.
+ The dataset used for training and evaluation is sourced from multiple datasets, including:
+
+ - [RAVDESS](https://zenodo.org/records/1188976#.XsAXemgzaUk)
+ - [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee/data)
+ - [TESS](https://tspace.library.utoronto.ca/handle/1807/24487)
+ - [URDU](https://www.kaggle.com/datasets/bitlord/urdu-language-speech-dataset)
+
+ The dataset contains recordings labeled with various emotions. Below is the distribution of emotions in the dataset:
+
+ | **Emotion** | **Count** |
+ |-------------|-----------|
+ | sad         | 752       |
+ | happy       | 752       |
+ | angry       | 752       |
+ | neutral     | 716       |
+ | disgust     | 652       |
+ | fearful     | 652       |
+ | surprised   | 652       |
+ | calm        | 192       |
+
+ This distribution reflects the balance of emotions in the dataset, with some emotions having more samples than others. The "calm" emotion was excluded during training due to its underrepresentation.
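+
+ Below is a minimal sketch of how this exclusion might be applied, assuming the clips are indexed in a pandas DataFrame (the manifest layout and column names here are illustrative, not taken from the actual training code):
+
+ ```python
+ import pandas as pd
+
+ # Hypothetical manifest: one row per audio clip with its emotion label.
+ df = pd.DataFrame({
+     "path": ["ravdess/angry_001.wav", "tess/calm_003.wav"],
+     "emotion": ["angry", "calm"],
+ })
+
+ # Drop the underrepresented "calm" class before splitting and training.
+ df = df[df["emotion"] != "calm"].reset_index(drop=True)
+ ```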
 
 
 ## 🎀 **Preprocessing**
- - **Audio Loading**: Using **Librosa**, we load the audio files and convert them to numpy arrays.
+ - **Audio Loading**: **Librosa** is used to load the audio files and convert them to NumPy arrays.
 - **Feature Extraction**: The audio data is processed using the **Wav2Vec2 Feature Extractor**, which standardizes and normalizes the audio features for input to the model (see the sketch below).
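+
+ As a rough sketch, the two steps above might look like the following (the file path and the 16 kHz target rate are assumptions on my part, not values from the training script):
+
+ ```python
+ import librosa
+ from transformers import Wav2Vec2FeatureExtractor
+
+ # Load the waveform; Wav2Vec2 models expect 16 kHz mono audio.
+ speech, sampling_rate = librosa.load("path/to/clip.wav", sr=16000)
+
+ # Standardize/normalize the raw waveform into model-ready tensors.
+ feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-large-xlsr-53")
+ inputs = feature_extractor(speech, sampling_rate=16000, return_tensors="pt", padding=True)
+ ```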
 
 
 ## 🔧 **Model**
 The model used is the **Wav2Vec2 Large XLSR-53** model, fine-tuned for **audio classification** tasks:
 - **Model**: [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
- - **Output**: Emotion labels (`'Angry', 'Disgust', 'Fearful', 'Happy', 'Neutral', 'Sad', 'Surprised'`)
-
- We map the emotion labels to numeric IDs and use them for model training and evaluation.
+ - **Output**: Emotion labels (`'Angry', 'Disgust', 'Fearful', 'Happy', 'Neutral', 'Sad', 'Surprised'`)
+
+ I map the emotion labels to numeric IDs and use them for model training and evaluation, as sketched below.
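+
+ A minimal sketch of that mapping and the classification head setup (the label order and the use of `AutoModelForAudioClassification` are my assumptions, not necessarily the exact training code):
+
+ ```python
+ from transformers import AutoModelForAudioClassification
+
+ # Map each emotion label to a numeric ID and back.
+ labels = ["Angry", "Disgust", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]
+ label2id = {label: i for i, label in enumerate(labels)}
+ id2label = {i: label for i, label in enumerate(labels)}
+
+ # Load the pretrained encoder with a new classification head on top.
+ model = AutoModelForAudioClassification.from_pretrained(
+     "facebook/wav2vec2-large-xlsr-53",
+     num_labels=len(labels),
+     label2id=label2id,
+     id2label=id2label,
+ )
+ ```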
 
 
 ## ⚙️ **Training**