Update README.md
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# 🧠 **Speech Emotion Recognition with Wav2Vec2**

This project leverages the **Wav2Vec2** model to recognize emotions in speech. The goal is to classify audio recordings into different emotional categories, such as **Happy**, **Sad**, and **Surprised**.

## 📊 **Dataset**

The dataset used for training and evaluation is sourced from multiple datasets, including:

- [RAVDESS](https://zenodo.org/records/1188976#.XsAXemgzaUk)
- [SAVEE](https://www.kaggle.com/datasets/ejlok1/surrey-audiovisual-expressed-emotion-savee/data)
- [TESS](https://tspace.library.utoronto.ca/handle/1807/24487)
- [URDU](https://www.kaggle.com/datasets/bitlord/urdu-language-speech-dataset)

The dataset contains recordings labeled with various emotions. Below is the distribution of the emotions in the dataset:

| **Emotion** | **Count** |
|-------------|-----------|
| sad         | 752       |
| happy       | 752       |
| angry       | 752       |
| neutral     | 716       |
| disgust     | 652       |
| fearful     | 652       |
| surprised   | 652       |
| calm        | 192       |

This distribution reflects the balance of emotions in the dataset, with some emotions having more samples than others. The "calm" emotion was excluded during training due to its underrepresentation.
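As an illustrative sketch (not the project's actual code), the underrepresented class could be dropped before training like this, assuming the combined metadata lives in a pandas DataFrame with hypothetical `path` and `emotion` columns:

```python
import pandas as pd

# Hypothetical combined metadata: one row per audio file with its emotion label.
df = pd.read_csv("combined_metadata.csv")  # assumed columns: path, emotion

# Inspect the label distribution (should roughly match the table above).
print(df["emotion"].value_counts())

# Drop the underrepresented "calm" class before training.
df = df[df["emotion"] != "calm"].reset_index(drop=True)
```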

## 🎤 **Preprocessing**

- **Audio Loading**: Using **Librosa** to load the audio files and convert them to numpy arrays.
- **Feature Extraction**: The audio data is processed using the **Wav2Vec2 Feature Extractor**, which standardizes and normalizes the audio features for input to the model.
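Below is a minimal sketch of these two steps; the file path, the 16 kHz sampling rate, and the extractor settings are assumptions rather than the project's exact configuration:

```python
import librosa
from transformers import Wav2Vec2FeatureExtractor

# Load the waveform as a numpy array, resampled to 16 kHz (the rate Wav2Vec2 expects).
waveform, sampling_rate = librosa.load("path/to/audio.wav", sr=16000)

# Standardize and normalize the raw waveform into model-ready tensors.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)
inputs = feature_extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
print(inputs["input_values"].shape)
```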

## 🧠 **Model**

The model used is the **Wav2Vec2 Large XLSR-53** model, fine-tuned for **audio classification** tasks:
- **Model**: [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
- **Output**: Emotion labels (`Angry`, `Disgust`, `Fearful`, `Happy`, `Neutral`, `Sad`, `Surprised`)

I map the emotion labels to numeric IDs and use them for model training and evaluation.
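A minimal, non-authoritative sketch of that mapping and of initializing the classifier (the label strings, their ordering, and the use of `AutoModelForAudioClassification` are illustrative assumptions):

```python
from transformers import AutoModelForAudioClassification

labels = ["angry", "disgust", "fearful", "happy", "neutral", "sad", "surprised"]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for i, label in enumerate(labels)}

# Fine-tune the pretrained XLSR-53 encoder with a fresh classification head
# over the seven emotion classes.
model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    num_labels=len(labels),
    label2id=label2id,
    id2label=id2label,
)
```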

## ⚙️ **Training**