Multi-Language Support?

#3
by Sesideh - opened

Since the OpenAI model you are using is multilingual, do you think it is possible to extend it into a multilingual speech emotion detection model as well?

I was inspired by two papers, namely "Breaking the Silence: Whisper-Driven Emotion Recognition in AI Mental Support Models" and "EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Benchmark", to develop a Whisper Large V3-based Speech Emotion Recognition (SER) project. The model excels at multilingual transcription and is robust to accents and noise, which makes it a promising candidate for SER.

In the first paper, adding an extra Transformer layer on top of the Whisper encoder improved emotion detection accuracy to 95%. Meanwhile, the second paper shows that Whisper Large V3 performs strongly in cross-corpus settings, demonstrating its capability for multilingual SER.
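To make the first paper's idea concrete, here is a minimal sketch of that kind of head: one extra Transformer layer over the Whisper encoder's hidden states, followed by mean pooling and a linear classifier. The hidden size 1280 matches Whisper Large V3, but the number of attention heads, the feed-forward size, and the 8 emotion classes are my own assumptions for illustration, not values taken from the paper.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """Hypothetical SER head on top of the Whisper encoder.

    Expects encoder hidden states of shape (batch, frames, d_model),
    e.g. the output of WhisperModel's encoder. All hyperparameters
    here are illustrative assumptions.
    """
    def __init__(self, d_model: int = 1280, num_emotions: int = 8):
        super().__init__()
        # One additional Transformer layer, as in the first paper's setup.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=8,
            dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.classifier = nn.Linear(d_model, num_emotions)

    def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
        x = self.block(encoder_hidden)   # refine the encoder features
        x = x.mean(dim=1)                # pool over the time axis
        return self.classifier(x)        # (batch, num_emotions) logits

# A random tensor stands in for real Whisper encoder output here.
head = EmotionHead()
logits = head(torch.randn(2, 1500, 1280))
print(logits.shape)  # torch.Size([2, 8])
```

In practice you would feed this head the encoder output of a frozen or fine-tuned Whisper Large V3 and train only the classification layers on an emotion-labeled corpus such as those collected in EmoBox.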

Based on these two references, I am working on adapting the Whisper Large V3 architecture for multilingual Speech Emotion Recognition. I am still learning, so if there are any inaccuracies or areas for improvement, I would greatly appreciate suggestions and feedback to help me refine my approach.
