ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition · I can't seem to get the model to work

Jul 10, 2023

edited Jul 10, 2023

I am running the model in my own Jupyter notebook as well as in the hosted Inference API, and I can't get it to classify audio accurately. I input audio that is clearly positive (in tone and content) and it fails to converge (see photo). What am I doing wrong? (For context, the audio is about 7 seconds long; perhaps that is an issue?)
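
One thing worth ruling out before blaming clip length is the input format. The sketch below uses only the Python standard library to report a WAV file's sample rate, channel count, and duration. It assumes the clip is an uncompressed PCM WAV file and that the model expects 16 kHz mono input, as wav2vec2-style checkpoints generally do; `describe_wav` and `needs_conversion` are hypothetical helper names, not part of any library.

```python
import wave

# Assumption: the checkpoint expects 16 kHz mono audio, as wav2vec2-style
# models generally do. Check the model card to confirm.
EXPECTED_RATE_HZ = 16_000


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        duration = wf.getnframes() / rate
    return rate, channels, duration


def needs_conversion(path):
    """True if the clip is not already 16 kHz mono."""
    rate, channels, _ = describe_wav(path)
    return rate != EXPECTED_RATE_HZ or channels != 1
```

If `needs_conversion` returns `True`, resampling and downmixing the clip before inference is a reasonable first thing to try. It may not be the whole explanation for poor predictions.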

Snaffal

Feb 1, 2024

Yeah, same. I tried uploading a few audio clips from the original RAVDESS dataset and it wasn't able to predict the labels correctly.

t4ehye0ng

May 27, 2024

It is the same for me.

yilmazay

Jun 11, 2024

edited Jun 11, 2024

I have the same problem as well. I tried many short labeled audio samples, and none of them gave a reasonable result.
Some of them even yield negative values where I expect the maximum value. I would appreciate it if anyone could tell us what we are doing wrong.
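
A possible reading of the negative values, offered as an assumption rather than a diagnosis, is that they are raw logits and not probabilities. Logits can be negative, and only their relative order matters. The sketch below applies a softmax to a list of scores and picks the top label; the `labels` list is a hypothetical stand-in for whatever label order the model actually reports.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical scores: every value is negative, yet there is a clear winner.
label, prob = top_label([-0.2, -3.1, -4.0], ["happy", "sad", "angry"])
```

Here `label` is `"happy"` even though its score is below zero, because it is the largest of the three.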

[Added after a while]
Actually, after carefully comparing the results with the RAVDESS dataset, I can get a very reasonable accuracy, around 80-85%.
However, the problem is that it seems to work only with samples from actors; with real data, the results are not so good.
So it definitely needs some fine-tuning with real data.
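
For anyone wanting to repeat that comparison, here is a minimal sketch of scoring predictions against RAVDESS ground truth. It relies on the dataset's filename convention, where the third hyphen-separated field encodes the emotion. The `predictions` mapping is a hypothetical structure (filename to predicted label), and the label spellings are an assumption that should be checked against the labels the model actually emits.

```python
import os

# RAVDESS emotion codes (third field of the filename).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def ravdess_label(filename):
    """Extract the ground-truth emotion from a RAVDESS filename."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS-style filename: {filename!r}")
    return RAVDESS_EMOTIONS[fields[2]]


def accuracy(predictions):
    """Fraction of files whose predicted label matches the filename label.

    predictions: dict mapping filename -> predicted label (hypothetical format).
    """
    if not predictions:
        raise ValueError("no predictions to score")
    correct = sum(
        1 for name, pred in predictions.items() if ravdess_label(name) == pred
    )
    return correct / len(predictions)
```

For example, `ravdess_label("03-01-06-01-02-01-12.wav")` gives `"fearful"`. Running the same scoring on non-acted recordings would need separately annotated labels, since the filename trick only applies to RAVDESS.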
