I can't seem to get the model to work - what am I doing wrong?
I am running the model in my own Jupyter notebook as well as in the hosted inference API, and I can't get it to classify audio accurately. I input audio that is clearly positive (in both tone and content) and it fails to converge (see photo). What am I doing wrong? (For context, the audio is about 7 seconds long; perhaps that is an issue?)
Yeah, same here. I tried uploading a few audio clips from the original RAVDESS dataset and it was not able to predict the labels correctly.
It is the same for me.
I have the same problem as well. I tried many short labeled audio samples, and none of them gave a reasonable result.
Some of them even yield negative values where I expect the maximum value. I would appreciate it if anyone could tell us what we are doing wrong.
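One possible reading of the negative values, offered as an assumption and not something confirmed in this thread: if the numbers are raw scores (logits) and not probabilities, negative values are normal, and only their relative order matters. A minimal sketch of turning such scores into probabilities with a softmax, using made-up example scores:

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the resulting probabilities.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes; NOT taken from any real run.
example_logits = [-1.2, 0.3, -0.5]
probs = softmax(example_logits)

# Index of the most likely class under this assumption.
best_index = max(range(len(probs)), key=probs.__getitem__)
```

If the scores are already probabilities, this step does not apply and the negative values would point to a different problem.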
[Added after a while]
Actually, after carefully comparing the results with the RAVDESS dataset, I can get a very reasonable accuracy, around 80-85%.
However, the problem is that it seems to work only with samples from actors; with real data, the results are not so good.
So it definitely needs some fine-tuning with real data.
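For anyone who wants to repeat this kind of comparison, here is a minimal sketch of scoring predictions against RAVDESS labels. It relies on the RAVDESS filename convention, where the third hyphen-separated field encodes the emotion. The `accuracy` helper and the idea that the model returns the same label strings as the mapping below are assumptions for illustration; the model's actual label names may differ and would need to be mapped first.

```python
from pathlib import Path

# Emotion codes from the RAVDESS filename convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def ravdess_label(path):
    """Read the ground-truth emotion from a RAVDESS file name."""
    fields = Path(path).stem.split("-")
    return RAVDESS_EMOTIONS[fields[2]]


def accuracy(paths, predictions):
    """Fraction of files whose predicted label matches the file name label.

    `predictions` is a hypothetical list of label strings, one per path,
    produced by whatever inference code is being evaluated.
    """
    if len(paths) != len(predictions):
        raise ValueError("paths and predictions must have the same length")
    if not paths:
        raise ValueError("no files to score")
    correct = sum(ravdess_label(p) == pred for p, pred in zip(paths, predictions))
    return correct / len(paths)
```

The same `accuracy` function cannot be reused on real-world recordings as is, because their labels do not come from the file name; a separate list of ground-truth labels would be needed there.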