Inputs recorded at different sampling rates generate drastically different embeddings?

#15
by anuragrawal - opened

Hi @speechbrainteam ,

I am using this model to generate speaker embeddings for one of my projects. I know that the model was trained on mono audio sampled at 16 kHz. My audio is recorded at 44.1 kHz. I am seeing drastically different outputs when I downsample my 44.1 kHz recordings to 16 kHz compared with recording directly at 16 kHz: the outputs are much better when I record at 16 kHz. Have you experienced this scenario before?

I am trying to establish whether recording at 16 kHz would really be beneficial. I have done some experiments, but it's not easy to capture two exactly identical recordings, one at 16 kHz and one at 44.1 kHz, so I am reaching out here for your feedback. Please let me know if you need anything else.
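
To put a number on "drastically different", one option is the cosine similarity between the two embeddings. This is a minimal sketch, assuming each embedding has already been converted to a flat list of floats; the helper name `cosine_similarity` is illustrative and not part of the model's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scoring the same speaker's native 16 kHz recording against the downsampled one, and then against a different speaker, would show whether the gap is larger than normal same-speaker variation.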

Thanks!
Anurag Agrawal

Hey, you might want to check the complete metadata of the audio you are recording. Besides the sample rate, you will want to look at the codec and container, since these determine the audio quality (roughly, the bitrate). It sounds to me like you might be recording with some (lossy) compression, while this model expects uncompressed audio.
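
For WAV files, a quick way to check this is Python's standard `wave` module. This is only a sketch: it assumes the recordings are saved as `.wav`, and the helper names `describe_wav` and `is_expected_input` are made up for illustration. `wave` only reads uncompressed PCM, so a file in another format (MP3, AAC, and so on) raises `wave.Error`, which is itself a useful signal.

```python
import wave


def describe_wav(path):
    """Return basic format facts for an uncompressed PCM WAV file."""
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "compression": wav.getcompname(),
            "duration_s": frames / rate if rate else 0.0,
        }


def is_expected_input(info, sample_rate_hz=16000, channels=1):
    """Check the file against the 16 kHz mono format mentioned in this thread."""
    return (
        info["sample_rate_hz"] == sample_rate_hz
        and info["channels"] == channels
    )
```

Running `describe_wav` on both the native 16 kHz file and the 44.1 kHz file before downsampling should show whether they differ in anything other than sample rate, for example bit depth or channel count.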
