Max audio model input length

#6
by arubittu - opened

What is the maximum audio input length I can classify, assuming my sampling rate is 16 kHz? I have tried inference with inputs of up to 100 seconds (an array of 100 * 16k samples) and it gives an output. What input size was this model trained to accept? Will it perform as well on longer inputs?

audEERING GmbH org

There is no official maximum length; it's limited by your RAM. But we trained with segmented audio of about 2 to 6 seconds.
It showed that performance doesn't drop until 3 seconds.
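If you want inputs to look more like the 2 to 6 second training segments, one option is to cut long recordings into fixed windows. The following is a minimal sketch, assuming a 1-D NumPy array sampled at 16 kHz. The function name `segment_signal`, the 3-second default, and the choice to merge a short leftover tail into the previous window are illustrative assumptions based on the numbers in this thread, not audEERING's training code.

```python
import numpy as np

SAMPLING_RATE = 16_000  # Hz, as stated in the question


def segment_signal(signal: np.ndarray, segment_seconds: float = 3.0,
                   sampling_rate: int = SAMPLING_RATE) -> list:
    """Split a 1-D signal into consecutive windows of `segment_seconds`.

    Hypothetical helper. A trailing remainder shorter than one window is
    appended to the previous window instead of being kept on its own, so
    no window (except a signal that is short to begin with) falls below
    `segment_seconds`.
    """
    seg_len = int(segment_seconds * sampling_rate)
    if len(signal) <= seg_len:
        # Signal is already one window long or shorter: return it unchanged.
        return [signal]
    n_full = len(signal) // seg_len
    segments = [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]
    remainder = signal[n_full * seg_len:]
    if len(remainder) > 0:
        # Merge the short tail into the last full window.
        segments[-1] = np.concatenate([segments[-1], remainder])
    return segments
```

For the 100-second example from the question (1,600,000 samples), this yields 33 windows: 32 of 3 seconds and a final one of 4 seconds.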

I want to classify longer audio clips, around 1 minute. Performance should get better, right, since I am giving the model more data to classify?

audEERING GmbH org
edited May 21

I guess the best performance would come from segmenting them and then pooling the predictions per speaker, but you could try both and compare.
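Pooling per speaker can be done in several ways; the reply does not say which one is meant. The following is a minimal sketch of one common choice, averaging the per-segment class probabilities of each speaker and taking the argmax. It assumes you already have one probability vector per segment from the model; `pool_per_speaker` and its inputs are hypothetical names used for illustration.

```python
from collections import defaultdict

import numpy as np


def pool_per_speaker(speaker_ids, segment_probs):
    """Average the class probabilities of all segments from the same speaker.

    speaker_ids: sequence of hashable speaker labels, one per segment.
    segment_probs: array-like of shape (n_segments, n_classes) holding
        per-segment class probabilities (assumed model output).
    Returns a dict mapping speaker -> (mean probabilities, predicted class index).
    """
    grouped = defaultdict(list)
    for spk, probs in zip(speaker_ids, np.asarray(segment_probs, dtype=float)):
        grouped[spk].append(probs)
    result = {}
    for spk, probs in grouped.items():
        mean_probs = np.mean(probs, axis=0)
        result[spk] = (mean_probs, int(np.argmax(mean_probs)))
    return result
```

Averaging probabilities lets confident segments outweigh uncertain ones; comparing this against classifying the whole 1-minute clip at once is exactly the "try both and compare" suggested above.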

There is no official maximum length; it's limited by your RAM. But we trained with segmented audio of about 2 to 6 seconds.
It showed that performance doesn't drop until 3 seconds.

Did you use dynamic padding for batches? Is that why the segments are 2 to 6 s?
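For context on the question: "dynamic padding" means padding each batch only to the length of its longest item rather than to a global maximum. The thread does not confirm whether the model was trained this way. The following is a minimal sketch of the idea with NumPy; `pad_batch` is a hypothetical helper, and the 0/1 attention mask is a common convention, not something stated in this discussion.

```python
import numpy as np


def pad_batch(signals):
    """Zero-pad a batch of 1-D signals to the length of the longest one.

    Returns (padded, attention_mask): attention_mask is 1 for real samples
    and 0 for padding, so a model can ignore the padded positions.
    """
    max_len = max(len(s) for s in signals)
    padded = np.zeros((len(signals), max_len), dtype=np.float32)
    mask = np.zeros((len(signals), max_len), dtype=np.int64)
    for i, s in enumerate(signals):
        padded[i, :len(s)] = s
        mask[i, :len(s)] = 1
    return padded, mask
```

With variable segment lengths such as 2 to 6 seconds, a batch of one 2-second and one 6-second segment would be padded to 96,000 samples rather than to some larger fixed size.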


@felixbur

It showed that performance doesn't drop until 3 seconds

Does that mean everything above 3 seconds performs worse than 3 seconds or less? Or am I missing something?
If so, that seems rather unexpected.

audEERING GmbH org

Sorry, that was badly written.
No: I meant that performance below 3 seconds is worse. From 3 seconds on, it's stable.

@felixbur no problem, thank you very much :)

@felixbur In your opinion, what is the optimal audio length for the best accuracy?

audEERING GmbH org

3 seconds or more. If you have several samples per speaker, use majority voting.
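Majority voting can be sketched in a few lines: collect the predicted label of every sample (or segment) per speaker and keep the most frequent one. This is a minimal sketch; `vote_per_speaker` is a hypothetical name, and the tie-breaking rule (the label encountered first wins, which is how `Counter.most_common` orders equal counts) is an assumption, since the reply does not specify one.

```python
from collections import Counter, defaultdict


def vote_per_speaker(speaker_ids, predicted_labels):
    """Return the most frequent predicted label for each speaker.

    Ties are broken in favour of the label that appeared first for that
    speaker, because Counter.most_common keeps insertion order for equal counts.
    """
    by_speaker = defaultdict(list)
    for spk, label in zip(speaker_ids, predicted_labels):
        by_speaker[spk].append(label)
    return {spk: Counter(labels).most_common(1)[0][0]
            for spk, labels in by_speaker.items()}
```

For example, `vote_per_speaker(["s1", "s1", "s1", "s2"], ["angry", "neutral", "angry", "sad"])` returns `{"s1": "angry", "s2": "sad"}`. Unlike probability averaging, voting only uses the hard label of each sample.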
