Max audio model input length
What is the maximum audio input length I can classify, assuming my sampling rate is 16 kHz? I have tried inference with inputs up to 100 seconds long (an array of 100 * 16k samples) and the model returns an output. What input length was this model trained to accept? Will it have the same performance on longer inputs?
There is no official max length; it's limited by your RAM. But we trained with segmented audio, about 2-6 seconds per segment.
It showed that performance doesn't drop until 3 seconds
I want to classify longer audio clips, around 1 minute. The performance should get better, right, since I am giving the model more data to classify?
I guess the best performance would come from segmenting them and then pooling the predictions per speaker, but you could try both and compare.
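A minimal sketch of the segment-then-pool idea, in case it helps. The 4-second segment length is just an assumed value inside the 2-6 second training range mentioned above, the function names are made up for illustration, and averaging the class probabilities is one possible pooling choice, not necessarily what the authors used. The per-segment probabilities would come from running the model on each segment, which is not shown here.

```python
SAMPLE_RATE = 16_000  # 16 kHz, as in the question


def segment_audio(waveform, segment_seconds=4.0, min_seconds=3.0,
                  sample_rate=SAMPLE_RATE):
    """Split a 1-D waveform (list or NumPy array) into consecutive segments.

    A trailing remainder shorter than min_seconds is dropped. The defaults
    are assumptions chosen to stay within the segment lengths from the thread.
    A clip shorter than min_seconds yields an empty list.
    """
    seg_len = int(segment_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    segments = []
    for start in range(0, len(waveform), seg_len):
        chunk = waveform[start:start + seg_len]
        if len(chunk) >= min_len:
            segments.append(chunk)
    return segments


def pool_predictions(segment_probs):
    """Average per-segment class probabilities.

    Returns (index of the best class, list of mean probabilities).
    """
    if not segment_probs:
        raise ValueError("need at least one segment prediction")
    n_classes = len(segment_probs[0])
    n_segments = len(segment_probs)
    mean_probs = [
        sum(probs[c] for probs in segment_probs) / n_segments
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return best, mean_probs
```

With these defaults a 1-minute clip at 16 kHz gives 15 segments of 4 seconds each. To compare against the other option, you would run the model once on the full clip and check both results on labeled data.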
Did you use dynamic padding for batches? Is that why the segments are 2 to 6 s?
Sorry, that was badly written.
No. What I meant is that performance below 3 seconds is worse; from 3 seconds on it's stable.
3 seconds and more. If you have several samples per speaker, use majority voting.
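Majority voting per speaker could look roughly like this. It assumes you already have one predicted label per sample together with a speaker ID; the function name, the speaker IDs, and the labels in the usage example are made up for illustration.

```python
from collections import Counter, defaultdict


def vote_per_speaker(predictions):
    """Return the most frequent predicted label for each speaker.

    predictions: iterable of (speaker_id, label) pairs, one per sample.
    On a tie, the label that reached the speaker's list first wins,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    labels_by_speaker = defaultdict(list)
    for speaker_id, label in predictions:
        labels_by_speaker[speaker_id].append(label)
    return {
        speaker_id: Counter(labels).most_common(1)[0][0]
        for speaker_id, labels in labels_by_speaker.items()
    }


# Hypothetical example: three samples for one speaker, one for another.
example = [("spk1", "happy"), ("spk1", "sad"), ("spk1", "happy"), ("spk2", "sad")]
result = vote_per_speaker(example)
```

Here `result` maps `spk1` to `happy` (two votes against one) and `spk2` to `sad`.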