Output starts with a copy of the input, and most of the generated audio seems to be trimmed off. Worse with stereo audio.

#16
by R2Bl3nd - opened

For instance, if I request 20 seconds of continuation for a 10-second audio clip, the first 10 seconds of the output sound like an AI recreation of the input clip, and the continuation only starts 10 seconds into the 20-second output. So I get only 10 seconds of new audio instead of the 20 I asked for. Maybe there's an arithmetic mistake somewhere in the code?
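To make the arithmetic concrete, here is a rough sketch of what I think is happening. It assumes mono audio and that the output always begins with a re-rendered copy of the prompt. The function names and the tiny sample rate are made up for illustration and are not taken from the Space's code.

```python
# Hypothetical sketch: none of these names come from the Space's code.

def split_output(output, prompt_seconds, sample_rate):
    """Split mono samples into (re-rendered prompt, new continuation)."""
    prompt_len = round(prompt_seconds * sample_rate)
    return output[:prompt_len], output[prompt_len:]


def total_seconds_needed(prompt_seconds, continuation_seconds):
    """Total length to request if the output always starts with the prompt."""
    return prompt_seconds + continuation_seconds


sample_rate = 4          # tiny rate so the lists stay readable
prompt_seconds = 10
requested_seconds = 20

# Assumed behaviour: 20 s of output whose first 10 s re-render the prompt.
output = [0.1] * (prompt_seconds * sample_rate) + [0.5] * (10 * sample_rate)

echoed, new_audio = split_output(output, prompt_seconds, sample_rate)
new_seconds = len(new_audio) / sample_rate
needed = total_seconds_needed(prompt_seconds, requested_seconds)
print(f"new audio: {new_seconds} s, total needed for 20 s of new audio: {needed} s")
```

If that assumption holds, the fix would be either to request prompt length plus continuation length in total, or to slice the prompt off the front before returning the result.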

It also seems that when the input is stereo, the trimmed output is doubled or looped, as if the stereo channels aren't being accounted for.
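One way the stereo issue could arise, purely as a guess: if the samples are interleaved (L, R, L, R, ...) and the trim offset is counted in frames but applied as a sample index, the cut lands in the wrong place. The sketch below illustrates that idea. The interleaved layout and the helper names are my assumptions, not something I have confirmed in the Space's code.

```python
# Hypothetical sketch: interleaved stereo is an assumption, not the Space's known layout.

def trim_prompt(samples, prompt_frames, channels):
    """Drop the first prompt_frames frames from interleaved audio."""
    return samples[prompt_frames * channels:]


def downmix_to_mono(samples, channels):
    """Average the channels of each frame into a single mono sample."""
    return [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]


channels = 2
prompt_frames = 3
# Interleaved L, R pairs: three prompt frames (1.0) then two new frames (9.0).
stereo = [1.0, 1.0] * prompt_frames + [9.0, 9.0] * 2

naive = stereo[prompt_frames:]  # treats the frame count as a sample index
correct = trim_prompt(stereo, prompt_frames, channels)
mono = downmix_to_mono(stereo, channels)

print("naive trim:  ", naive)    # still contains leftover prompt samples
print("correct trim:", correct)  # only the new frames remain
print("mono downmix:", mono)
```

Downmixing to mono before generation would sidestep the problem, at the cost of losing the stereo image.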

Or maybe I'm just using it wrong? If not, are these bugs that you're willing to fix, @radames?

Thanks for providing this. I just noticed that it seemed off, so I wanted to point it out.
