Issue with Bark TTS pipeline example
#18, opened by WpythonW
Here's a fix for the Bark TTS pipeline example from the documentation.
The original code from the docs:
from transformers import pipeline
import scipy
synthesiser = pipeline("text-to-speech", "suno/bark-small")
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])
This fails because speech["audio"] has shape (1, N), while scipy.io.wavfile.write expects mono audio as a 1-D array of shape (N,). To fix it, remove the leading dimension with squeeze():
from transformers import pipeline
import scipy
synthesiser = pipeline("text-to-speech", "suno/bark-small")
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
# Fix: reshape audio from (1, N) to (N,) before saving
audio_reshaped = speech["audio"].squeeze()
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=audio_reshaped)
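This resolves the "ushort format requires 0 <= number <= (0x7fff * 2 + 1)" error. scipy.io.wavfile.write treats a 2-D array as (samples, channels), so an array of shape (1, N) is read as a single sample with N channels, and that channel count does not fit in the 16-bit field of the WAV header. After squeeze(), the data is a 1-D mono signal of shape (N,), which is written correctly.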
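If you want to check the shape handling without downloading the model, here is a minimal sketch using a synthetic array in place of speech["audio"]. The helper name to_wav_layout is hypothetical (it is not part of transformers or scipy), and it assumes the pipeline returns channels-first audio such as (1, N); the 24 kHz rate is only a stand-in for speech["sampling_rate"].

```python
import numpy as np


def to_wav_layout(audio: np.ndarray) -> np.ndarray:
    """Return audio in the layout scipy.io.wavfile.write expects.

    Hypothetical helper, not part of transformers or scipy.
    Assumes channels-first input such as (1, N) or (C, N).
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio  # already mono, shape (N,)
    if audio.ndim == 2:
        if audio.shape[0] == 1:
            return audio[0]  # (1, N) -> (N,)
        return audio.T  # (C, N) -> (N, C), i.e. (samples, channels)
    raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")


# Synthetic stand-in for speech["audio"]: one second of a 440 Hz tone.
sampling_rate = 24_000  # assumed value, use speech["sampling_rate"] in practice
t = np.arange(sampling_rate) / sampling_rate
fake_audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)[np.newaxis, :]

fixed = to_wav_layout(fake_audio)
print(fake_audio.shape, "->", fixed.shape)
```

In the real example, the result of to_wav_layout(speech["audio"]) could be passed as the data argument to scipy.io.wavfile.write in place of the squeeze() call. Unlike a bare squeeze(), this sketch also transposes multi-channel input instead of leaving it channels-first.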