Issue with Bark TTS pipeline example

#18
by WpythonW - opened

Here's a fix for the Bark TTS pipeline example from the documentation.

The original code from the docs:

from transformers import pipeline
import scipy

synthesiser = pipeline("text-to-speech", "suno/bark-small")
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=speech["audio"])

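This fails because speech["audio"] has shape (1, N) instead of the (N,) that scipy.io.wavfile.write expects for mono audio. SciPy treats a 2-D array as (Nsamples, Nchannels), so a (1, N) array is written as one sample with N channels, and N does not fit in the 16-bit channel-count field of the WAV header. To fix this, drop the extra dimension with squeeze():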

from transformers import pipeline
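import scipy.io.wavfile  # explicit submodule import; plain "import scipy" may not load scipy.io on older SciPy versions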

synthesiser = pipeline("text-to-speech", "suno/bark-small")
speech = synthesiser("Hello, my dog is cooler than you!", forward_params={"do_sample": True})

# Fix: reshape audio from (1, N) to (N,) before saving
audio_reshaped = speech["audio"].squeeze()
scipy.io.wavfile.write("bark_out.wav", rate=speech["sampling_rate"], data=audio_reshaped)
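
To check the shape change without downloading the model, here is a minimal sketch that uses a dummy NumPy array as a stand-in for speech["audio"]. The array contents and the length of 100,000 samples are assumptions chosen only for illustration; the real pipeline output will have a different length.

```python
import numpy as np

# Stand-in for speech["audio"]: one row of N float32 samples, shape (1, N).
# N = 100_000 is an arbitrary length chosen for illustration.
audio = np.zeros((1, 100_000), dtype=np.float32)

# A 2-D array passed to a WAV writer is read as (samples, channels),
# so (1, N) means 1 sample with N channels.
samples, channels = audio.shape
print(samples, channels)

# squeeze() removes the length-1 axis, leaving a 1-D mono signal.
mono = audio.squeeze()
print(mono.shape)
```

Note that squeeze() removes every length-1 axis, so it is only a safe shortcut when the output is known to be mono.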

Calling squeeze() resolves the "ushort format requires 0 <= number <= (0x7fff * 2 + 1)" error: the audio is now a 1-D array, so it is written as a single mono channel instead of N channels.
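
The error message itself comes from Python's struct module. As a rough sketch of the mechanism, the snippet below packs a channel count into an unsigned 16-bit little-endian field, which is the width the WAV header reserves for it. The helper name pack_channel_count is hypothetical and this is not SciPy's actual code; the exact wording of the exception can also vary by Python version and platform.

```python
import struct


def pack_channel_count(channels: int) -> bytes:
    """Pack a channel count as an unsigned 16-bit little-endian field.

    Hypothetical helper for illustration; it mirrors the field width of
    the WAV header, not SciPy's internal implementation.
    """
    return struct.pack("<H", channels)


# A mono file has 1 channel, which fits in 16 bits.
print(pack_channel_count(1))

# A (1, N) array is read as N channels; anything above 65535 overflows.
try:
    pack_channel_count(100_000)
except struct.error as exc:
    print("struct.error:", exc)
```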
