
Unit to Speech Model (unit2speech)

The unit-to-speech (unit2speech) model is a modified Tacotron2 model that learns to synthesize speech from discrete speech units. All models are trained on quantized LJSpeech.
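The "discrete speech units" consumed by the model come from quantizing continuous encoder features (Log Mel Filterbank, CPC, HuBERT, or wav2vec 2.0) with a k-means codebook (KM50/KM100/etc.). The sketch below is a toy illustration of that quantization step, not code from this repository: the centroids and frames are made-up values, and real pipelines use codebooks fit on features from a pretrained encoder.

```python
# Toy sketch of k-means unit quantization: each feature frame is mapped to
# the index of its nearest codebook centroid, and that index is the "unit".

def quantize(frames, centroids):
    """Assign each feature frame to its nearest centroid index (its unit)."""
    units = []
    for frame in frames:
        dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid))
                 for centroid in centroids]
        units.append(min(range(len(centroids)), key=dists.__getitem__))
    return units

centroids = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]   # toy 3-unit codebook
frames = [[0.1, -0.1], [0.9, 1.2], [1.9, 0.1], [1.0, 0.9]]
print(quantize(frames, centroids))  # -> [0, 1, 2, 1]
```

A KM100 codebook is the same idea with 100 centroids, so the resulting unit sequence draws from a vocabulary of 100 symbols.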

Upstream Units | Download Link
Log Mel Filterbank + KM50 download
Log Mel Filterbank + KM100 download
Log Mel Filterbank + KM200 download
Log Mel Filterbank + KM500 download
Modified CPC + KM50 download
Modified CPC + KM100 download
Modified CPC + KM200 download
Modified CPC + KM500 download
HuBERT Base + KM50 download
HuBERT Base + KM100 download
HuBERT Base + KM200 download
HuBERT Base + KM500 download
wav2vec 2.0 Large + KM50 download
wav2vec 2.0 Large + KM100 download
wav2vec 2.0 Large + KM200 download
wav2vec 2.0 Large + KM500 download

Run inference using a unit2speech model

  • Install librosa, unidecode and inflect using pip install librosa unidecode inflect
  • Download the WaveGlow checkpoint; it is used as the vocoder.

Sample command to run inference using trained unit2speech models. Please note that the quantized audio to be synthesized must use the same unit type as the one the unit2speech model was trained with.

FAIRSEQ_ROOT=<path_to_your_fairseq_repo_root>
TTS_MODEL_PATH=<unit2speech_model_file_path>
QUANTIZED_UNIT_PATH=<quantized_audio_file_path>
OUT_DIR=<dir_to_dump_synthesized_audio_files>
WAVEGLOW_PATH=<path_where_you_have_downloaded_waveglow_checkpoint>

PYTHONPATH=${FAIRSEQ_ROOT}:${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech python ${FAIRSEQ_ROOT}/examples/textless_nlp/gslm/unit2speech/synthesize_audio_from_units.py \
    --tts_model_path $TTS_MODEL_PATH \
    --quantized_unit_path $QUANTIZED_UNIT_PATH \
    --out_audio_dir $OUT_DIR \
    --waveglow_path $WAVEGLOW_PATH \
    --max_decoder_steps 2000
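The file passed as --quantized_unit_path holds the unit sequences to synthesize. As an assumption to be checked against your quantization script's output, the sketch below writes one utterance per line in the "<name>|<space-separated unit ids>" form used elsewhere in the GSLM pipeline; the helper name and the sample unit ids are hypothetical.

```python
# Hypothetical helper that writes a quantized-unit input file.
# Assumed line format: "<utterance_name>|<space-separated unit ids>".

def write_quantized_units(path, utterances):
    """utterances: dict mapping utterance name -> list of int unit ids."""
    with open(path, "w") as f:
        for name, units in utterances.items():
            f.write(f"{name}|{' '.join(str(u) for u in units)}\n")

write_quantized_units(
    "quantized_units.txt",
    {"sample_0001": [13, 13, 55, 7, 7, 7, 21]},  # made-up unit ids
)
```

The unit ids must come from the same codebook (e.g. HuBERT Base + KM100) that the chosen unit2speech checkpoint was trained on, or the synthesized audio will be garbage.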