# Generative Spoken Language Modeling
* [Paper](https://arxiv.org/abs/2102.01192)
* [Demo](https://speechbot.github.io/gslm/index.html)
We build and evaluate generative speech2speech systems using [Log Mel Filterbank](https://pytorch.org/audio/stable/compliance.kaldi.html#fbank), [Modified CPC](https://github.com/facebookresearch/CPC_audio), [HuBERT Base](https://github.com/pytorch/fairseq/tree/main/examples/hubert) and [Wav2Vec 2.0 Large](https://github.com/pytorch/fairseq/tree/main/examples/wav2vec). Our system is composed of three components: *speech2unit*, *ulm* and *unit2speech*. The models and usage of each component are described in its sub-directory, linked below.
## Speech to Unit Model (speech2unit)
The speech-to-unit model quantizes raw speech into learned discrete speech units. [More details](speech2unit)
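
To illustrate what "quantizing into discrete units" means, here is a minimal sketch that assumes the quantizer is a nearest-centroid lookup over per-frame features (as with a k-means codebook). The function names, the 2-D toy features, and the three centroids are all hypothetical and chosen for illustration only; this is not the code in the `speech2unit` sub-directory.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest centroid.

    Assumption: the 'learned discrete units' are indices into a codebook
    of centroids; real features would come from a speech encoder.
    """
    units = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units


# Toy codebook with three units and four 2-D feature frames (hypothetical data).
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.9], [1.1, 0.8]]

units = quantize(frames, centroids)
print(units)  # [0, 1, 2, 1]
```

The resulting integer sequence is the kind of input a unit language model would be trained on, and the kind of output a unit-to-speech model would consume.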
## Unit Language Model (ulm)
The unit language model is a generative language model trained on discrete speech units. [More details](ulm)
## Unit to Speech Model (unit2speech)
The unit-to-speech model synthesizes speech from discrete speech units. [More details](unit2speech)
## Metrics
We show how to compute the ASR-based metrics, as well as the zero-shot metrics proposed in our paper, [here](metrics).
## Tools
We share two tools: one resynthesizes a given spoken utterance, and the other generates novel spoken language from a spoken prompt. [More details](tools)