# USLM: Unified Speech Language Model
<a href='https://github.com/ZhangXInFD/SpeechTokenizer'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2308.16692'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

## Introduction
Built upon [SpeechTokenizer](https://github.com/ZhangXInFD/SpeechTokenizer), USLM consists of an autoregressive model and a non-autoregressive model, which together model the information in speech hierarchically. The autoregressive (AR) model captures content information by modeling the tokens from the first RVQ quantizer. The non-autoregressive (NAR) model complements the AR model with paralinguistic information by generating the tokens from the subsequent quantizers, conditioned on the first-layer tokens.

<br>
<p align="center">
<img src="images/overview.png" width="95%"> <br>
Overview
</p>
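The hierarchical decoding described in the Introduction can be sketched as a toy Python example. It is not USLM's implementation: `ar_step`, `nar_layer`, and `EOS` are hypothetical stand-ins for the trained AR model, the NAR model, and an end-of-sequence token, and tokens are plain integers. The point is the difference between the two stages. The AR stage emits first-quantizer tokens one at a time, while the NAR stage fills in each subsequent quantizer layer for all time steps at once, conditioned on the first-layer tokens.

```python
from typing import Callable, List

# Hypothetical interfaces standing in for the trained networks.
ARStep = Callable[[List[int], List[int]], int]    # (prompt, tokens so far) -> next first-layer token
NARLayer = Callable[[List[int], int], List[int]]  # (first-layer tokens, layer index) -> one full layer

EOS = -1  # hypothetical end-of-sequence token

def generate(prompt: List[int], ar_step: ARStep, nar_layer: NARLayer,
             n_q: int, max_len: int = 100) -> List[List[int]]:
    """Return an n_q x T grid of RVQ tokens: row 0 from the AR stage, rows 1..n_q-1 from the NAR stage."""
    # Stage 1 (AR): content tokens of the first quantizer, one step at a time.
    first: List[int] = []
    while len(first) < max_len:
        tok = ar_step(prompt, first)
        if tok == EOS:
            break
        first.append(tok)

    # Stage 2 (NAR): each later quantizer layer in a single call covering all T steps,
    # conditioned on the first-layer tokens.
    layers = [first]
    for q in range(1, n_q):
        layer = nar_layer(first, q)
        if len(layer) != len(first):
            raise ValueError("NAR output must be time-aligned with the first layer")
        layers.append(layer)
    return layers

# Toy models for demonstration only.
def toy_ar(prompt: List[int], so_far: List[int]) -> int:
    return EOS if len(so_far) == 3 else 10 + len(so_far)

def toy_nar(first: List[int], q: int) -> List[int]:
    return [100 * t + q for t in first]

print(generate([1, 2], toy_ar, toy_nar, n_q=3))
# [[10, 11, 12], [1001, 1101, 1201], [1002, 1102, 1202]]
```

In this sketch the AR stage makes one call per time step, while the NAR stage makes one call per additional quantizer layer, regardless of utterance length.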

## Installation

To get up and running quickly, follow the steps below:

``` bash
# PyTorch
pip install torch==1.13.1 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install torchmetrics==0.11.1
# fbank
pip install librosa==0.8.1

# phonemizer, pypinyin
apt-get install espeak-ng  # Debian/Ubuntu; may require sudo
# macOS: brew install espeak
pip install phonemizer==3.2.1 pypinyin==0.48.0

# lhotse: update to the newest version
# https://github.com/lhotse-speech/lhotse/pull/956
# https://github.com/lhotse-speech/lhotse/pull/960
pip uninstall -y lhotse
pip install git+https://github.com/lhotse-speech/lhotse

# k2
# find the right version in https://huggingface.co/csukuangfj/k2
# (the wheel below targets CUDA 11.6, torch 1.13.1 and Python 3.10)
pip install https://huggingface.co/csukuangfj/k2/resolve/main/cuda/k2-1.23.4.dev20230224+cuda11.6.torch1.13.1-cp310-cp310-linux_x86_64.whl

# icefall
git clone https://github.com/k2-fsa/icefall
cd icefall
pip install -r requirements.txt
export PYTHONPATH=`pwd`/../icefall:$PYTHONPATH
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.zshrc
echo "export PYTHONPATH=`pwd`/../icefall:\$PYTHONPATH" >> ~/.bashrc
cd -
source ~/.zshrc  # bash users: source ~/.bashrc

# SpeechTokenizer
pip install -U speechtokenizer

# uslm
git clone https://github.com/0nutation/USLM
cd USLM
pip install -e .
```
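After running the steps above, a short Python check can confirm that the main dependencies are locatable and that an espeak executable is on `PATH` (phonemizer's espeak backend needs one). This is a sketch: the module names in `REQUIRED` are assumed top-level import names for the packages installed above, and `icefall` is only found if `PYTHONPATH` was exported as shown. `missing_modules` and `has_espeak` are hypothetical helper names.

```python
import importlib.util
import shutil
from typing import Iterable, List

# Top-level import names assumed for the packages installed above.
REQUIRED = ["torch", "torchaudio", "torchmetrics", "librosa", "phonemizer",
            "pypinyin", "lhotse", "k2", "icefall", "speechtokenizer"]

def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot locate (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def has_espeak() -> bool:
    """True if an espeak-ng or espeak executable is on PATH."""
    return shutil.which("espeak-ng") is not None or shutil.which("espeak") is not None

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing modules:", ", ".join(missing) if missing else "none")
    print("espeak found:", has_espeak())
```

`find_spec` only locates a module without executing it, so the check is fast, but a package built against a mismatched CUDA or torch version (for example the k2 wheel) can still fail at actual import time.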

## USLM Models
This version of USLM is trained on the LibriTTS dataset; because the training data is limited, its performance is not optimal.

## Zero-shot TTS Using USLM
Download the pre-trained SpeechTokenizer checkpoint and its config:
``` bash
st_dir="ckpt/speechtokenizer/"
mkdir -p "${st_dir}"
cd "${st_dir}"
wget "https://huggingface.co/fnlp/SpeechTokenizer/resolve/main/speechtokenizer_hubert_avg/SpeechTokenizer.pt"
wget "https://huggingface.co/fnlp/SpeechTokenizer/resolve/main/speechtokenizer_hubert_avg/config.json"
cd -
```
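With the checkpoint downloaded, you can inspect the token hierarchy that USLM models. The sketch below follows the usage shown in the SpeechTokenizer project README (`SpeechTokenizer.load_from_checkpoint(config_path, ckpt_path)`, `model.sample_rate`, and `model.encode(...)` returning codes shaped `(n_q, B, T)`); verify it against the version you installed. `encode_wav` and `split_rvq` are hypothetical helper names and `prompt.wav` is a placeholder path. `split_rvq` separates the first RVQ layer (what the AR model targets) from the remaining layers (what the NAR model targets).

```python
import os
from typing import Sequence, Tuple

def split_rvq(codes: Sequence) -> Tuple[Sequence, Sequence]:
    """Split codes shaped (n_q, ...) into the first RVQ layer and the remaining layers."""
    return codes[:1], codes[1:]

def encode_wav(wav_path: str, st_dir: str = "ckpt/speechtokenizer/"):
    """Encode a speech file with SpeechTokenizer; returns codes shaped (n_q, B, T)."""
    # Imported lazily so split_rvq can be used without the heavy dependencies.
    import torch
    import torchaudio
    from speechtokenizer import SpeechTokenizer

    model = SpeechTokenizer.load_from_checkpoint(
        os.path.join(st_dir, "config.json"), os.path.join(st_dir, "SpeechTokenizer.pt"))
    model.eval()
    wav, sr = torchaudio.load(wav_path)   # (channels, samples)
    wav = wav[:1, :]                       # keep a single channel
    if sr != model.sample_rate:
        wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
    with torch.no_grad():
        return model.encode(wav.unsqueeze(0))  # add batch dim -> (1, 1, samples)

# With the checkpoint downloaded (prompt.wav is a placeholder):
# first_layer, rest = split_rvq(encode_wav("prompt.wav"))
```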

Download the pre-trained USLM checkpoint and its text-token symbol table:
``` bash
uslm_dir="ckpt/uslm/"
mkdir -p "${uslm_dir}"
cd "${uslm_dir}"
wget "https://huggingface.co/fnlp/USLM/resolve/main/USLM_ls960/USLM.pt"
wget "https://huggingface.co/fnlp/USLM/resolve/main/USLM_ls960/unique_text_tokens.k2symbols"
cd -
```
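Before running inference, it can help to confirm that all four downloaded files are in place. The hypothetical helper `missing_checkpoints` below assumes the directory layout created by the two download steps above (`ckpt/speechtokenizer/` and `ckpt/uslm/`, relative to the current directory) and reports files that are absent or zero-length.

```python
import os
from typing import Dict, List

# Files the download steps place on disk (assumed layout, mirroring st_dir and uslm_dir).
EXPECTED: Dict[str, List[str]] = {
    "ckpt/speechtokenizer/": ["SpeechTokenizer.pt", "config.json"],
    "ckpt/uslm/": ["USLM.pt", "unique_text_tokens.k2symbols"],
}

def missing_checkpoints(expected: Dict[str, List[str]] = EXPECTED, root: str = ".") -> List[str]:
    """Return paths of expected files that do not exist or are empty."""
    missing = []
    for directory, files in expected.items():
        for name in files:
            path = os.path.join(root, directory, name)
            if not os.path.isfile(path) or os.path.getsize(path) == 0:
                missing.append(path)
    return missing

if __name__ == "__main__":
    problems = missing_checkpoints()
    print("missing or empty:", problems if problems else "none")
```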

Inference:
``` bash
# Same paths as in the download steps (redefined so this block also works in a fresh shell)
st_dir="ckpt/speechtokenizer/"
uslm_dir="ckpt/uslm/"
out_dir="output/"
mkdir -p "${out_dir}"

python3 bin/infer.py --output-dir "${out_dir}" \
    --model-name uslm --norm-first true --add-prenet false \
    --share-embedding true \
    --audio-extractor SpeechTokenizer \
    --speechtokenizer-dir "${st_dir}" \
    --checkpoint="${uslm_dir}/USLM.pt" \
    --text-tokens "${uslm_dir}/unique_text_tokens.k2symbols" \
    --text-prompts "mr Soames was a tall, spare man, of a nervous and excitable temperament." \
    --audio-prompts prompts/1580_141083_000002_000002.wav \
    --text "Begin with the fundamental steps of the process. This will give you a solid foundation to build upon and boost your confidence."
```
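To synthesize several sentences with the same prompt, the inference command can be driven from Python. In this sketch, `build_infer_cmd` and `run_batch` are hypothetical helpers that reuse only the flags shown in the inference command; writing each sentence to its own numbered subdirectory of `output/` is an assumption made so that runs do not overwrite each other, since the README does not document the output file names. By default `run_batch` only prints the commands (`dry_run=True`).

```python
import os
import shlex
import subprocess
from typing import List

def build_infer_cmd(text: str, text_prompt: str, audio_prompt: str,
                    out_dir: str = "output/",
                    st_dir: str = "ckpt/speechtokenizer/",
                    uslm_dir: str = "ckpt/uslm/") -> List[str]:
    """Return the bin/infer.py invocation as an argv list (no shell quoting needed)."""
    return [
        "python3", "bin/infer.py",
        "--output-dir", out_dir,
        "--model-name", "uslm", "--norm-first", "true", "--add-prenet", "false",
        "--share-embedding", "true",
        "--audio-extractor", "SpeechTokenizer",
        "--speechtokenizer-dir", st_dir,
        "--checkpoint=" + os.path.join(uslm_dir, "USLM.pt"),
        "--text-tokens", os.path.join(uslm_dir, "unique_text_tokens.k2symbols"),
        "--text-prompts", text_prompt,
        "--audio-prompts", audio_prompt,
        "--text", text,
    ]

def run_batch(texts: List[str], text_prompt: str, audio_prompt: str,
              base_out: str = "output/", dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one inference command per sentence."""
    cmds = []
    for i, text in enumerate(texts):
        out_dir = os.path.join(base_out, f"{i:03d}")  # assumed per-sentence layout
        cmd = build_infer_cmd(text, text_prompt, audio_prompt, out_dir=out_dir)
        cmds.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # copy-pasteable shell form
        else:
            os.makedirs(out_dir, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds

if __name__ == "__main__":
    run_batch(
        ["Begin with the fundamental steps of the process.",
         "This will give you a solid foundation to build upon."],
        text_prompt="mr Soames was a tall, spare man, of a nervous and excitable temperament.",
        audio_prompt="prompts/1580_141083_000002_000002.wav",
    )
```

Passing an argv list to `subprocess.run` avoids shell quoting issues with punctuation in the prompt and target texts.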

Alternatively, run the provided `inference.sh` script directly:
``` bash
bash inference.sh
```