lmxue committed on
Commit
0be671d
1 Parent(s): 6ee88e6

Update README.md

Files changed (1)
  1. README.md +54 -0
README.md CHANGED
@@ -1,3 +1,57 @@
  ---
  license: mit
+ language:
+ - en
  ---
+
+ # Pretrained Model of Amphion Vall-E
+
+ We provide the pretrained checkpoint of [Vall-E](https://github.com/open-mmlab/Amphion/tree/main/egs/tts/VALLE) trained on [LibriTTS](https://github.com/open-mmlab/Amphion/tree/main/egs/datasets#libritts), a multi-speaker English corpus of approximately 585 hours of read speech sampled at 24 kHz.
+
+ ## Quick Start
+
+ To use the pretrained model, run the following commands:
+
+ ### Step 1: Download the checkpoint
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/amphion/valle-libritts
+ ```
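
If `git lfs` is not active when cloning, Git checks out small text pointer files in place of the large checkpoint files. The following is a minimal sketch for spotting that case, assuming the standard Git LFS pointer format (a first line starting with `version https://git-lfs.github.com/spec/v1`). The function name `is_lfs_pointer` is hypothetical and not part of Amphion.

```shell
# Hypothetical helper (not part of Amphion): succeeds if the given file
# looks like a Git LFS pointer rather than the real downloaded content.
is_lfs_pointer() {
    # Only the first bytes are inspected, so large checkpoints are cheap to test.
    head -c 100 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}
```

For example, `is_lfs_pointer path/to/file && echo "still a pointer, try: git lfs pull"` could be run on any large file inside the cloned `valle-libritts` directory.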
+
+ ### Step 2: Clone Amphion's source code from GitHub
+ ```bash
+ git clone https://github.com/open-mmlab/Amphion.git
+ ```
+
+ ### Step 3: Specify the checkpoint's path
+ Create a symbolic link to the checkpoint downloaded in Step 1 (the relative path below assumes `valle-libritts` and `Amphion` were cloned into the same parent directory):
+
+ ```bash
+ cd Amphion
+ mkdir -p ckpts/tts
+ ln -s ../../../valle-libritts ckpts/tts/
+ ```
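
A relative symbolic link is resolved from the directory that contains the link, so a wrong number of `../` components leaves a dangling link without any error from `ln`. As a sanity check on the link created above, here is a minimal sketch. The function name `check_ckpt_link` is hypothetical and not part of Amphion.

```shell
# Hypothetical helper (not part of Amphion): confirm that a path is a
# symbolic link and that it resolves to an existing directory.
check_ckpt_link() {
    link_path="$1"
    if [ -L "$link_path" ] && [ -d "$link_path" ]; then
        echo "ok: $link_path -> $(readlink "$link_path")"
    else
        echo "broken or missing link: $link_path" >&2
        return 1
    fi
}
```

From inside the `Amphion` directory, `check_ckpt_link ckpts/tts/valle-libritts` should report `ok` when both repositories sit side by side.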
+
+ ### Step 4: Inference
+
+ You can follow the inference part of [this recipe](https://github.com/open-mmlab/Amphion/tree/main/egs/tts/VALLE#4-inference) to generate speech from text. For example, to synthesize a clip of speech with the text "This is a clip of generated speech with the given text from Amphion Vall-E model.", run:
+
+ ```bash
+ sh egs/tts/VALLE/run.sh --stage 3 --gpu "0" \
+     --config "ckpts/tts/valle-libritts/args.json" \
+     --infer_expt_dir ckpts/tts/valle-libritts \
+     --infer_output_dir ckpts/tts/valle-libritts/result \
+     --infer_mode "single" \
+     --infer_text "This is a clip of generated speech with the given text from Amphion Vall-E model." \
+     --infer_text_prompt "But even the unsuccessful dramatist has his moments." \
+     --infer_audio_prompt egs/tts/VALLE/prompt_examples/7176_92135_000004_000000.wav
+ ```
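
After inference finishes, the directory passed as `--infer_output_dir` should hold the synthesized audio. The following is a minimal sketch for listing it, assuming the output is written as `.wav` files (the exact file names are not stated here). The function name `list_generated_wavs` is hypothetical and not part of Amphion.

```shell
# Hypothetical helper (not part of Amphion): list .wav files under an
# inference output directory, one path per line, in sorted order.
# Assumption: the recipe writes its results as .wav files.
list_generated_wavs() {
    out_dir="$1"
    if [ ! -d "$out_dir" ]; then
        echo "no such directory: $out_dir" >&2
        return 1
    fi
    find "$out_dir" -type f -name '*.wav' | sort
}
```

From inside the `Amphion` directory, this could be called as `list_generated_wavs ckpts/tts/valle-libritts/result`.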