---
datasets:
- librispeech_asr
language:
- en
metrics:
- wer
tags:
- hubert
- tts
---

# voidful/mhubert-unit-tts

This repository provides a text-to-unit model based on mHuBERT, trained with a BART model.
The model was trained on the LibriSpeech ASR dataset for the English language.
At training epoch 13: `WER: 30.41`, `CER: 20.22`.
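
The model emits its output as discrete unit tokens of the form `v_tok_<id>`. As a minimal illustration (the helper name here is ours, not part of the repository), a generated string can be parsed into integer unit IDs like so:

```python
import re

def decode_units(generated: str) -> list[int]:
    # Strip BOS/EOS markers, then pull out the integer after each "v_tok_" prefix.
    cleaned = generated.replace("</s>", "").replace("<s>", "")
    return [int(m) for m in re.findall(r"v_tok_(\d+)", cleaned)]

print(decode_units("<s>v_tok_17v_tok_3v_tok_999</s>"))  # → [17, 3, 999]
```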

## HuBERT Code TTS Example

```python
import asrp
import nlp2
import IPython.display as ipd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the mHuBERT HiFi-GAN vocoder checkpoint from fairseq.
nlp2.download_file(
    'https://dl.fbaipublicfiles.com/fairseq/speech_to_speech/vocoder/code_hifigan/mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj/g_00500000',
    './')

# Load the text-to-unit model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("voidful/mhubert-unit-tts")
model = AutoModelForSeq2SeqLM.from_pretrained("voidful/mhubert-unit-tts")
model.eval()

# Unit-to-waveform vocoder.
cs = asrp.Code2Speech(tts_checkpoint='./g_00500000', vocoder='hifigan')

# Generate unit tokens from text, then parse them into integer unit IDs.
inputs = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
code = tokenizer.batch_decode(model.generate(**inputs, max_length=1024))[0]
code = [int(i) for i in code.replace("</s>", "").replace("<s>", "").split("v_tok_")[1:]]
print(code)

# Synthesize the waveform and play it inline.
ipd.Audio(data=cs(code), autoplay=False, rate=cs.sample_rate)
```
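
Outside a notebook, the waveform returned by `cs(code)` can be written to disk instead of played inline. A minimal sketch using only the standard-library `wave` module; the sine wave here is a stand-in for the vocoder output, which we assume is a float array in [-1, 1] sampled at `cs.sample_rate`:

```python
import math
import struct
import wave

# Stand-in for cs(code): one second of a 440 Hz tone.
sample_rate = 16000
samples = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

with wave.open("output.wav", "wb") as f:
    f.setnchannels(1)             # mono
    f.setsampwidth(2)             # 16-bit PCM
    f.setframerate(sample_rate)
    # Scale floats in [-1, 1] to signed 16-bit integers.
    f.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
```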

## Datasets

The model was trained on the LibriSpeech ASR dataset for the English language.

## Language

The model is trained for the English language.

## Metrics

The model's performance is evaluated using Word Error Rate (WER).

## Tags

The model is tagged with "hubert" and "tts".
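
Word Error Rate is the word-level edit distance (insertions, deletions, and substitutions) between a hypothesis and the reference transcript, divided by the number of reference words. A minimal reference implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown dog"))  # → 0.25
```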