ZhangXInFD committed
Commit · 4d54939
1 Parent(s): 793c3ac
Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@
<a href='https://github.com/ZhangXInFD/SpeechTokenizer'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2308.16692'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>

## Introduction
-This is the code for the SpeechTokenizer presented in the [SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models](https://
+This is the code for the SpeechTokenizer presented in [SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models](https://arxiv.org/abs/2308.16692). SpeechTokenizer is a unified speech tokenizer for speech large language models, which adopts an encoder-decoder architecture with residual vector quantization (RVQ). By unifying semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of speech information hierarchically across the RVQ layers. Specifically, the code indices output by the first RVQ quantizer can be considered semantic tokens, and the outputs of the remaining quantizers can be regarded as acoustic tokens, which supplement the information lost by the first quantizer. We provide our models:
* A model operated at 16khz on monophonic speech trained on Librispeech with average representation across all HuBERT layers as semantic teacher.

<br>
@@ -41,7 +41,10 @@ pip install .
```
## Usage
### Model storage
-
+| Model | Description |
+|:----|:----|
+| [speechtokenizer_hubert_avg](https://huggingface.co/fnlp/SpeechTokenizer/tree/main/speechtokenizer_hubert_avg) | Adopts the average representation across all HuBERT layers as the semantic teacher |
+
### load model
```python
from speechtokenizer import SpeechTokenizer
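The updated Introduction says the first RVQ quantizer's code indices act as semantic tokens while the remaining quantizers carry acoustic detail, and the hunk above ends at the README's `### load model` snippet, of which only the import line is visible here. Below is a minimal sketch of how loading, encoding, and splitting the codes might look; the file paths, the `load_from_checkpoint` constructor, and the `encode` output layout `(n_q, batch, frames)` are assumptions for illustration and are not confirmed by this diff.

```python
import torch
import torchaudio
from speechtokenizer import SpeechTokenizer

# Hypothetical paths: config and checkpoint as they might be laid out in the
# speechtokenizer_hubert_avg folder listed in the model-storage table above.
config_path = "speechtokenizer_hubert_avg/config.json"
ckpt_path = "speechtokenizer_hubert_avg/SpeechTokenizer.pt"

# Assumed constructor; the diff only shows the import line.
model = SpeechTokenizer.load_from_checkpoint(config_path, ckpt_path)
model.eval()

# Load a mono waveform and resample to 16 kHz, the rate the model operates at.
wav, sr = torchaudio.load("speech.wav")
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)
wav = wav.unsqueeze(0)  # (batch, channels, samples)

with torch.no_grad():
    # Assumed to return RVQ code indices with shape (n_q, batch, frames).
    codes = model.encode(wav)

semantic_tokens = codes[:1]   # first quantizer: semantic content
acoustic_tokens = codes[1:]   # remaining quantizers: acoustic detail
```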