ZhangXInFD committed · Commit 4d54939 · 1 Parent(s): 793c3ac

Update REDME.md

  1. README.md +5 -2
README.md CHANGED
@@ -3,7 +3,7 @@
  <a href='https://github.com/ZhangXInFD/SpeechTokenizer'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='https://arxiv.org/abs/2308.16692'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
  
  ## Introduction
- This is the code for SpeechTokenizer, presented in [SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models](https://0nutation.github.io/SpeechTokenizer.github.io/). SpeechTokenizer is a unified speech tokenizer for speech large language models that adopts an encoder-decoder architecture with residual vector quantization (RVQ). By unifying semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of speech information hierarchically across RVQ layers. Specifically, the code indices output by the first RVQ quantizer can be regarded as semantic tokens, while the outputs of the remaining quantizers can be regarded as acoustic tokens, which supplement the information lost by the first quantizer. We provide the following models:
+ This is the code for SpeechTokenizer, presented in [SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models](https://arxiv.org/abs/2308.16692). SpeechTokenizer is a unified speech tokenizer for speech large language models that adopts an encoder-decoder architecture with residual vector quantization (RVQ). By unifying semantic and acoustic tokens, SpeechTokenizer disentangles different aspects of speech information hierarchically across RVQ layers. Specifically, the code indices output by the first RVQ quantizer can be regarded as semantic tokens, while the outputs of the remaining quantizers can be regarded as acoustic tokens, which supplement the information lost by the first quantizer. We provide the following models:
  * A model operating at 16 kHz on monophonic speech, trained on LibriSpeech with the average representation across all HuBERT layers as the semantic teacher.
  
  <br>
@@ -41,7 +41,10 @@ pip install .
  ```
  ## Usage
  ### Model storage
- [model list](https://huggingface.co/fnlp/SpeechTokenizer)
+ | Model | Description |
+ |:----|:----|
+ |[speechtokenizer_hubert_avg](https://huggingface.co/fnlp/SpeechTokenizer/tree/main/speechtokenizer_hubert_avg)|Uses the average representation across all HuBERT layers as the semantic teacher |
+
  ### Load model
  ```python
  from speechtokenizer import SpeechTokenizer