GLM-4-Voice-Tokenizer

GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. It can directly understand and generate Chinese and English speech, hold real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect in response to user instructions.

This repo provides the speech tokenizer of GLM-4-Voice. It is trained by adding vector quantization to the encoder part of Whisper, and it converts continuous speech input into discrete tokens. Each second of audio is converted into 12.5 discrete tokens.
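
To make the two ideas above concrete, here is a minimal, illustrative sketch in plain Python. It is not the GLM-4-Voice implementation: the function names, the toy codebook, and the round-up rule for token counts are all assumptions made for illustration. The first function shows the general idea of vector quantization (mapping each continuous frame vector to the index of its nearest codebook entry), and the second works through the arithmetic implied by the rate of 12.5 tokens per second, which corresponds to one token per 80 ms of audio.

```python
import math

# Rate stated in this README: 12.5 discrete tokens per second of audio.
TOKENS_PER_SECOND = 12.5


def quantize(frames, codebook):
    """Generic nearest-neighbour vector quantization (illustrative only).

    Each continuous frame vector is replaced by the index of the closest
    codebook vector, measured by squared Euclidean distance.
    """
    tokens = []
    for frame in frames:
        best_index = min(
            range(len(codebook)),
            key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
        )
        tokens.append(best_index)
    return tokens


def approx_token_count(duration_s):
    """Approximate token count for a clip of the given duration in seconds.

    Rounding up is an assumption; the real tokenizer may pad or truncate
    the audio differently.
    """
    return math.ceil(duration_s * TOKENS_PER_SECOND)


# Toy 2-dimensional codebook and frames (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

print(quantize(frames, codebook))   # [0, 1, 2]
print(approx_token_count(10.0))     # 125
```

At this rate, a 10-second clip maps to roughly 125 tokens, which gives a rough sense of how long a token sequence a downstream language model would need to handle for a given amount of speech.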

For more information, please refer to our repo GLM-4-Voice.