
Vikhr Salt: Speech And Language Transformer

Vikhr Salt is a multimodal model built on a pre-trained large language model and extended with new audio tokens, so that a single model handles both TTS (text-to-speech) and ASR (automatic speech recognition). It comes in two variants that differ in how audio is encoded, Encodec and SpeechTokenizer, and training is kept stable by tuning the numerical precision settings. This lets Vikhr Salt reuse the knowledge already present in the LLM while both generating and understanding speech.
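To make "extended with new audio tokens" concrete, here is a minimal sketch of one common way to place codec codes after a text vocabulary. It is an illustration under stated assumptions, not the actual Vikhr Salt layout: the `AudioVocab` class, the offset scheme, and the sizes (32000 text tokens, 1024 codes per codebook, 2 codebooks) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AudioVocab:
    """Hypothetical layout: audio tokens are appended after the text vocabulary."""

    text_vocab_size: int  # size of the original LLM vocabulary (assumed value)
    codebook_size: int    # number of codes per codec codebook (assumed value)
    num_codebooks: int    # number of codec codebooks in use (assumed value)

    @property
    def total_size(self) -> int:
        # Text tokens plus one id per (codebook, code) pair.
        return self.text_vocab_size + self.codebook_size * self.num_codebooks

    def encode(self, codebook: int, code: int) -> int:
        # Map a codec code to a token id in the extended vocabulary.
        if not 0 <= codebook < self.num_codebooks:
            raise ValueError("codebook index out of range")
        if not 0 <= code < self.codebook_size:
            raise ValueError("code out of range")
        return self.text_vocab_size + codebook * self.codebook_size + code

    def decode(self, token_id: int) -> Tuple[int, int]:
        # Recover (codebook, code) from an audio token id.
        offset = token_id - self.text_vocab_size
        if not 0 <= offset < self.codebook_size * self.num_codebooks:
            raise ValueError("not an audio token id")
        return divmod(offset, self.codebook_size)


# Illustrative sizes only; the real model's sizes are not stated in this card.
vocab = AudioVocab(text_vocab_size=32000, codebook_size=1024, num_codebooks=2)
token_id = vocab.encode(codebook=1, code=5)
codebook, code = vocab.decode(token_id)
```

With a mapping like this, the embedding table of the base LLM would be resized to `vocab.total_size`, and speech becomes an ordinary token sequence that the model can read (ASR) or emit (TTS).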

Model Authors

Ksenya Sycheva, Konstantin Korolev, Aleksandr Nikolic

Model size: 1.1B params (Safetensors)
Tensor type: BF16

Model ID: Vikhrmodels/salt-116k