Vikhr Salt: Speech And Language Transformer
Vikhr Salt is a multimodal model based on a pre-trained large language model, extended with new audio tokens to handle both TTS (text-to-speech) and ASR (automatic speech recognition) tasks. The model incorporates two variants for encoding audio—Encodec and SpeechTokenizer—and achieves stable training by fine-tuning precision settings. This approach allows Vikhr Salt to leverage pre-existing LLM knowledge while effectively generating and understanding speech, marking a step forward in multimodal learning.
Model Authors
Ksenya Sycheva, Konstantin Korolev, Aleksandr Nikolic
- Downloads last month
- 36
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Model tree for Vikhrmodels/salt-116k
Base model
TinyLlama/TinyLlama_v1.1