---
license: cc-by-4.0
---
# OuteTTS-0.1-350M

## Model Description

OuteTTS-0.1-350M is a novel text-to-speech synthesis model that leverages pure language modeling without external adapters or complex architectures. Built upon the LLaMa architecture using our Oute3-350M-DEV base model, it demonstrates that high-quality speech synthesis is achievable through a straightforward approach using crafted prompts and audio tokens.

## Key Features

- Pure language modeling approach to TTS
- Voice cloning capabilities
- LLaMa architecture
- Compatible with llama.cpp and GGUF format

## Technical Details

The model utilizes a three-step approach to audio processing:

1. Audio tokenization using WavTokenizer (75 tokens per second of audio)
2. CTC forced alignment for precise word-to-audio token mapping
3. Structured prompt creation following the format:

```
[full transcription]
[word] [duration token] [audio tokens]
```

## Technical Blog

https://www.outeai.com/blog/OuteTTS-0.1-350M

## Limitations

Being an experimental v0.1 release, there are some known issues:

- Vocabulary constraints due to training data limitations
- String-only input support
- Given its compact 350M parameter size, the model may frequently alter, insert, or omit words, leading to variations in output quality.
- Temperature sensitivity varies depending on the use case
- Performs best with shorter sentences, as accuracy may decrease with longer inputs

### Speech Samples

Samples generated by OuteTTS-0.1-350M, with the sampling settings used for each:

| Input | Notes |
|---|---|
| Hello, I can speak pretty well, but sometimes I make some mistakes. | (temperature=0.1, repetition_penalty=1.1) |
| Once upon a time, there was a | (temperature=0.1, repetition_penalty=1.1) |
| Scientists have discovered a new planet that may be capable of supporting life! | The model partially failed to follow the input text. (temperature=0.1, repetition_penalty=1.1) |
| Scientists have discovered a new planet that may be capable of supporting life! | In this case, raising the temperature from 0.1 to 0.7 produces more consistent output. (temperature=0.7, repetition_penalty=1.1) |
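
The prompt layout described under Technical Details (full transcription first, then one `[word] [duration token] [audio tokens]` entry per word) can be illustrated with a minimal sketch. It is not the model's actual tokenizer or prompt code: the `AlignedWord` type, the `<d:...>` and `<a:...>` token spellings, and the helper names are hypothetical, and the word timestamps are assumed to come from a separate CTC forced-alignment step. Only the 75 tokens per second rate is taken from the description above.

```python
from dataclasses import dataclass
from typing import List

# Rate stated in the model card for WavTokenizer.
TOKENS_PER_SECOND = 75


@dataclass
class AlignedWord:
    """Hypothetical container for one word after forced alignment."""
    word: str
    start: float             # start time in seconds
    end: float               # end time in seconds
    audio_tokens: List[int]  # audio codebook indices covering this word


def tokens_for_span(start: float, end: float) -> int:
    """Approximate number of audio tokens covering [start, end) at 75 tokens/s."""
    return round((end - start) * TOKENS_PER_SECOND)


def build_prompt(transcription: str, words: List[AlignedWord]) -> str:
    """Lay out the full transcription, then one line per aligned word."""
    lines = [transcription]
    for w in words:
        # Hypothetical token spellings; the real special tokens may differ.
        duration = f"<d:{w.end - w.start:.2f}>"
        audio = "".join(f"<a:{t}>" for t in w.audio_tokens)
        lines.append(f"{w.word} {duration} {audio}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        AlignedWord("hello", 0.0, 0.4, [1, 2]),
        AlignedWord("world", 0.4, 1.0, [3]),
    ]
    print(tokens_for_span(0.0, 0.4))
    print(build_prompt("hello world", example))
```

In this sketch the audio token lists are shortened for readability; at 75 tokens per second, a 0.4 second word would span roughly 30 audio tokens.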