---
license: cc-by-nc-4.0
datasets:
- facebook/multilingual_librispeech
- parler-tts/libritts_r_filtered
- amphion/Emilia-Dataset
language:
- en
- zh
- ja
- ko
pipeline_tag: text-to-speech
---

<style>
table {
  border-collapse: collapse;
  width: 100%;
  margin-bottom: 20px;
}
th, td {
  border: 1px solid #ddd;
  padding: 8px;
  text-align: center;
}
.best {
  font-weight: bold;
  text-decoration: underline;
}
.box {
  text-align: center;
  margin: 20px auto;
  padding: 30px;
  box-shadow: 0px 0px 20px 10px rgba(0, 0, 0, 0.05), 0px 1px 3px 10px rgba(255, 255, 255, 0.05);
  border-radius: 10px;
}
.badges {
  display: flex;
  justify-content: center;
  gap: 10px;
  flex-wrap: wrap;
  margin-top: 10px;
}
.badge {
  text-decoration: none;
  display: inline-block;
  padding: 4px 8px;
  border-radius: 5px;
  color: #fff;
  font-size: 12px;
  font-weight: bold;
  width: 250px;
}
.badge-hf-blue {
  background-color: #767b81;
}
.badge-hf-pink {
  background-color: #7b768a;
}
.badge-github {
  background-color: #2c2b2b;
}
</style>

<div class="box">
  <div style="margin-bottom: 20px;">
    <h2 style="margin-bottom: 4px; margin-top: 0px;">OuteAI</h2>
    <a href="https://www.outeai.com/" target="_blank" style="margin-right: 10px;">OuteAI.com</a>
    <a href="https://discord.gg/vyBM87kAmf" target="_blank" style="margin-right: 10px;">Join our Discord</a>
    <a href="https://x.com/OuteAI" target="_blank">@OuteAI</a>
  </div>
  <div class="badges">
    <a href="https://huggingface.co/OuteAI/OuteTTS-0.2-500M" target="_blank" class="badge badge-hf-blue">🤗 Hugging Face - OuteTTS 0.2 500M</a>
    <a href="https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF" target="_blank" class="badge badge-hf-blue">🤗 Hugging Face - OuteTTS 0.2 500M GGUF</a>
    <a href="https://huggingface.co/spaces/OuteAI/OuteTTS-0.2-500M-Demo" target="_blank" class="badge badge-hf-pink">🤗 Hugging Face - Demo Space</a>
    <a href="https://github.com/edwko/OuteTTS" target="_blank" class="badge badge-github">GitHub - OuteTTS</a>
  </div>
</div>

## Model Description

OuteTTS-0.2-500M is the improved successor to the v0.1 release. The model keeps the same audio-prompt approach, with no architectural changes to the foundation model itself. Built on Qwen-2.5-0.5B, this version was trained on larger and more diverse datasets, yielding significant improvements across all aspects of performance.

## Key Improvements

- **Enhanced Accuracy**: Significantly improved prompt following and output coherence compared to the previous version
- **Natural Speech**: Produces more natural and fluid speech synthesis
- **Expanded Vocabulary**: Trained on over 5 billion audio prompt tokens
- **Voice Cloning**: Improved voice cloning with greater diversity and accuracy
- **Multilingual Support**: New experimental support for Chinese, Japanese, and Korean

## Speech Demo

<video width="1280" height="720" controls>
  <source src="https://huggingface.co/OuteAI/OuteTTS-0.2-500M-GGUF/resolve/main/media/demo.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

## Usage

### Installation

[![GitHub](https://img.shields.io/badge/GitHub-OuteTTS-181717?logo=github)](https://github.com/edwko/OuteTTS)

```bash
pip install outetts
```

### Interface Usage

```python
import outetts

# Configure the model
model_config = outetts.HFModelConfig_v1(
    model_path="OuteAI/OuteTTS-0.2-500M",
    language="en",  # Supported languages in v0.2: en, zh, ja, ko
)

# Initialize the interface
interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)

# Optional: Create a speaker profile (use a 10-15 second audio clip)
# speaker = interface.create_speaker(
#     audio_path="path/to/audio/file",
#     transcript="Transcription of the audio file."
# )

# Optional: Save and load speaker profiles
# interface.save_speaker(speaker, "speaker.pkl")
# speaker = interface.load_speaker("speaker.pkl")

# Optional: Load a speaker from the default presets
interface.print_default_speakers()
speaker = interface.load_default_speaker(name="male_1")

output = interface.generate(
    text="Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware products.",
    # Lower temperature values may result in a more stable tone,
    # while higher values can introduce varied and expressive speech
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096,

    # Optional: Use a speaker profile for consistent voice characteristics
    # Without a speaker profile, the model generates a voice with random characteristics
    speaker=speaker,
)

# Save the synthesized speech to a file
output.save("output.wav")

# Optional: Play the synthesized speech
# output.play()
```
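The `create_speaker` step above recommends a 10-15 second reference clip. A quick length check before cloning can save a failed run; the helper below is a hypothetical addition (not part of the `outetts` API) that uses only Python's standard library to validate a WAV file's duration:

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def is_good_reference(path: str, low: float = 10.0, high: float = 15.0) -> bool:
    """Check that a reference clip falls within the recommended 10-15 s range."""
    return low <= clip_duration_seconds(path) <= high
```

If `is_good_reference` returns `False`, trim or extend the recording before passing it to `create_speaker`.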

### Using the GGUF Model

```python
import outetts

# Configure the GGUF model
model_config = outetts.GGUFModelConfig_v1(
    model_path="local/path/to/model.gguf",
    language="en",  # Supported languages in v0.2: en, zh, ja, ko
    n_gpu_layers=0,
)

# Initialize the GGUF interface
interface = outetts.InterfaceGGUF(model_version="0.2", cfg=model_config)
```

## Model Specifications

- **Base Model**: Qwen-2.5-0.5B
- **Parameter Count**: 500M
- **Language Support**:
  - Primary: English
  - Experimental: Chinese, Japanese, Korean
- **License**: CC BY-NC 4.0

## Training Datasets

- Emilia-Dataset (CC BY-NC 4.0)
- LibriTTS-R (CC BY 4.0)
- Multilingual LibriSpeech (MLS) (CC BY 4.0)

## Credits & References

- [WavTokenizer](https://github.com/jishengpeng/WavTokenizer)
- [CTC Forced Alignment](https://pytorch.org/audio/stable/tutorials/ctc_forced_alignment_api_tutorial.html)
- [Qwen-2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)