---
license: apache-2.0
base_model:
- coqui/XTTS-v2
---
# Auralis
## Model Details

- **Model Name:** Auralis
- **Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
- **License:**
  - Auralis: Apache 2.0
  - XTTS-v2 base model components: [Coqui AI License](https://coqui.ai/cpml)
- **Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
- **Developed by:** [AstraMind.ai](https://www.astramind.ai)
- **GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)
- **Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.

---
## Model Description
Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.
### Key Features:
- **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.
- **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090.
- **Scalable:** Handles multiple requests simultaneously.
- **Streaming:** Seamlessly processes long texts in a streaming format.
- **Custom Voices:** Enables voice cloning from short reference audio.
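
The streaming and concurrency features above can be pictured with a library-agnostic sketch in plain `asyncio`. `fake_synthesize` and `handle_request` are hypothetical names, not part of the Auralis API: the stand-in "synthesizer" yields slices of encoded text in place of audio, so the example runs without a model. It only illustrates the two ideas the bullets describe: a consumer can act on each chunk as soon as it arrives (streaming), and several requests can make progress on one event loop at the same time (concurrency).

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_synthesize(text: str, chunk_chars: int = 20) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call.

    Yields one placeholder "audio" chunk per slice of the input text; a real
    engine would yield audio samples instead of encoded text.
    """
    for start in range(0, len(text), chunk_chars):
        await asyncio.sleep(0)  # hand control back so other requests can progress
        yield text[start:start + chunk_chars].encode("utf-8")


async def handle_request(request_id: int, text: str, log: list[int]) -> int:
    """Consume chunks as they arrive (as a player would) and return total bytes."""
    total = 0
    async for chunk in fake_synthesize(text):
        log.append(request_id)  # record which request produced each chunk, in order
        total += len(chunk)
    return total


async def main() -> tuple[list[int], list[int]]:
    log: list[int] = []
    texts = ["First request " * 5, "Second request " * 5]
    # Run both requests concurrently on one event loop
    totals = await asyncio.gather(
        *(handle_request(i, text, log) for i, text in enumerate(texts))
    )
    return list(totals), log


if __name__ == "__main__":
    totals, log = asyncio.run(main())
    print("bytes per request:", totals)
    print("chunk order:", log)  # chunks from the two requests interleave
```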
---
## Quick Start
```python
from auralis import TTS, TTSRequest

# Initialize the model
tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

# Create a TTS request
request = TTSRequest(
    text="Hello Earth! This is Auralis speaking.",
    speaker_files=["reference.wav"],
)

# Generate speech
output = tts.generate_speech(request)
output.save("output.wav")
```
---
## Ebook Generation
Auralis converts ebooks into audio at lightning speed. For a complete Python script, see [vocalize_a_ebook.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).
```python
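# AudioPreprocessingConfig is assumed to be importable from the package top level;
# adjust the import if it lives in a submodule.
from auralis import TTS, TTSRequest, AudioPreprocessingConfig

# Initialize the model (same checkpoint as in the Quick Start)
tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")


def process_book(chapter_file: str, speaker_file: str) -> None:
    # Read the chapter text
    with open(chapter_file, "r", encoding="utf-8") as f:
        chapter = f.read()

    # You can pass the whole book; Auralis takes care of splitting it
    request = TTSRequest(
        text=chapter,
        speaker_files=[speaker_file],
        audio_config=AudioPreprocessingConfig(
            enhance_speech=True,
            normalize=True,
        ),
    )

    output = tts.generate_speech(request)
    output.play()
    output.save("chapter_output.wav")


# Example usage
process_book("chapter1.txt", "reference_voice.wav")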
```
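
The comment in the example above notes that Auralis splits long input on its own. Its actual splitting strategy is not described here; as a hypothetical illustration of the general idea, the sketch below packs whole sentences greedily into chunks of at most `max_chars` characters. `split_into_chunks` and the 250-character default are assumptions made for the example, not Auralis internals.

```python
import re


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own (oversized) chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays attached.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two two. Three three three.", max_chars=15):
        print(repr(chunk))
```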
---
## Intended Use
Auralis is designed for:
- **Content Creators:** Generate audiobooks, podcasts, or voiceovers.
- **Developers:** Integrate TTS into applications via a simple Python API.
- **Accessibility:** Provide audio versions of digital content for people with visual impairments or reading difficulties.
- **Multilingual Scenarios:** Convert text to speech in multiple supported languages.
---
## Performance
**Benchmarks on NVIDIA RTX 3090:**

- Short phrases (<100 characters): ~1 second
- Medium texts (<1,000 characters): ~5-10 seconds
- Full books (~100,000 characters): ~10 minutes

**Memory Usage:**

- Base VRAM: ~4GB
- Peak VRAM: ~10GB
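
The benchmark figures translate into an approximate throughput, which is handy for budgeting longer jobs. The back-of-the-envelope sketch below uses only the numbers quoted above; the linear extrapolation and the 500,000-character manuscript are assumptions for illustration, and real timings will vary with hardware, text, and settings.

```python
# Figures quoted in the benchmarks above (NVIDIA RTX 3090); all are approximate.
BOOK_CHARS = 100_000
BOOK_SECONDS = 10 * 60  # ~10 minutes


def chars_per_second(chars: float, seconds: float) -> float:
    """Average synthesis throughput in characters per second."""
    return chars / seconds


def estimate_minutes(chars: float, rate_cps: float) -> float:
    """Linear extrapolation (an assumption): minutes needed for `chars` at a fixed rate."""
    return chars / rate_cps / 60


book_rate = chars_per_second(BOOK_CHARS, BOOK_SECONDS)  # ~167 chars/s
# Medium texts: 1,000 characters in 5-10 seconds
medium_range = (chars_per_second(1_000, 10), chars_per_second(1_000, 5))

if __name__ == "__main__":
    print(f"Book-scale rate: {book_rate:.0f} chars/s")
    print(f"Medium-text rate: {medium_range[0]:.0f}-{medium_range[1]:.0f} chars/s")
    # Hypothetical 500,000-character manuscript at the book-scale rate
    print(f"Estimated time: {estimate_minutes(500_000, book_rate):.0f} minutes")
```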
---
## Model Features
1. **Speed & Efficiency:**
- Smart batching for rapid processing of long texts.
- Memory-optimized for consumer GPUs.
2. **Easy Integration:**
- Python API with support for synchronous and asynchronous workflows.
- Streaming mode for continuous playback during generation.
3. **Audio Quality Enhancements:**
- Background noise reduction.
- Voice clarity and volume normalization.
- Customizable audio preprocessing.
4. **Multilingual Support:**
- Automatic language detection.
   - High-quality speech in 17 languages.
5. **Customization:**
- Voice cloning using short reference clips.
- Adjustable parameters for tone, pacing, and language.
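
Volume normalization appears above without further detail. One common approach is peak normalization: scale every sample by a single gain so that the loudest sample reaches a target level. The sketch below is a generic illustration on plain Python floats in the range [-1.0, 1.0]; it is not necessarily what Auralis's `normalize=True` option does, and `peak_normalize` and the 0.95 target are assumptions chosen for the example.

```python
def peak_normalize(samples: list[float], target_peak: float = 0.95) -> list[float]:
    """Scale samples so that the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak  # one gain for the whole clip preserves relative dynamics
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet_clip = [0.1, -0.5, 0.25]
    print(peak_normalize(quiet_clip))
```

Using a single gain keeps the relative loudness within a clip intact; per-segment or loudness-based (RMS/LUFS) normalization are alternatives when consistent perceived volume across clips matters more.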
---
## Limitations & Ethical Considerations
- **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent.
- **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input.
---
## Citation
If you use Auralis in your research or projects, please cite:
```bibtex
@misc{auralis2024,
author = {AstraMind AI},
title = {Auralis: High-Performance Text-to-Speech Engine},
year = {2024},
url = {https://huggingface.co/AstraMindAI/auralis}
}
```