---
license: apache-2.0
base_model:
- coqui/XTTS-v2
---
# Auralis 🌌

## Model Details πŸ› οΈ

**Model Name:** Auralis  

**Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) 

**License:**  
- Auralis: Apache 2.0  
- Base model (XTTS-v2 components): [Coqui AI License](https://coqui.ai/cpml)

**Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi
  
**Developed by:** [AstraMind.ai](https://www.astramind.ai)
  
**GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)

**Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.  

---

## Model Description πŸš€

Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.

### Key Features:
- **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.  
- **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090.  
- **Scalable:** Handles multiple requests simultaneously.  
- **Streaming:** Seamlessly processes long texts in a streaming format.  
- **Custom Voices:** Enables voice cloning from short reference audio.  

---

## Quick Start ⭐

```python
from auralis import TTS, TTSRequest

# Initialize the model
tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

# Create a TTS request
request = TTSRequest(
    text="Hello Earth! This is Auralis speaking.",
    speaker_files=["reference.wav"]
)

# Generate speech
output = tts.generate_speech(request)
output.save("output.wav")
```

---

## Ebook Generation πŸ“š

Auralis converts ebooks into audio at lightning speed. For a ready-to-use Python script, check out [ebook_audio_generator.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).

```python
from auralis import TTS, TTSRequest, AudioPreprocessingConfig

tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

def process_book(chapter_file: str, speaker_file: str):
    # Read the chapter text
    with open(chapter_file, 'r') as f:
        chapter = f.read()

    # You can pass the whole book; Auralis takes care of splitting it
    request = TTSRequest(
        text=chapter,
        speaker_files=[speaker_file],
        audio_config=AudioPreprocessingConfig(
            enhance_speech=True,
            normalize=True
        )
    )

    output = tts.generate_speech(request)

    output.play()
    output.save("chapter_output.wav")

# Example usage
process_book("chapter1.txt", "reference_voice.wav")
```

---

## Intended Use 🌟

Auralis is designed for:
- **Content Creators:** Generate audiobooks, podcasts, or voiceovers.  
- **Developers:** Integrate TTS into applications via a simple Python API.  
- **Accessibility:** Provide audio versions of digital content for people with visual or reading difficulties.  
- **Multilingual Scenarios:** Convert text to speech in multiple supported languages.  

---

## Performance πŸ“Š

**Benchmarks on NVIDIA RTX 3090:**  
- Short phrases (<100 characters): ~1 second  
- Medium texts (<1,000 characters): ~5-10 seconds  
- Full books (~100,000 characters): ~10 minutes  

**Memory Usage:**  
- Base VRAM: ~4GB  
- Peak VRAM: ~10GB  
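
A quick back-of-envelope calculation shows the throughput these numbers imply and lets you estimate generation time for your own texts (the helper function below is illustrative, not part of the Auralis API):

```python
# Throughput implied by the full-book benchmark above:
# ~100,000 characters in ~10 minutes on an RTX 3090.
chars = 100_000
seconds = 10 * 60
throughput = chars / seconds  # characters per second
print(f"{throughput:.0f} chars/s")  # → 167 chars/s

def estimate_minutes(num_chars: int, chars_per_second: float = throughput) -> float:
    """Rough generation-time estimate; ignores warm-up and batching effects."""
    return num_chars / chars_per_second / 60

print(f"{estimate_minutes(50_000):.1f} min")  # a 50k-char novella: → 5.0 min
```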

---

## Model Features πŸ›Έ

1. **Speed & Efficiency:**  
   - Smart batching for rapid processing of long texts.  
   - Memory-optimized for consumer GPUs.  

2. **Easy Integration:**  
   - Python API with support for synchronous and asynchronous workflows.  
   - Streaming mode for continuous playback during generation.  

3. **Audio Quality Enhancements:**  
   - Background noise reduction.  
   - Voice clarity and volume normalization.  
   - Customizable audio preprocessing.  

4. **Multilingual Support:**  
   - Automatic language detection.  
   - High-quality speech in 15+ languages.  

5. **Customization:**  
   - Voice cloning using short reference clips.  
   - Adjustable parameters for tone, pacing, and language.  
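
Concurrent request handling is managed inside the engine, but the async usage pattern it enables can be sketched generically with `asyncio` and a placeholder synthesis coroutine (the `synthesize` function below is a stand-in, not the Auralis API):

```python
import asyncio

async def synthesize(request_id: int, text: str) -> str:
    """Placeholder for a TTS call; the real engine batches
    concurrent requests on the GPU."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return f"audio-{request_id}"

async def main() -> list[str]:
    requests = [(i, f"Sentence {i}.") for i in range(8)]
    # gather() runs all requests concurrently instead of one at a time
    return await asyncio.gather(*(synthesize(i, t) for i, t in requests))

results = asyncio.run(main())
print(results)
```

With a serial loop, the eight requests would take 8 × the per-request latency; submitting them concurrently lets the engine overlap (and batch) the work.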

---

## Limitations & Ethical Considerations ⚠️

- **Voice Cloning Risks:** Auralis supports voice cloning, which may raise ethical concerns about misuse. Use responsibly and ensure proper consent.  
- **Accent Limitations:** While robust for many languages, accents and intonations may vary based on the input.  

---

## Citation πŸ“œ

If you use Auralis in your research or projects, please cite:

```bibtex
@misc{auralis2024,
  author = {AstraMind AI},
  title = {Auralis: High-Performance Text-to-Speech Engine},
  year = {2024},
  url = {https://huggingface.co/AstraMindAI/auralis}
}
```