---
license: openrail++
---






<p align="center">
  <img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/Instagram%20post%20-%204.png" alt="Tansen Logo" width="300" height="300"/>
</p>

---

<p align="center"><i>Democratizing access to LLMs, Multi-Modal Gen AI models for the open-source community.<br>Let's advance AI, together. </i></p>

---


Tansen is a text-to-speech program built with the following priorities:

1. Strong multi-voice capabilities.
2. Highly realistic prosody and intonation.
3. Speaking rate control.


<a href="https://github.com/BudEcosystem/Tansen"><img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" /> </a>


<h2 align="left">🎧 Demos </h2>




[random_0_0.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/9a6ce191-2646-497e-bf48-003f2bf0bb8d)

[random_0_1.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/87bf5f7c-ae47-4aa4-a110-b5c9899e4446)

[random_0_2.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/5549c464-c670-4e7a-987c-c5d79b32bf4b)

<h2 align="left">💻 Getting Started on GitHub </h2>

Ready to dive in? Here's how you can get started with our repo on GitHub.

<h3 align="left">1️⃣ : Clone our GitHub repository</h3>

First, set up a conda environment and clone the repository. Open your terminal, navigate to the directory where you want the repository to be cloned, and run the following commands:

```bash
conda create --name Tansen python=3.9 numba inflect
conda activate Tansen
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install transformers=4.29.2
git clone https://github.com/BudEcosystem/Tansen.git
cd Tansen
```

<h3 align="left">2️⃣ : Install dependencies</h3>

```bash
python setup.py install
```

<h3 align="left">3️⃣ : Generate Audio</h3>

### do_tts.py

This script allows you to speak a single phrase with one or more voices.

```shell
python do_tts.py --text "I'm going to speak this" --voice random --preset fast
```

### read.py

This script provides tools for reading large amounts of text.

```shell
python Tansen/read.py --textfile <your text to be read> --voice random
```

This will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series
of spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and
output that as well.
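The split-then-combine flow described above can be sketched with the Python standard library. This is only an illustration of the idea, not the code in `read.py`: the names `split_sentences`, `fake_synthesize`, and `combine_clips` are hypothetical, and the synthesizer is a stand-in that returns silence whose length depends on the sentence length.

```python
import re

SAMPLES_PER_CHAR = 1200  # arbitrary value for the stand-in synthesizer


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_synthesize(sentence):
    """Stand-in for the real model: silence proportional to sentence length."""
    return [0.0] * (len(sentence) * SAMPLES_PER_CHAR)


def combine_clips(clips):
    """Concatenate per-sentence clips into one list of samples."""
    combined = []
    for clip in clips:
        combined.extend(clip)
    return combined


text = "Hello there. How are you today? I am fine!"
sentences = split_sentences(text)
clips = [fake_synthesize(s) for s in sentences]  # one clip per sentence
full_audio = combine_clips(clips)                # single combined output
print(len(sentences), len(full_audio))
```

Keeping the per-sentence clips around is what makes it possible to redo a single bad clip without regenerating the whole text.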

Sometimes Tansen screws up an output. You can regenerate any bad clips by re-running `read.py` with the `--regenerate`
argument.

Interested in running Tansen as an API?

### 🐍 Usage in Python

Tansen can be used programmatically:

```python
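# Assumes `api` and `utils` are imported from the installed Tansen package
# and `clips_paths` is a list of paths to reference audio clips.
reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]
tts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
pcm_audio = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')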
```
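To keep the generated speech, the samples need to be written to an audio file. The sketch below is a minimal standard-library illustration and not part of Tansen's API: it assumes the output has already been converted to a flat sequence of floats in the range [-1, 1], and both the helper name `write_wav` and the 24000 Hz sample rate are assumptions that should be checked against the model's actual output.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=24000):
    """Write float samples in [-1, 1] as 16-bit mono PCM."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))      # clip to the valid range
        frames += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
demo = [0.5 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(24000)]
write_wav("output.wav", demo)
```

If the output is a tensor rather than a list, it would first need to be moved to the CPU and flattened before being passed to a helper like this.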

## Loss Curves

<p align="center">
 <img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_mel_ce.png" alt="" width="500"/>
 <span>loss_mel_ce</span>
</p>

<p align="center">
 <img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_text_ce.png" alt="" width="500" />
 <span>loss_text_ce</span>
</p>


## Training Information

Device: a single NVIDIA A100 GPU

Dataset: 876 hours