---
license: openrail++
---
<p align="center">
<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/Instagram%20post%20-%204.png" alt="Tansen Logo" width="300" height="300"/>
</p>
---
<p align="center"><i>Democratizing access to LLMs, Multi-Modal Gen AI models for the open-source community.<br>Let's advance AI, together. </i></p>
---
Tansen is a text-to-speech program built with the following priorities:
1. Strong multi-voice capabilities.
2. Highly realistic prosody and intonation.
3. Speaking rate control.
<a href="https://github.com/BudEcosystem/Tansen"><img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" /> </a>
<h2 align="left">🎧 Demos </h2>
[random_0_0.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/9a6ce191-2646-497e-bf48-003f2bf0bb8d)
[random_0_1.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/87bf5f7c-ae47-4aa4-a110-b5c9899e4446)
[random_0_2.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/5549c464-c670-4e7a-987c-c5d79b32bf4b)
<h2 align="left">💻 Getting Started on GitHub </h2>
Ready to dive in? Here's how you can get started with our repo on GitHub.
<h3 align="left">1️⃣ : Clone our GitHub repository</h3>
First things first, you'll need to set up a conda environment and clone our repository. Open up your terminal, navigate to the directory where you want the repository to be cloned, and run the following commands:
```bash
conda create --name Tansen python=3.9 numba inflect
conda activate Tansen
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install transformers=4.29.2
git clone https://github.com/BudEcosystem/Tansen.git
cd Tansen
```
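Before moving on, it can help to confirm that the packages installed above are visible to the active interpreter. The following is a minimal sketch, not part of the Tansen repository; the helper name `check_environment` is hypothetical, and the package list is simply taken from the conda commands above.

```python
import importlib.util

# Import names of the packages installed by the conda commands above.
REQUIRED = ["numba", "inflect", "torch", "torchvision", "torchaudio", "transformers"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:<14} {'ok' if found else 'MISSING'}")
```

Run it inside the activated `Tansen` environment; any package reported as `MISSING` needs to be installed before generating audio.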
<h3 align="left">2️⃣ : Install dependencies</h3>
```bash
python setup.py install
```
<h3 align="left">3️⃣ : Generate Audio</h3>
### do_tts.py
This script allows you to speak a single phrase with one or more voices.
```shell
python do_tts.py --text "I'm going to speak this" --voice random --preset fast
```
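To speak several phrases in a row, the command above can be driven from a small wrapper. This is only a sketch: `build_tts_command` and `speak_all` are hypothetical helpers, the script path is assumed to be `do_tts.py` as in the command above, and only the flags shown there (`--text`, `--voice`, `--preset`) are used.

```python
import subprocess
import sys


def build_tts_command(text, voice="random", preset="fast", script="do_tts.py"):
    """Build the argument list for one run, using only the flags shown above."""
    return [sys.executable, script, "--text", text, "--voice", voice, "--preset", preset]


def speak_all(phrases, **kwargs):
    """Run the script once per phrase; raises CalledProcessError if a run fails."""
    for phrase in phrases:
        subprocess.run(build_tts_command(phrase, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with apostrophes in the text, such as the one in "I'm going to speak this".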
### read.py
This script provides tools for reading large amounts of text.
```shell
python Tansen/read.py --textfile <your text to be read> --voice random
```
This will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series
of spoken clips as they are generated. Once all the clips are generated, it will combine them into a single file and
output that as well.
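The split, synthesize, and combine flow described above can be sketched roughly as follows. This is an illustration of the idea, not the actual `read.py` implementation: the regex splitter is a naive stand-in, and `synthesize` is a hypothetical callable that turns one sentence into a list of audio samples.

```python
import re


def split_into_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def read_text(text, synthesize):
    """Synthesize each sentence separately, then concatenate the clips.

    `synthesize` is a hypothetical callable: sentence -> list of audio samples.
    """
    clips = [synthesize(sentence) for sentence in split_into_sentences(text)]
    # Join the per-sentence clips end to end into one combined clip.
    combined = [sample for clip in clips for sample in clip]
    return clips, combined
```

Keeping the per-sentence clips alongside the combined result is what makes it possible to redo a single bad clip without regenerating everything.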
Sometimes Tansen screws up an output. You can re-generate any bad clips by re-running `read.py` with the `--regenerate`
argument.
Interested in running it as an API?
### 🐍 Usage in Python
Tansen can be used programmatically:
```python
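# Assumes `api` and `utils` have been imported from the installed Tansen package,
# and that `clips_paths` is a list of paths to reference audio clips for the target voice.
# Load each reference clip at a 22,050 Hz sample rate.
reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]
tts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
pcm_audio = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')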
```
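To listen to the result, the generated audio has to be written to a file. Below is a minimal standard-library sketch, assuming the returned audio can be flattened to a plain sequence of floats in the range [-1.0, 1.0] (for a PyTorch tensor, something like `pcm_audio.squeeze().cpu().tolist()`). The helper name `save_wav` is hypothetical, and the default sample rate of 24000 Hz is an assumption that should be replaced with the rate your model actually outputs.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM.

    sample_rate=24000 is an assumption; use the rate your model actually outputs.
    """
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

If `torchaudio` is available in your environment, its own save function is likely the more convenient route; the version above only avoids extra dependencies.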
## Loss Curves
<p align="center">
<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_mel_ce.png" alt="" width="500"/>
<span>loss_mel_ce</span>
</p>
<p align="center">
<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_text_ce.png" alt="" width="500" />
<span>loss_text_ce</span>
</p>
## Training Information
Device: a single A100<br>
Dataset: 876 hours