romitjain committed on
Commit
d3e8b89
·
1 Parent(s): 0abccce

Updated README

README.md CHANGED
@@ -17,6 +17,14 @@ pipeline_tag: text-to-speech
17
  library_name: transformers
18
  ---
19
 
20
  # Model Card for indri-0.1-124m-tts
21
 
22
  Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model (124M) in our series and supports TTS tasks in 2 languages:
@@ -24,12 +32,6 @@ Indri is a series of audio models that can do TTS, ASR, and audio continuation.
24
  1. English
25
  2. Hindi
26
 
27
- We have open-sourced our training scripts, inference, and other details.
28
-
29
- - **Repository:** [GitHub](https://github.com/cmeraki/indri)
30
- - **Demo:** [Website](https://www.indrivoice.ai/)
31
- - **Implementation details**: [Release Blog](#TODO)
32
-
33
  ## Model Details
34
 
35
  ### Model Description
@@ -37,9 +39,20 @@ We have open-sourced our training scripts, inference, and other details.
37
  `indri-0.1-124m-tts` is a novel, ultra-small, and lightweight TTS model based on the transformer architecture.
38
  It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.
39
 
40
  ### Key features
41
 
42
- 1. Based on GPT-2 architecture. The methodology can be extended to any transformer-based architecture.
43
  2. Supports voice cloning with small prompts (<5s).
44
  3. Supports code-mixed text input in 2 languages: English and Hindi.
45
  4. Ultra-fast. Can generate 5 seconds of audio per second on Ampere-generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada-generation NVIDIA GPUs.
@@ -51,6 +64,10 @@ It models audio as tokens and can generate high-quality audio with consistent st
51
  3. Language Support: English, Hindi
52
  4. License: CC BY 4.0
53
 
54
  ## Technical details
55
 
56
  Here's a brief overview of how the model works:
@@ -63,6 +80,7 @@ Please read our blog [here](#TODO) for more technical details on how it was buil
63
 
64
  ## How to Get Started with the Model
65
 
 
66
  Use the code below to get started. Pipelines are the best way to try the model.
67
 
68
  ```python
@@ -85,6 +103,21 @@ output = pipe(['Hi, my name is Indri and I like to talk.'])
85
  torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
86
  ```
87
 
88
  ## Citation
89
 
90
  If you use this model in your research, please cite:
 
17
  library_name: transformers
18
  ---
19
 
20
+ | Platform | Link |
21
+ |----------|------|
22
+ | 🌎 Live Demo | [indrivoice.ai](https://indrivoice.ai/) |
23
+ | 𝕏 Twitter | [@11mlabs](https://x.com/11mlabs) |
24
+ | 🐱 GitHub | [Indri Repository](https://github.com/cmeraki/indri) |
25
+ | 🤗 Hugging Face (Collection) | [Indri collection](https://huggingface.co/collections/11mlabs/indri-673dd4210b4369037c736bfe) |
26
+ | 📝 Release Blog | [Release Blog](#) |
27
+
28
  # Model Card for indri-0.1-124m-tts
29
 
30
  Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model (124M) in our series and supports TTS tasks in 2 languages:
 
32
  1. English
33
  2. Hindi
34
 
35
  ## Model Details
36
 
37
  ### Model Description
 
39
  `indri-0.1-124m-tts` is a novel, ultra-small, and lightweight TTS model based on the transformer architecture.
40
  It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.
41
 
42
+ ### Samples
43
+
44
+ | Text | Sample |
45
+ | --- | --- |
46
+ |अतीत गौरवशाली, वर्तमान आशावादी, भविष्य उज्जवल| <audio controls src="data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav" title="Title"></audio> |
47
+ |भाइयों और बहनों, ये हमारा सौभाग्य है कि हम सब मिलकर इस महान देश को नई ऊंचाइयों पर ले जाने का सपना देख रहे हैं।| <audio controls src="data/6e0a4879-0379-4166-a52c-03220a3f2922.wav" title="Title"></audio> |
48
+ |Hello दोस्तों, future of speech technology mein अपका स्वागत है | <audio controls src="data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav" title="Title"></audio> |
49
+ |Artificial Intelligence's collaborative hub: Transforming Machine Learning together| <audio controls src="data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav" title="Title"></audio> |
50
+ |Intelligent machines processing data at lightning-fast electronic speeds| <audio controls src="data/e21efa09-e179-42b7-982a-b686038a8f60.wav" title="Title"></audio> |
51
+
52
+
53
  ### Key features
54
 
55
+ 1. Extremely small: based on the GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
56
  2. Supports voice cloning with small prompts (<5s).
57
  3. Supports code-mixed text input in 2 languages: English and Hindi.
58
  4. Ultra-fast. Can generate 5 seconds of audio per second on Ampere-generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada-generation NVIDIA GPUs.
 
64
  3. Language Support: English, Hindi
65
  4. License: CC BY 4.0
66
 
67
+ ### Speed
68
+
69
+
70
+
71
  ## Technical details
72
 
73
  Here's a brief overview of how the model works:
 
80
 
81
  ## How to Get Started with the Model
82
 
83
+ ### 🤗 pipelines
84
  Use the code below to get started. Pipelines are the best way to try the model.
85
 
86
  ```python
 
103
  torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
104
  ```
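
The throughput figures under "Key features" (seconds of audio generated per second of wall-clock time) can be checked against your own hardware. The sketch below is not part of the Indri codebase: the helper names `audio_seconds`, `realtime_factor`, and `timed` are hypothetical, and it assumes only the 24 kHz sample rate used in the `torchaudio.save` call above.

```python
import time

# Sample rate taken from the torchaudio.save(...) call in the README example.
SAMPLE_RATE = 24_000


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of a clip with `num_samples` samples per channel."""
    return num_samples / sample_rate


def realtime_factor(num_samples: int, wall_seconds: float,
                    sample_rate: int = SAMPLE_RATE) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds(num_samples, sample_rate) / wall_seconds


def timed(fn, *args, **kwargs):
    """Call `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage with the pipeline from the example above. This assumes
# the audio tensor is laid out as (channels, samples), which is the layout
# torchaudio.save expects:
#
#   output, elapsed = timed(pipe, ['Hi, my name is Indri and I like to talk.'])
#   n = output[0]['audio'][0].shape[-1]
#   print(f"{realtime_factor(n, elapsed):.1f}x real time")
```

A factor of 5 corresponds to the Ampere figure quoted above and 10 to the Ada figure. The first call typically includes model loading and warm-up, so timing a second call gives a fairer number.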
105
 
106
+ ### Self-hosted service
107
+
108
+ ```bash
109
+ git clone https://github.com/cmeraki/indri.git
110
+ cd indri
111
+ pip install -r requirements.txt
112
+
113
+ # Install ffmpeg (Debian/Ubuntu shown; for macOS/Windows, see https://www.ffmpeg.org/download.html)
114
+ sudo apt update -y
115
+ sudo apt upgrade -y
116
+ sudo apt install ffmpeg -y
117
+
118
+ python -m inference --model_path 11mlabs/indri-0.1-124m-tts --device cuda:0 --port 8000
119
+ ```
120
+
121
  ## Citation
122
 
123
  If you use this model in your research, please cite:
data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav ADDED
Binary file (41.7 kB). View file
 
data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav ADDED
Binary file (39 kB). View file
 
data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav ADDED
Binary file (32.7 kB). View file
 
data/6e0a4879-0379-4166-a52c-03220a3f2922.wav ADDED
Binary file (69.2 kB). View file
 
data/e21efa09-e179-42b7-982a-b686038a8f60.wav ADDED
Binary file (45.5 kB). View file