11mlabs/indri-0.1-124m-tts


rom7 committed on Nov 21

Commit 114d7e4 • 1 Parent(s): 4a5b008

Update README.md

Files changed (1)

README.md +4 -7

@@ -53,9 +53,10 @@ It models audio as tokens and can generate high-quality audio with consistent st
 ### Key features
 1. Extremely small, based on GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
-2. Supports voice cloning with small prompts (<5s).
-3. Code mixing text input in 2 languages - English and Hindi.
-4. Ultra-fast. Can generate 5 seconds of audio per second on Amphere generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada generation NVIDIA GPUs.
+2. Ultra-fast. Using our [self hosted service option](#self-hosted-service), the model can achieve speeds up to 400 toks/s (4s of audio generation per s) and under 20ms time to first token on RTX6000Ada NVIDIA GPU.
+  1. On RTX6000Ada, it can support a batch size of 1k with full context length of 1024 tokens
+3. Supports voice cloning with small prompts (<5s).
+4. Code mixing text input in 2 languages - English and Hindi.
 ### Details
@@ -64,10 +65,6 @@ It models audio as tokens and can generate high-quality audio with consistent st
 3. Language Support: English, Hindi
 4. License: CC BY 4.0
-### Speed
 ## Technical details
 Here's a brief of how the model works:
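
Review note: the new speed figures can be sanity-checked with some quick arithmetic. The sketch below is not from the README. It takes the quoted numbers (400 toks/s, 4 s of audio per second, a context of 1024 tokens, a batch of 1k) and combines them with the standard GPT-2 small dimensions (12 layers, hidden size 768). The names are illustrative, and three things are assumptions: that "1k" means 1000 sequences, that the key/value cache is stored in 16-bit precision, and that the whole context is filled with audio tokens.

```python
# Sanity-check arithmetic for the figures quoted in the README change.
# Values marked "from README" are quoted; everything else is an assumption.

TOKENS_PER_SECOND = 400        # from README: "up to 400 toks/s"
AUDIO_SECONDS_PER_SECOND = 4   # from README: "4s of audio generation per s"
CONTEXT_LENGTH = 1024          # from README: "full context length of 1024 tokens"
BATCH_SIZE = 1000              # from README: "batch size of 1k" (assumed to mean 1000)

# GPT-2 small dimensions (standard published configuration).
N_LAYERS = 12
HIDDEN_SIZE = 768
BYTES_PER_VALUE = 2            # assumption: 16-bit (fp16/bf16) cache entries


def tokens_per_audio_second(tokens_per_s, audio_s_per_s):
    """Tokens the model must emit to produce one second of audio."""
    return tokens_per_s / audio_s_per_s


def max_audio_seconds(context_tokens, tokens_per_audio_s):
    """Upper bound on audio length if the whole context held audio tokens."""
    return context_tokens / tokens_per_audio_s


def kv_cache_bytes_per_sequence(n_layers, hidden_size, context_tokens, bytes_per_value):
    """Key/value cache size for one sequence: a K and a V tensor per layer."""
    return 2 * n_layers * hidden_size * context_tokens * bytes_per_value


rate = tokens_per_audio_second(TOKENS_PER_SECOND, AUDIO_SECONDS_PER_SECOND)
audio_cap = max_audio_seconds(CONTEXT_LENGTH, rate)
per_seq = kv_cache_bytes_per_sequence(N_LAYERS, HIDDEN_SIZE, CONTEXT_LENGTH, BYTES_PER_VALUE)
total_gib = per_seq * BATCH_SIZE / 2**30

print(f"tokens per second of audio: {rate:.0f}")
print(f"max audio per full context: {audio_cap:.2f} s")
print(f"KV cache per sequence: {per_seq / 2**20:.0f} MiB")
print(f"KV cache for the full batch: {total_gib:.2f} GiB")
```

Under these assumptions the figures are self-consistent: about 100 tokens per second of audio, so a full 1024-token context covers at most roughly 10 s of audio (less once the text prompt is counted), and the cache for 1000 full-length sequences comes to about 35 GiB, which would fit in the 48 GB of an RTX 6000 Ada. This is only a plausibility check, not a description of how the self-hosted service allocates memory.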