Update README.md

# Model Card for Ultravox

Ultravox is a multimodal Speech LLM built around a pretrained [Llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [Whisper-small](https://huggingface.co/openai/whisper-small) backbone.\
See https://ultravox.ai for the GitHub repo and more information.

## Model Details

No preference tuning has been applied to this revision of the model.

- **Developed by:** Fixie.ai
- **License:** MIT

### Model Sources

- **Repository:** https://ultravox.ai
- **Demo:** See repo

## Uses
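
As a rough sketch of how the model can be invoked for speech-in, text-out inference, assuming Ultravox exposes a custom `transformers` pipeline via `trust_remote_code` (the model id, input keys, and prompt below are illustrative, not authoritative):

```python
# Illustrative only: assumes a custom Ultravox pipeline that accepts raw
# 16 kHz audio plus chat turns. Check the repo for the authoritative example.
import librosa
import transformers

pipe = transformers.pipeline(
    model="fixie-ai/ultravox",  # hypothetical model id
    trust_remote_code=True,
)

# Load mono audio at 16 kHz (the sampling rate Whisper expects).
audio, sr = librosa.load("question.wav", sr=16000)

turns = [
    {"role": "system", "content": "You are a friendly, helpful voice assistant."},
]

# The audio clip stands in for the user's turn; the LLM replies in text.
print(pipe({"audio": audio, "turns": turns, "sampling_rate": sr}, max_new_tokens=64))
```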

## Training Details

The multi-modal projector is first trained (while keeping backbones frozen).

The training dataset is a mix of ASR datasets (Gigaspeech), instruction-following and QA data (AnyInstruct and an extended version of BoolQ), and conversational data (SODA with alternative generations for the last two turns).

### Training Procedure

Supervised speech-to-audio finetuning. For more info, see the training code in the Ultravox repository.

#### Training Hyperparameters

- **Training regime:** BF16 mixed precision training
- **Hardware used:** 8x A100-40GB GPUs
- **LLM LoRA Rank:** 64

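For concreteness, here is a sketch of how the regime and rank above could be expressed with the PEFT and Transformers libraries. This is not the actual Ultravox training code (see the repository for that), and the values marked as assumed are not stated in this card:

```python
# Illustrative only: roughly how "BF16 mixed precision" and "LLM LoRA rank 64"
# could be configured with PEFT + Transformers. Target modules are a common
# choice for Llama-style models, not taken from the Ultravox code.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

llm = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora_config = LoraConfig(
    r=64,                      # LoRA rank from the list above
    lora_alpha=16,             # assumed; not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
llm = get_peft_model(llm, lora_config)

training_args = TrainingArguments(
    output_dir="out",
    bf16=True,                 # BF16 mixed precision training
    per_device_train_batch_size=4,  # assumed; not stated in this card
)
```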
#### Speeds, Sizes, Times

The current version of Ultravox, when invoked with audio content, has a time-to-first-token (TTFT) of approximately 200 ms and a tokens-per-second rate of ~50-100 on an A100-40GB GPU, using a Llama 3 8B backbone.

Check out the audio tab on [TheFastest.ai](https://thefastest.ai/?m=audio) for daily benchmarks and a comparison with other existing models.

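As an illustration of how TTFT and throughput numbers like these can be measured, here is a generic timing sketch for any Transformers causal LM (not Ultravox-specific; the backbone id and prompt are placeholders):

```python
# Generic timing sketch: measure time-to-first-token and approximate
# tokens/sec by streaming generation from a background thread.
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder backbone
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Describe the sound of rain.", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tok, skip_prompt=True)

start = time.perf_counter()
gen = Thread(target=model.generate,
             kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128))
gen.start()

ttft = None
chunks = 0
for _ in streamer:  # yields decoded text pieces, roughly one per token
    if ttft is None:
        ttft = time.perf_counter() - start  # time-to-first-token
    chunks += 1
gen.join()
elapsed = time.perf_counter() - start
print(f"TTFT ~{ttft * 1000:.0f} ms, ~{chunks / elapsed:.0f} tokens/s")
```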
## Evaluation