farzadab committed
Commit 8115677
1 Parent(s): f29c0a8

Update README.md

Files changed (1):
  1. README.md +7 -6
README.md CHANGED
@@ -12,7 +12,8 @@ datasets:
 
 # Model Card for Ultravox
 
-Ultravox is a multimodal Speech LLM built around a pretrained [Llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [Whisper-small](https://huggingface.co/openai/whisper-small) backbone. See https://ultravox.ai for the GitHub repo and more information.
+Ultravox is a multimodal Speech LLM built around a pretrained [Llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) and [Whisper-small](https://huggingface.co/openai/whisper-small) backbone.\
+See https://ultravox.ai for the GitHub repo and more information.
 
 
 ## Model Details
@@ -29,10 +30,10 @@ No preference tuning has been applied to this revision of the model.
 - **Developed by:** Fixie.ai
 - **License:** MIT
 
-### Model Sources [optional]
+### Model Sources
 
 - **Repository:** https://ultravox.ai
-- **Demo [optional]:** See repo
+- **Demo:** See repo
 
 ## Uses
 
@@ -49,7 +50,6 @@ The multi-modal projector is first trained (while keeping backbones frozen) in s
 
 Training dataset is a mix of ASR datasets (Gigaspeech), instruction-following and QA data (AnyInstruct and an extended version of BoolQ), and conversational data (SODA with alternative generations for last two turns).
 
-[More Information Needed]
 
 ### Training Procedure
 
@@ -59,13 +59,14 @@ Supervised speech to audio finetuning. For more info, see [training code in Ultr
 #### Training Hyperparameters
 
 - **Training regime:** BF16 mixed precision training
+- **Hardware used:** 8x A100-40GB GPUs
 - **LLM LoRA Rank:** 64
 
-#### Speeds, Sizes, Times [optional]
+#### Speeds, Sizes, Times
 
 The current version of Ultravox, when invoked with audio content, has a time-to-first-token (TTFT) of approximately 200ms, and a tokens-per-second rate of ~50-100 when using an A100-40GB GPU, all using a Llama 3 8B backbone.
 
-Check out the audio tab on [thefastest.ai](https://thefastest.ai/?m=audio) for daily benchmarks and a comparison with other existing models.
+Check out the audio tab on [TheFastest.ai](https://thefastest.ai/?m=audio) for daily benchmarks and a comparison with other existing models.
 
 ## Evaluation
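
The "LLM LoRA Rank: 64" hyperparameter above refers to the rank of the low-rank adapter matrices trained on top of the frozen LLM backbone. A minimal sketch of what that factorization means, independent of any training framework — the layer sizes here are illustrative assumptions, not taken from the model card:

```python
import numpy as np

rank = 64                   # "LLM LoRA Rank: 64" from the hyperparameter list
d_in, d_out = 4096, 4096    # illustrative Llama-style hidden sizes (assumption)

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, init to zero

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + B (A x): frozen base output plus a rank-64 correction."""
    return W @ x + B @ (A @ x)

# With B initialized to zero, the adapter starts as an exact no-op:
x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)
```

The appeal of the low rank is the parameter count: the adapter trains `rank * (d_in + d_out)` values per layer instead of the full `d_in * d_out` weight matrix.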
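
The TTFT and tokens-per-second figures quoted in the diff can be measured with a simple timing harness over any streaming inference API. The sketch below is hypothetical — `fake_stream` stands in for a real model's token stream — and only shows how the two metrics are defined and computed:

```python
import time
from typing import Iterable, Iterator

def fake_stream(n_tokens: int, first_delay: float, per_token: float) -> Iterator[str]:
    """Stand-in for a streaming model API; yields tokens with artificial latency."""
    time.sleep(first_delay)          # models the prefill / first-token latency
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(per_token)        # models per-token decode latency
        yield f"tok{i}"

def measure(stream: Iterable[str]) -> tuple[float, float]:
    """Return (TTFT in seconds, decode tokens/sec) for a token stream."""
    start = time.perf_counter()
    it = iter(stream)
    next(it)                                  # first token arrives -> TTFT
    ttft = time.perf_counter() - start
    n = 1
    for _ in it:
        n += 1
    total = time.perf_counter() - start
    tps = (n - 1) / (total - ttft)            # decode rate, excluding TTFT
    return ttft, tps
```

For example, `measure(fake_stream(50, 0.2, 0.02))` models roughly the card's numbers: a ~200ms TTFT and a decode rate on the order of 50 tokens/sec.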