Diffusers
AudioLDMPipeline
sanchit-gandhi committed
Commit d3a65c3 · 1 Parent(s): 18acc5e

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -6,7 +6,7 @@
 
 # AudioLDM
 
-AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.14.0 onwards.
+AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.15.0 onwards.
 
 # Model Details
 
@@ -29,7 +29,7 @@ sound effects, human speech and music.
 First, install the required packages:
 
 ```
-pip install --upgrade git+https://github.com/huggingface/diffusers git+https://github.com/huggingface/transformers scipy
+pip install --upgrade diffusers transformers
 ```
 
 ## Text-to-Audio
@@ -46,7 +46,7 @@ pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
 pipe = pipe.to("cuda")
 
 prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
-audio = pipe(prompt, num_inference_steps=10, height=512).audios[0]
+audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
 ```
 
 The resulting audio output can be saved as a .wav file:
@@ -65,10 +65,13 @@ Audio(audio, rate=16000)
 
 ## Tips
 
-* Try to provide descriptive text inputs to AudioLDM. You can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
+Prompts:
+* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
 * It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with.
+
+Inference:
 * The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
-* The _length_ of the predicted audio sample can be controlled by varying the `height` argument: larger heights give longer spectrograms and thus longer audio samples at the expense of slower inference.
+* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.
 
 # Citation
 
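Note that the new install line tracks PyPI releases rather than GitHub main, so `AudioLDMPipeline` is only importable once the installed `diffusers` meets the v0.15.0 floor stated in the updated README text. A minimal sanity check (a sketch, using only the public version attribute):

```
import diffusers
from diffusers import AudioLDMPipeline  # fails with an ImportError on releases before v0.15.0

print(diffusers.__version__)
```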
 
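Pieced together, the new-side snippets give the following end-to-end script. This is a sketch, not the verbatim README: the checkpoint id `cvssp/audioldm` is an assumption (the diff only ever shows the variable `repo_id`), and since `scipy` was dropped from the install line it must now be installed separately for the save step.

```
import torch
import scipy.io.wavfile
from diffusers import AudioLDMPipeline

# Assumed checkpoint id; the diff never shows the value of `repo_id`
repo_id = "cvssp/audioldm"

# Load the pipeline in half precision and move it to GPU
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# `audio_length_in_s` replaces the old `height` argument for controlling duration
prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# Save as a .wav file; the README plays the output at a 16 kHz sampling rate
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)

# Or listen directly in a notebook:
# from IPython.display import Audio
# Audio(audio, rate=16000)
```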
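The two inference tips are independent knobs: `num_inference_steps` trades speed for quality, while `audio_length_in_s` sets the clip duration directly. A hedged illustration with arbitrary values, reusing `pipe` and `prompt` from the sketch above:

```
# More denoising steps: higher-quality audio at the expense of slower generation
audio_hq = pipe(prompt, num_inference_steps=100, audio_length_in_s=5.0).audios[0]

# Longer clip at the same step count: duration is specified in seconds
audio_long = pipe(prompt, num_inference_steps=10, audio_length_in_s=10.0).audios[0]
```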