Commit d3a65c3
Parent(s): 18acc5e
Update README.md

README.md CHANGED
@@ -6,7 +6,7 @@

# AudioLDM

-AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.
+AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples given any text input. It is available in the 🧨 Diffusers library from v0.15.0 onwards.

# Model Details

@@ -29,7 +29,7 @@ sound effects, human speech and music.
First, install the required packages:

```
-pip install --upgrade
+pip install --upgrade diffusers transformers
```

## Text-to-Audio
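After installing, a quick Python sanity check can confirm the environment meets the v0.15.0 floor stated in the updated intro. A minimal sketch, assuming only that `diffusers` exposes `__version__` and that the `packaging` helper is installed:

```
# Sketch: confirm the installed diffusers version supports AudioLDM
# (the v0.15.0 requirement comes from the README intro updated above).
from packaging import version  # assumed to be available in the environment

import diffusers

if version.parse(diffusers.__version__) < version.parse("0.15.0"):
    raise RuntimeError(
        f"AudioLDM needs diffusers>=0.15.0, found {diffusers.__version__}"
    )
print(f"diffusers {diffusers.__version__} OK")
```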
@@ -46,7 +46,7 @@ pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
-audio = pipe(prompt, num_inference_steps=10,
+audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```

The resulting audio output can be saved as a .wav file:
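This hunk completes the generation call: the pipeline output's `.audios` attribute holds a batch of waveforms and `[0]` selects the first. For context, a minimal end-to-end sketch in Python; the `repo_id` value is an assumption (the commit page only shows it being passed through), and the 16 kHz rate matches the `Audio(audio, rate=16000)` playback call elsewhere in the README:

```
# End-to-end sketch of the snippet this hunk fixes.
import torch
from diffusers import AudioLDMPipeline

repo_id = "cvssp/audioldm"  # assumed checkpoint name, not visible in this diff
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

# Save as a .wav at the 16 kHz rate the README uses for playback.
import scipy.io.wavfile

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```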
@@ -65,10 +65,13 @@ Audio(audio, rate=16000)

## Tips

-
+Prompts:
+* Descriptive prompt inputs work best: you can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g., "water stream in a forest" instead of "stream").
* It's best to use general terms like 'cat' or 'dog' instead of specific names or abstract objects that the model may not be familiar with.
+
+Inference:
* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: higher steps give higher quality audio at the expense of slower inference.
-* The _length_ of the predicted audio sample can be controlled by varying the `
+* The _length_ of the predicted audio sample can be controlled by varying the `audio_length_in_s` argument.

# Citation
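The two inference tips map directly onto pipeline arguments. A short sketch of sweeping them together, assuming `pipe` is the AudioLDMPipeline built in the sketch above:

```
# Sketch: trade quality (num_inference_steps) against runtime, and vary
# clip length (audio_length_in_s). Assumes `pipe` from the snippet above.
prompt = "water stream in a forest"  # context-specific, per the prompt tips

for steps, seconds in [(10, 2.5), (50, 5.0), (200, 10.0)]:
    audio = pipe(
        prompt,
        num_inference_steps=steps,   # more steps -> higher quality, slower
        audio_length_in_s=seconds,   # controls the length of the clip
    ).audios[0]
    print(f"{steps} steps, {seconds}s -> {audio.shape[0]} samples")
```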