---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# AudioLDM

AudioLDM is a latent text-to-audio diffusion model capable of generating realistic audio samples from any text input. It is available in the 🧨 Diffusers library from v0.14.0 onwards.

# Model Details

AudioLDM was proposed in the paper [AudioLDM: Text-to-Audio Generation with Latent Diffusion Models](https://arxiv.org/abs/2301.12503) by Haohe Liu et al.

Inspired by [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion-v1-4), AudioLDM
is a text-to-audio _latent diffusion model (LDM)_ that learns continuous audio representations from [CLAP](https://huggingface.co/laion/clap-htsat-unfused)
latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional
sound effects, human speech and music.

## Model Sources

- [**Original Repository**](https://github.com/haoheliu/AudioLDM)
- [**🧨 Diffusers Pipeline**](https://huggingface.co/docs/diffusers/api/pipelines/audioldm)
- [**Paper**](https://arxiv.org/abs/2301.12503)
- [**Demo**](https://huggingface.co/spaces/haoheliu/audioldm-text-to-audio-generation)

# Usage

First, install the required packages:

```
pip install --upgrade git+https://github.com/huggingface/diffusers git+https://github.com/huggingface/transformers scipy
```

## Text-to-Audio

For text-to-audio generation, the [AudioLDMPipeline](https://huggingface.co/docs/diffusers/api/pipelines/audioldm) can be
used to load pre-trained weights and generate text-conditional audio outputs:

```python
import torch
from diffusers import AudioLDMPipeline

repo_id = "sanchit-gandhi/audioldm-text-to-audio"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, height=512).audios[0]
```

The resulting audio output can be saved as a .wav file:
```python
import scipy.io.wavfile

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```
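
Some audio players expect 16-bit PCM rather than floating-point samples. As a sketch (assuming the generated `audio` array holds float values in [-1, 1]), the conversion needs only NumPy, and the resulting array can then be passed to `scipy.io.wavfile.write` in place of the float data:

```python
import numpy as np

# Hypothetical float32 waveform standing in for a generated `audio` array.
audio = np.sin(np.linspace(0, 2 * np.pi * 440, 16000)).astype(np.float32)

# Clip to [-1, 1] and scale to the int16 range used by 16-bit PCM .wav files.
pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
```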

Or displayed in a Jupyter Notebook / Google Colab:
```python
from IPython.display import Audio

Audio(audio, rate=16000)
```

## Tips

* Try to provide descriptive text inputs to AudioLDM. You can use adjectives to describe the sound (e.g. "high quality" or "clear") and make the prompt context specific (e.g. "water stream in a forest" instead of "stream").
* It's best to use general terms like "cat" or "dog" instead of specific names or abstract objects that the model may not be familiar with.
* The _quality_ of the predicted audio sample can be controlled by the `num_inference_steps` argument: more steps give higher-quality audio at the expense of slower inference.
* The _length_ of the predicted audio sample can be controlled by varying the `height` argument: larger heights give longer spectrograms and thus longer audio samples, at the expense of slower inference.
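
The length of a generated clip can be checked directly from its sample count: at the 16 kHz rate used when saving the .wav file above, duration in seconds is simply samples divided by sampling rate (a sketch with a hypothetical array standing in for a generated `audio` output):

```python
import numpy as np

SAMPLING_RATE = 16000  # output rate used when saving the .wav file above

# Hypothetical 5-second waveform standing in for a generated `audio` array.
audio = np.zeros(5 * SAMPLING_RATE, dtype=np.float32)

duration_s = len(audio) / SAMPLING_RATE
print(f"{duration_s:.1f} s")  # → 5.0 s
```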

# Citation

**BibTeX:**
```
@article{liu2023audioldm,
  title={AudioLDM: Text-to-Audio Generation with Latent Diffusion Models},
  author={Liu, Haohe and Chen, Zehua and Yuan, Yi and Mei, Xinhao and Liu, Xubo and Mandic, Danilo and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:2301.12503},
  year={2023}
}
```