squash last commits due to large files
- README.md +13 -4
- notebooks/test_model.ipynb +0 -0
- notebooks/test_model_breaks.ipynb +0 -0
- train_unconditional.py +1 -1
README.md CHANGED
@@ -16,13 +16,22 @@ license: gpl-3.0
 
 ---
 
+**UPDATE**: I've trained a new [model](https://huggingface.co/teticio/audio-diffusion-breaks-256) on 30,000 samples that have been used in music, sourced from [WhoSampled](https://whosampled.com) and [YouTube](https://youtube.com). The idea is that the model could be used to generate loops or "breaks" that can be sampled to make new tracks. People ("crate diggers") go to a lot of lengths or are willing to pay a lot of money to find breaks in old records. See [`test_model_breaks.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model_breaks.ipynb) for details.
+
+---
+
 ![mel spectrogram](mel.png)
 
-
+---
+
+Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the [`test_mel.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_mel.ipynb) notebook.
 
-A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test_model.ipynb` and `test_model_breaks.ipynb` notebooks for examples.
+A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the [`test_model.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model.ipynb) and [`test_model_breaks.ipynb`](https://github.com/teticio/audio-diffusion/blob/main/notebooks/test_model_breaks.ipynb) notebooks for examples.
 
-You can play around with the model I trained on about 500 songs from my Spotify "liked" playlist on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some
+You can play around with the model I trained on about 500 songs from my Spotify "liked" playlist on [Google Colab](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb) or [Hugging Face spaces](https://huggingface.co/spaces/teticio/audio-diffusion). Check out some automatically generated loops [here](https://soundcloud.com/teticio2/sets/audio-diffusion-loops).
+
+
+---
 
 ## Generate Mel spectrogram dataset from directory of audio files
 #### Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The `hop_length` should be set to 1024 for better results.
@@ -30,7 +39,7 @@ You can play around with the model I trained on about 500 songs from my Spotify
 ```bash
 python audio_to_images.py \
     --resolution 64 \
-    --hop_length 1024\
+    --hop_length 1024 \
     --input_dir path-to-audio-files \
     --output_dir data-test
 ```
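For reference, the `Mel` round trip that the new README paragraph describes can be exercised in a few lines. The sketch below is illustrative only: it assumes the `Mel` constructor accepts `x_res`, `y_res` and `hop_length` keyword arguments and exposes `load_audio`, `audio_slice_to_image` and `image_to_audio` methods; check `mel.py` and `test_mel.ipynb` for the actual interface.

```python
from mel import Mel  # the class described in the README

# Assumed constructor arguments, mirroring the audio_to_images.py flags above.
mel = Mel(x_res=64, y_res=64, hop_length=1024)

mel.load_audio("path-to-audio-files/track.mp3")
image = mel.audio_slice_to_image(0)  # first audio slice -> greyscale PIL image
audio = mel.image_to_audio(image)    # spectrogram image -> waveform (lossy)
```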
notebooks/test_model.ipynb ADDED
The diff for this file is too large to render. See the raw diff.

notebooks/test_model_breaks.ipynb ADDED
The diff for this file is too large to render. See the raw diff.
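Since the added notebooks are too large to render here, the following is a rough sketch of the generate-and-invert flow the README describes: sample a mel spectrogram from the pipeline, then convert it back to audio. The model ID comes from the README; the `Mel` arguments and the pipeline output attribute are assumptions (the latter depends on the `diffusers` version), so defer to the notebooks themselves.

```python
from diffusers import DDPMPipeline

from mel import Mel

# The breaks model named in the README, published as a standard diffusers pipeline.
pipe = DDPMPipeline.from_pretrained("teticio/audio-diffusion-breaks-256")

# Sample one spectrogram image (recent diffusers versions return an object
# with an `images` list; very old ones returned a dict instead).
image = pipe(batch_size=1).images[0].convert("L")

# Assumed to match the model's training resolution of 256x256.
mel = Mel(x_res=256, y_res=256)
audio = mel.image_to_audio(image)
```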
train_unconditional.py CHANGED
@@ -40,7 +40,7 @@ def main(args):
     )
 
     if args.from_pretrained is not None:
-        model =
+        model = DDPMPipeline.from_pretrained(args.from_pretrained).unet
     else:
         model = UNet2DModel(
             sample_size=args.resolution,
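For context on the one-line change above: a `DDPMPipeline` bundles a `UNet2DModel` with a noise scheduler, so resuming training from a published pipeline amounts to loading the pipeline and keeping only its UNet weights. A minimal standalone sketch of the same idea (the model ID is just an example from this repo):

```python
from diffusers import DDPMPipeline

# Load a previously published pipeline and recover just the trainable UNet,
# exactly as the changed line in main() does.
pipeline = DDPMPipeline.from_pretrained("teticio/audio-diffusion-breaks-256")
model = pipeline.unet  # a UNet2DModel carrying the pretrained weights
```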