Merge branch 'main' of github.com:facebookresearch/audiocraft
- MODEL_CARD.md +2 -2
- README.md +9 -3
- requirements.txt +1 -0
MODEL_CARD.md
CHANGED
@@ -52,7 +52,7 @@ The model was evaluated on the [MusicCaps benchmark](https://www.kaggle.com/data

 ## Training datasets

-The model was trained using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.
+The model was trained on licensed data using the following sources: the [Meta Music Initiative Sound Collection](https://www.fb.com/sound), [Shutterstock music collection](https://www.shutterstock.com/music) and the [Pond5 music collection](https://www.pond5.com/). See the paper for more details about the training set and corresponding preprocessing.

 ## Quantitative analysis

@@ -62,7 +62,7 @@ More information can be found in the paper [Simple and Controllable Music Genera

 **Data:** The data sources used to train the model are created by music professionals and covered by legal agreements with the right holders. The model is trained on 20K hours of data, we believe that scaling the model on larger datasets can further improve the performance of the model.

-**Mitigations:**
+**Mitigations:** Vocals have been removed from the data source using corresponding tags, and then using using a state-of-the-art music source separation method, namely using the open source [Hybrid Transformer for Music Source Separation](https://github.com/facebookresearch/demucs) (HT-Demucs).

 **Limitations:**

README.md
CHANGED
@@ -8,7 +8,7 @@ Audiocraft is a PyTorch library for deep learning research on audio generation.
 ## MusicGen

 Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
-Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't
+Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
 all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
 them in parallel, thus having only 50 auto-regressive steps per second of audio.
 Check out our [sample page][musicgen_samples] or test the available demo!
@@ -21,6 +21,8 @@ Check out our [sample page][musicgen_samples] or test the available demo!
 </a>
 <br>

+We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
+
 ## Installation
 Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:

@@ -35,7 +37,11 @@ pip install -e . # or if you cloned the repo locally
 ```

 ## Usage
-
+We offer a number of way to interact with MusicGen:
+1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
+2. You can use the gradio demo locally by running `python app.py`.
+3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+4. Finally, @camenduru did a great notebook that combines [the MusicGen Gradio demo with Google Colab](https://github.com/camenduru/MusicGen-colab)

 ## API

@@ -52,7 +58,7 @@ GPUs will be able to generate short sequences, or longer sequences with the `sma
 **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer version of `torchaudio`.
 You can install it with:
 ```
-apt
+apt-get install ffmpeg
 ```

 See after a quick example for using the API.
requirements.txt
CHANGED
@@ -17,3 +17,4 @@ transformers
 xformers
 demucs
 librosa
+gradio
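
The README hunk above says that MusicGen predicts its 4 codebooks in parallel after introducing a small delay between them, giving about 50 auto-regressive steps per second of audio. A minimal sketch of that step count follows. It assumes a delay of one frame per codebook (the README only says "a small delay") and uses hypothetical helper names; it illustrates the arithmetic and is not Audiocraft's implementation.

```python
# Illustrative arithmetic only; the helper names are hypothetical and
# the one-frame delay per codebook is an assumption, not taken from the README.

FRAME_RATE_HZ = 50   # codebook frames per second (from the README)
NUM_CODEBOOKS = 4    # codebooks per frame (from the README)


def flattened_steps(seconds: int) -> int:
    """Steps needed if every codebook token were predicted one at a time."""
    return seconds * FRAME_RATE_HZ * NUM_CODEBOOKS


def delayed_steps(seconds: int, delay: int = 1) -> int:
    """Steps needed if codebook k is shifted by k * delay frames and
    all codebooks are predicted in parallel at each step."""
    return seconds * FRAME_RATE_HZ + (NUM_CODEBOOKS - 1) * delay


def delay_pattern(num_frames: int, num_codebooks: int = NUM_CODEBOOKS,
                  delay: int = 1) -> list[list[tuple[int, int]]]:
    """For each step, list the (codebook, frame) pairs predicted together."""
    total_steps = num_frames + (num_codebooks - 1) * delay
    pattern = []
    for step in range(total_steps):
        pairs = [
            (k, step - k * delay)
            for k in range(num_codebooks)
            if 0 <= step - k * delay < num_frames
        ]
        pattern.append(pairs)
    return pattern


if __name__ == "__main__":
    print(flattened_steps(30), delayed_steps(30))
    for step, pairs in enumerate(delay_pattern(5)):
        print(step, pairs)
```

Under the assumed one-frame delay, 30 seconds of audio takes 1503 steps, versus 6000 if the 4 codebooks were flattened into a single sequence, so the overhead of the delay is a constant 3 steps regardless of length.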