File size: 5,422 Bytes
ce8c824 222cf3e ce8c824 a993c16 0875911 e5c69f2 0875911 e5c69f2 0875911 e5c69f2 0875911 e5c69f2 f65fe41 e5c69f2 0875911 e5c69f2 0875911 e5c69f2 0875911 e5c69f2 0875911 12547f3 0875911 12547f3 0875911 12547f3 0875911 12547f3 0875911 12547f3 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 |
---
inference: false
tags:
- musicgen
license: cc-by-nc-4.0
---
# MusicGen - Medium - 1.5B
MusicGen is a text-to-music model capable of genreating high-quality music samples conditioned on text descriptions or audio prompts.
It is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass.
By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
MusicGen was published in [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by *Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez*.
Four checkpoints are released:
- [small](https://huggingface.co/facebook/musicgen-small)
- [**medium** (this checkpoint)](https://huggingface.co/facebook/musicgen-medium)
- [large](https://huggingface.co/facebook/musicgen-large)
- [melody](https://huggingface.co/facebook/musicgen-melody)
## Example
Try out MusicGen yourself!
* Audiocraft Colab:
<a target="_blank" href="https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
* Hugging Face Colab:
<a target="_blank" href="https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
* Hugging Face Demo:
<a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
</a>
## 🤗 Transformers Usage
You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards.
1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
```
pip install git+https://github.com/huggingface/transformers.git
```
2. Run the following Python code to generate text-conditional audio samples:
```py
from transformers import AutoProcessor, MusicgenForConditionalGeneration
processor = AutoProcessor.from_pretrained("facebook/musicgen-medium")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-medium")
inputs = processor(
text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
padding=True,
return_tensors="pt",
)
audio_values = model.generate(**inputs, max_new_tokens=256)
```
3. Listen to the audio samples either in an ipynb notebook:
```py
from IPython.display import Audio
sampling_rate = model.config.audio_encoder.sampling_rate
Audio(audio_values[0].numpy(), rate=sampling_rate)
```
Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
```py
import scipy
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
```
For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the [MusicGen docs](https://huggingface.co/docs/transformers/model_doc/musicgen).
## Audiocraft Usage
You can also run MusicGen locally through the original [Audiocraft library]((https://github.com/facebookresearch/audiocraft):
1. First install the [`audiocraft` library](https://github.com/facebookresearch/audiocraft)
```
pip install git+https://github.com/facebookresearch/audiocraft.git
```
2. Make sure to have [`ffmpeg`](https://ffmpeg.org/download.html) installed:
```
apt get install ffmpeg
```
3. Run the following Python code:
```py
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
model = MusicGen.get_pretrained("medium")
model.set_generation_params(duration=8) # generate 8 seconds.
descriptions = ["happy rock", "energetic EDM"]
wav = model.generate(descriptions) # generates 2 samples.
for idx, one_wav in enumerate(wav):
# Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
```
## Model details
**Model card:** More details on the model can be found in [MusicGen's model card](https://github.com/facebookresearch/audiocraft/blob/main/model_cards/MUSICGEN_MODEL_CARD.md).
**Paper or resources for more information:** More information can be found in the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284).
**Citation details:**
```
@misc{copet2023simple,
title={Simple and Controllable Music Generation},
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
year={2023},
eprint={2306.05284},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
```
**License:** Code is released under MIT, model weights are released under CC-BY-NC 4.0.
**Where to send questions or comments about the model:** Questions and comments about MusicGen can be sent via the [Github repository](https://github.com/facebookresearch/audiocraft) of the project, or by opening an issue. |