inference: false
tags:
- musicgen
license: cc-by-nc-4.0
MusicGen - Large - 3.3B
MusicGen is a text-to-music model capable of genreating high-quality music samples conditioned on text descriptions or audio prompts. It is a single stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods, like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
MusicGen was published in Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez.
Four checkpoints are released:
Example
Try out MusicGen yourself!
- Audiocraft Colab:
- Hugging Face Colab:
- Hugging Face Demo:
🤗 Transformers Usage
You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards.
- First install the 🤗 Transformers library from main:
pip install git+https://github.com/huggingface/transformers.git
- Run the following Python code to generate text-conditional audio samples:
from transformers import AutoProcessor, MusicgenForConditionalGeneration
processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large")
inputs = processor(
text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
padding=True,
return_tensors="pt",
)
audio_values = model.generate(**inputs, max_new_tokens=256)
- Listen to the audio samples either in an ipynb notebook:
from IPython.display import Audio
sampling_rate = model.config.audio_encoder.sampling_rate
Audio(audio_values[0].numpy(), rate=sampling_rate)
Or save them as a .wav
file using a third-party library, e.g. scipy
:
import scipy
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the MusicGen docs.
Audiocraft Usage
You can also run MusicGen locally through the original Audiocraft library:
- First install the
audiocraft
library
pip install git+https://github.com/facebookresearch/audiocraft.git
- Make sure to have
ffmpeg
installed:
apt get install ffmpeg
- Run the following Python code:
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
model = MusicGen.get_pretrained("large")
model.set_generation_params(duration=8) # generate 8 seconds.
descriptions = ["happy rock", "energetic EDM"]
wav = model.generate(descriptions) # generates 2 samples.
for idx, one_wav in enumerate(wav):
# Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")
Model details
Model card: More information on the model can be found in MusicGen's model card.
Paper or resources for more information: More information can be found in the paper Simple and Controllable Music Generation.
Citation details:
@misc{copet2023simple,
title={Simple and Controllable Music Generation},
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
year={2023},
eprint={2306.05284},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
License: Code is released under MIT, model weights are released under CC-BY-NC 4.0.
Where to send questions or comments about the model: Questions and comments about MusicGen can be sent via the Github repository of the project, or by opening an issue.