metadata

inference: false
tags:
  - musicgen
license: cc-by-nc-4.0

MusicGen - Large - 3.3B

MusicGen is a text-to-music model capable of generating high-quality music samples conditioned on text descriptions or audio prompts. It is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods such as MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict them in parallel, thus having only 50 auto-regressive steps per second of audio.
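The delay between codebooks can be pictured with a small sketch. It assumes, for illustration only, that codebook k is shifted by exactly k decoding steps relative to codebook 0; the function name `delay_pattern` and the grid layout are hypothetical and are not taken from the Audiocraft or Transformers code.

```python
def delay_pattern(num_codebooks, num_frames):
    """Return grid[k][step]: the frame index that codebook k predicts at each
    decoding step, or None where that codebook has nothing to predict."""
    num_steps = num_frames + num_codebooks - 1
    grid = [[None] * num_steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            grid[k][t + k] = t  # assumed: codebook k lags codebook 0 by k steps
    return grid


# Print a small layout: columns are decoding steps, "." marks an empty slot.
pattern = delay_pattern(num_codebooks=4, num_frames=6)
for k, row in enumerate(pattern):
    cells = " ".join("." if t is None else str(t) for t in row)
    print(f"codebook {k}: {cells}")

# One second of audio is 50 frames per codebook at 50 Hz.
steps_for_one_second = len(delay_pattern(num_codebooks=4, num_frames=50)[0])
print(steps_for_one_second)
```

Under this assumption every step predicts one token for each codebook at once, so the number of steps grows with the number of frames plus a fixed offset of `num_codebooks - 1`, which is where the figure of roughly 50 steps per second comes from.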

MusicGen was published in Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, and Alexandre Défossez.

Four checkpoints are released:

Example

Try out MusicGen yourself!

Audiocraft Colab:

Hugging Face Colab:

Hugging Face Demo:

🤗 Transformers Usage

You can run MusicGen locally with the 🤗 Transformers library from version 4.31.0 onwards.

First install the 🤗 Transformers library from main:

pip install git+https://github.com/huggingface/transformers.git

Run the following Python code to generate text-conditional audio samples:

from transformers import AutoProcessor, MusicgenForConditionalGeneration


processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large")

inputs = processor(
    text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
    padding=True,
    return_tensors="pt",
)

audio_values = model.generate(**inputs, max_new_tokens=256)
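Since the codebooks are sampled at 50 Hz, `max_new_tokens` maps roughly to clip length. The helpers below are a minimal sketch of that arithmetic; their names are hypothetical, and the exact number of audio samples returned may differ slightly because of the codebook delay.

```python
FRAME_RATE_HZ = 50  # codebook frames per second, as stated in the model description


def duration_for_tokens(num_tokens, frame_rate_hz=FRAME_RATE_HZ):
    """Approximate clip length in seconds for a given max_new_tokens."""
    return num_tokens / frame_rate_hz


def tokens_for_duration(seconds, frame_rate_hz=FRAME_RATE_HZ):
    """Approximate max_new_tokens needed for a clip of the given length."""
    return round(seconds * frame_rate_hz)


approx_seconds = duration_for_tokens(256)  # the value used in the example above
tokens_for_8s = tokens_for_duration(8)
print(approx_seconds, tokens_for_8s)
```

With the settings above, `max_new_tokens=256` therefore corresponds to about 5 seconds of audio.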

Listen to the audio samples either in an ipynb notebook:

from IPython.display import Audio

sampling_rate = model.config.audio_encoder.sampling_rate
Audio(audio_values[0].numpy(), rate=sampling_rate)

Or save them as a .wav file using a third-party library, e.g. scipy:

import scipy.io.wavfile  # import the submodule explicitly; a bare "import scipy" may not load it

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
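If scipy is not available, or a 16-bit PCM file is preferred over the floating-point WAV that scipy writes for float data, the standard library's `wave` module can be used instead. The following is a sketch only: the helper name `write_pcm16_wav` is hypothetical, it assumes mono samples in the range [-1.0, 1.0], and it uses a synthetic sine tone in place of real model output so that it runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the conversion cannot overflow
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in signal: one second of a 440 Hz tone at 32 kHz.
demo_rate = 32000
demo_samples = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
demo_path = os.path.join(tempfile.mkdtemp(), "demo_pcm16.wav")
write_pcm16_wav(demo_path, demo_samples, demo_rate)
```

With real model output, the call would look like `write_pcm16_wav("musicgen_out.wav", audio_values[0, 0].tolist(), sampling_rate)`, reusing the variables from the snippets above.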

For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the MusicGen docs.

Audiocraft Usage

You can also run MusicGen locally through the original Audiocraft library:

First install the audiocraft library:

pip install git+https://github.com/facebookresearch/audiocraft.git

Make sure to have ffmpeg installed:

apt-get install ffmpeg

Run the following Python code:

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("large")
model.set_generation_params(duration=8)  # generate 8 seconds.

descriptions = ["happy rock", "energetic EDM"]

wav = model.generate(descriptions)  # generates 2 samples.

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")

Model details

Model card: More information on the model can be found in MusicGen's model card.

Paper or resources for more information: More information can be found in the paper Simple and Controllable Music Generation.

Citation details:

@misc{copet2023simple,
      title={Simple and Controllable Music Generation}, 
      author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
      year={2023},
      eprint={2306.05284},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

License: Code is released under MIT, model weights are released under CC-BY-NC 4.0.

Where to send questions or comments about the model: Questions and comments about MusicGen can be sent via the GitHub repository of the project, or by opening an issue.