# MAGNeT
Welcome to MAGNeT's demo Jupyter notebook.
Here you will find a self-contained example of how to use MAGNeT for music/sound-effect generation.

First, we initialize MAGNeT for music generation. You can choose a model from the following selection:
1. facebook/magnet-small-10secs - a 300M-parameter non-autoregressive transformer that generates 10-second music samples conditioned on text.
2. facebook/magnet-medium-10secs - 1.5B parameters, 10-second music samples.
3. facebook/magnet-small-30secs - 300M parameters, 30-second music samples.
4. facebook/magnet-medium-30secs - 1.5B parameters, 30-second music samples.

We will use the `facebook/magnet-small-10secs` variant for the purpose of this demonstration.

In [None]:
from audiocraft.models import MAGNeT

model = MAGNeT.get_pretrained('facebook/magnet-small-10secs')

Next, let us configure the generation parameters. Specifically, you can control the following:
* `use_sampling` (bool, optional): use sampling if True, else do argmax decoding. Defaults to True.
* `top_k` (int, optional): top_k used for sampling. Defaults to 0.
* `top_p` (float, optional): top_p used for sampling; when set to 0, top_k is used. Defaults to 0.9.
* `temperature` (float, optional): Initial softmax temperature parameter. Defaults to 3.0.
* `max_cfg_coef` (float, optional): Initial coefficient used for classifier-free guidance. Defaults to 10.0.
* `min_cfg_coef` (float, optional): Final coefficient used for classifier-free guidance. Defaults to 1.0.
* `decoding_steps` (list of n_q ints, optional): The number of iterative decoding steps, for each of the n_q RVQ codebooks.
* `span_arrangement` (str, optional): Use either non-overlapping spans ('nonoverlap') or overlapping spans ('stride1') in the masking scheme.

Any parameter you do not set keeps its default value.
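
To make the interplay of `temperature`, `top_k` and `top_p` concrete, here is a minimal pure-Python sketch of how such filtering is commonly done. It is an illustration only, not audiocraft's implementation; the helper names `softmax` and `filter_probs` are hypothetical, and the toy logits are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_probs(probs, top_k=0, top_p=0.0):
    """Zero out unlikely tokens, then renormalize.

    Follows the rule stated in the parameter list: top_p is used when
    it is > 0, otherwise top_k is used. With both at 0, nothing is filtered.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0:
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest set covering top_p of the mass
                break
    elif top_k > 0:
        keep = order[:top_k]
    else:
        keep = order
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up values for illustration
probs = softmax(toy_logits, temperature=1.0)
nucleus = filter_probs(probs, top_p=0.9)  # drops only the least likely token
top2 = filter_probs(probs, top_k=2)       # keeps the two most likely tokens
```

With `top_p=0.9` the three most likely toy tokens survive because their cumulative probability is the first to reach 0.9, while `top_k=2` keeps exactly two regardless of how the mass is distributed.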

In [None]:
model.set_generation_params(
 use_sampling=True,
 top_k=0,
 top_p=0.9,
 temperature=3.0,
 max_cfg_coef=10.0,
 min_cfg_coef=1.0,
 decoding_steps=[int(20 * model.lm.cfg.dataset.segment_duration // 10), 10, 10, 10],
 span_arrangement='stride1'
)
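
The `decoding_steps` expression above scales the step count of the first codebook with the clip length and keeps 10 steps for each remaining codebook. The sketch below reproduces that arithmetic in isolation, assuming `segment_duration` is the clip length in seconds (10 or 30 for the models listed earlier). The helper name `decoding_steps_for` and its keyword arguments are hypothetical.

```python
def decoding_steps_for(segment_duration, steps_per_10s=20, later_steps=10, n_q=4):
    """Mirror the notebook's expression: 20 steps per 10 s for the first
    codebook, a fixed 10 steps for each of the remaining n_q - 1 codebooks."""
    first = int(steps_per_10s * segment_duration // 10)
    return [first] + [later_steps] * (n_q - 1)

steps_10s = decoding_steps_for(10)
steps_30s = decoding_steps_for(30)
print(steps_10s)  # [20, 10, 10, 10]
print(steps_30s)  # [60, 10, 10, 10]
```

Under this assumption, a 30-second model spends three times as many iterations on the first codebook as a 10-second model.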

We can now generate music from text prompts.

### Text-conditional Generation - Music

In [None]:
from audiocraft.utils.notebook import display_audio

###### Text-to-music prompts - examples ######
text = "80s electronic track with melodic synthesizers, catchy beat and groovy bass"
# text = "80s electronic track with melodic synthesizers, catchy beat and groovy bass. 170 bpm"
# text = "Earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves"
# text = "Funky groove with electric piano playing blue chords rhythmically"
# text = "Rock with saturated guitars, a heavy bass line and crazy drum break and fills."
# text = "A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle"
 
N_VARIATIONS = 3
descriptions = [text for _ in range(N_VARIATIONS)]

print(f"text prompt: {text}\n")
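# With return_tokens=True, generate() returns a tuple of (audio, tokens)
output = model.generate(descriptions=descriptions, progress=True, return_tokens=True)
# output[0] holds the generated audio, one item per description
display_audio(output[0], sample_rate=model.compression_model.sample_rate)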
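
The cell above repeats a single prompt `N_VARIATIONS` times. If you want several prompts in one batch, the description list can be built the same way. The sketch below is a hypothetical extension, not part of the original demo: `build_descriptions` and `group_by_prompt` are illustrative helper names, and it assumes the generated batch follows the order of the descriptions passed in.

```python
def build_descriptions(prompts, n_variations):
    """Repeat each prompt n_variations times, keeping its variations adjacent."""
    return [p for p in prompts for _ in range(n_variations)]

def group_by_prompt(items, n_variations):
    """Split a flat batch back into one chunk per prompt."""
    return [items[i:i + n_variations] for i in range(0, len(items), n_variations)]

prompts = [
    "Funky groove with electric piano playing blue chords rhythmically",
    "Rock with saturated guitars, a heavy bass line and crazy drum break and fills.",
]
batch = build_descriptions(prompts, n_variations=3)
groups = group_by_prompt(batch, n_variations=3)

# The flat list can then be passed as model.generate(descriptions=batch, ...).
```

Larger batches need more memory, so the number of prompts times the number of variations may have to stay small on limited hardware.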

### Text-conditional Generation - Sound Effects

Besides music, MAGNeT models can generate sound effects from text prompts.
First, load an Audio-MAGNeT model from the following collection:
1. facebook/audio-magnet-small - a 300M-parameter non-autoregressive transformer that generates 10-second sound effects conditioned on text.
2. facebook/audio-magnet-medium - 1.5B parameters, 10-second sound effects.

We will use the `facebook/audio-magnet-small` variant for the purpose of this demonstration.

In [None]:
from audiocraft.models import MAGNeT

model = MAGNeT.get_pretrained('facebook/audio-magnet-small')

The recommended parameters for sound generation differ slightly from MAGNeT's defaults, so we set them explicitly:

In [None]:
model.set_generation_params(
 use_sampling=True,
 top_k=0,
 top_p=0.8,
 temperature=3.5,
 max_cfg_coef=20.0,
 min_cfg_coef=1.0,
 decoding_steps=[int(20 * model.lm.cfg.dataset.segment_duration // 10), 10, 10, 10],
 span_arrangement='stride1'
)
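
To see at a glance which settings changed relative to the music cell earlier in this notebook, the two configurations can be compared as plain dictionaries. This is a small illustrative sketch: the names `MUSIC_PARAMS`, `SOUND_PARAMS` and `changed_params` are hypothetical, and `decoding_steps` is left out because it is computed from the loaded model.

```python
# Values copied from the two set_generation_params cells in this notebook.
MUSIC_PARAMS = dict(
    use_sampling=True,
    top_k=0,
    top_p=0.9,
    temperature=3.0,
    max_cfg_coef=10.0,
    min_cfg_coef=1.0,
    span_arrangement='stride1',
)
SOUND_PARAMS = {**MUSIC_PARAMS, 'top_p': 0.8, 'temperature': 3.5, 'max_cfg_coef': 20.0}

def changed_params(base, other):
    """Return {name: (base_value, other_value)} for every setting that differs."""
    return {k: (base[k], other[k]) for k in base if base[k] != other[k]}

diff = changed_params(MUSIC_PARAMS, SOUND_PARAMS)
print(diff)
```

Only `top_p`, `temperature` and `max_cfg_coef` differ between the two cells. A dictionary like this could be unpacked into the call, for example `model.set_generation_params(**SOUND_PARAMS, decoding_steps=...)`, with `decoding_steps` filled in as in the cell above.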

We can now generate sounds from text prompts.

In [None]:
from audiocraft.utils.notebook import display_audio
 
###### Text-to-audio prompts - examples ######
text = "Seagulls squawking as ocean waves crash while wind blows heavily into a microphone."
# text = "A toilet flushing as music is playing and a man is singing in the distance."

N_VARIATIONS = 3
descriptions = [text for _ in range(N_VARIATIONS)]

print(f"text prompt: {text}\n")
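# With return_tokens=True, generate() returns a tuple of (audio, tokens)
output = model.generate(descriptions=descriptions, progress=True, return_tokens=True)
# output[0] holds the generated audio, one item per description
display_audio(output[0], sample_rate=model.compression_model.sample_rate)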