---
license: mit
inference: false
---

# Introduction

**Music2Vec** was accepted as a 2-page abstract in the Late-Breaking Demos (LBD) session at ISMIR 2022.
It is a fully unsupervised model trained on 1,000 hours of music audio.
It performs comparably to the state of the art on multiple MIR tasks even under probing settings, while remaining fine-tunable on a single 2080Ti GPU.

# Model Usage

## Huggingface Loading

```python
from transformers import Wav2Vec2Processor, Data2VecAudioModel
import torch
from datasets import load_dataset

# load the demo audio and set up the processor
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate
processor = Wav2Vec2Processor.from_pretrained("facebook/data2vec-audio-base-960h")

# load our model weights
model = Data2VecAudioModel.from_pretrained("m-a-p/music2vec-v1")

# the audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# take a look at the output shape: [batch, frames, hidden size]
last_hidden_states = outputs.last_hidden_state
print(list(last_hidden_states.shape))  # [1, 292, 768]
```

Our model is based on the [data2vec audio model](https://huggingface.co/docs/transformers/model_doc/data2vec#transformers.Data2VecAudioModel).

# Citation

The paper can be found at [ISMIR](https://ismir2022program.ismir.net/lbd_410.html).

```shell
to be done
```
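
# Appendix: Output Frame Count

The loading example prints a hidden-state shape of `[1, 292, 768]`, where the middle dimension is the number of time frames. The sketch below shows one way to estimate that frame count from the raw sample count. It assumes the checkpoint keeps the default convolutional feature-encoder settings of the data2vec audio architecture (kernels `10, 3, 3, 3, 3, 2, 2` and strides `5, 2, 2, 2, 2, 2, 2`). This model card does not confirm that, so compare against `model.config.conv_kernel` and `model.config.conv_stride` on the loaded model. The helper name `num_frames` is hypothetical and only for illustration.

```python
# Assumed defaults of the data2vec audio convolutional feature encoder.
# These values are an assumption: verify them against model.config.conv_kernel
# and model.config.conv_stride on the loaded checkpoint.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples: int) -> int:
    """Return the sequence length left after each unpadded 1-D convolution."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Standard output-length formula for a convolution without padding.
        length = (length - kernel) // stride + 1
    return length


print(num_frames(93_680))  # 292
print(num_frames(16_000))  # 49
```

Under these assumed settings, a 93,680-sample input maps to 292 frames, which is consistent with the shape in the loading example, and one second of 16 kHz audio yields 49 frames (roughly one frame every 20 ms).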