yizhilll committed on
Commit 8836b78
1 Parent(s): 0deaa64

Update README.md

---
license: mit
inference: false
---

# Introduction

**Music2Vec** was accepted at the ISMIR 2022 LBD session.
It is a completely self-supervised model trained on 1,000 hours of music audio.
Our model performs comparably to the state of the art on multiple MIR tasks, even under probing settings, while remaining fine-tunable on a single 2080Ti GPU.

# Model Usage

## Huggingface Loading

```python
from transformers import Wav2Vec2Processor, Data2VecAudioModel
import torch
from datasets import load_dataset

# load demo audio and set processor
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate
processor = Wav2Vec2Processor.from_pretrained("facebook/data2vec-audio-base-960h")

# load our model weights
model = Data2VecAudioModel.from_pretrained("m-a-p/music2vec-v1")

# audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# take a look at the output shape
last_hidden_states = outputs.last_hidden_state
print(list(last_hidden_states.shape))  # [1, 292, 768]
```
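
For probing setups like those mentioned in the introduction, a fixed-size clip-level feature is usually more convenient than the frame-level states above. A minimal sketch, continuing the snippet: it requests all layer outputs and mean-pools each layer over time; this pooling strategy is our assumption here, not necessarily the exact protocol used in the paper.

```python
# sketch (assumption): pool frame-level features into one 768-d vector
# per layer by averaging over the time axis
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# 13 hidden states: the feature-encoder output plus all 12 transformer layers
all_layer_hidden_states = torch.stack(outputs.hidden_states)  # [13, 1, 292, 768]

# average over time, then drop the batch axis
time_reduced_hidden_states = all_layer_hidden_states.mean(dim=-2).squeeze(1)  # [13, 768]
print(list(time_reduced_hidden_states.shape))
```

Any of these per-layer vectors (or a weighted average across layers) can then feed a lightweight downstream classifier while the backbone stays frozen.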

Our model is based on the [data2vec audio model](https://huggingface.co/docs/transformers/model_doc/data2vec#transformers.Data2VecAudioModel).
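
Note that the demo processor above is configured for 16 kHz input, while music files typically ship at 44.1 kHz. A minimal sketch of loading your own audio, assuming torchaudio and a hypothetical local file `song.wav`:

```python
import torchaudio
import torchaudio.transforms as T

# "song.wav" is a hypothetical example file
waveform, sr = torchaudio.load("song.wav")
waveform = waveform.mean(dim=0)  # downmix stereo to mono

# resample to the 16 kHz rate the processor expects
if sr != 16000:
    waveform = T.Resample(orig_freq=sr, new_freq=16000)(waveform)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
```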

# Citation

The paper can be found on [Zenodo](https://zenodo.org/record/7403084#.Y47u83ZBxPZ); a citation entry is TBD.

```shell
to be done
```