Update README.md
README.md
CHANGED
@@ -2,3 +2,49 @@
license: mit
inference: false
---
# Introduction

**Music2Vec** was accepted at the ISMIR 2022 Late-Breaking/Demo (LBD) session.
It is a fully unsupervised model trained on 1,000 hours of music audio.
It performs comparably to the state of the art on multiple music information retrieval (MIR) tasks, even under probing settings, while remaining fine-tunable on a single 2080Ti GPU.

# Model Usage

## Hugging Face Loading
```python
from transformers import Wav2Vec2Processor, Data2VecAudioModel
import torch
from datasets import load_dataset

# load demo audio and set processor
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate
processor = Wav2Vec2Processor.from_pretrained("facebook/data2vec-audio-base-960h")

# loading our model weights
model = Data2VecAudioModel.from_pretrained("m-a-p/music2vec-v1")

# audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# take a look at the output shape
last_hidden_states = outputs.last_hidden_state
print(list(last_hidden_states.shape)) # [1, 292, 768]
```
Our model is based on the [data2vec audio model](https://huggingface.co/docs/transformers/model_doc/data2vec#transformers.Data2VecAudioModel).
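The `[1, 292, 768]` shape printed in the loading example can be roughly reproduced by hand. The sketch below assumes the checkpoint keeps the default convolutional feature-encoder settings of `Data2VecAudioConfig` in `transformers` (kernel sizes and strides as listed); the released checkpoint's own config is authoritative and may differ. The helper name `num_output_frames` and the clip length of 93,680 samples are illustrative choices, not part of the original README.

```python
# Assumed defaults from transformers' Data2VecAudioConfig; the released
# checkpoint's own config is authoritative and may differ.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of frames left after the stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Standard output-length formula for a convolution without padding.
        length = (length - kernel) // stride + 1
    return length


# Illustrative clip length (hypothetical): 93,680 samples, about 5.9 s at 16 kHz.
print(num_output_frames(93_680))  # 292
print(num_output_frames(16_000))  # 49
```

Under these assumptions, one second of 16 kHz audio yields 49 frames, which is roughly one 768-dimensional vector every 20 ms (the combined stride is 320 samples).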
# Citation

The paper is available on [Zenodo](https://zenodo.org/record/7403084#.Y47u83ZBxPZ); the citation is yet to be determined.
```shell
to be done
```