yizhilll committed on
Commit 8836b78
1 Parent(s): 0deaa64

Update README.md

---
license: mit
inference: false
---

# Introduction

**Music2Vec** was accepted at the ISMIR 2022 LBD session.
It is a completely self-supervised model trained on 1,000 hours of music audio.
Our model performs comparably to the state of the art on multiple MIR tasks, even under probing settings, while remaining fine-tunable on a single 2080Ti GPU.

# Model Usage

## Huggingface Loading

```python
from transformers import Wav2Vec2Processor, Data2VecAudioModel
import torch
from datasets import load_dataset

# load demo audio and set processor
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate
processor = Wav2Vec2Processor.from_pretrained("facebook/data2vec-audio-base-960h")

# load our model weights
model = Data2VecAudioModel.from_pretrained("m-a-p/music2vec-v1")

# audio file is decoded on the fly
inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# take a look at the output shape
last_hidden_states = outputs.last_hidden_state
print(list(last_hidden_states.shape))  # [1, 292, 768]
```
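
For probing setups like those mentioned in the introduction, a fixed-size clip-level feature is usually more convenient than the frame-level states above. A minimal sketch, continuing the snippet: it requests all layer outputs and mean-pools each layer over time; this pooling strategy is our assumption here, not necessarily the exact protocol used in the paper.

```python
# sketch (assumption): pool frame-level features into one 768-d vector
# per layer by averaging over the time axis
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# 13 hidden states: the feature-encoder output plus all 12 transformer layers
all_layer_hidden_states = torch.stack(outputs.hidden_states)  # [13, 1, 292, 768]

# average over time, then drop the batch axis
time_reduced_hidden_states = all_layer_hidden_states.mean(dim=-2).squeeze(1)  # [13, 768]
print(list(time_reduced_hidden_states.shape))
```

Any of these per-layer vectors (or a weighted average across layers) can then feed a lightweight downstream classifier while the backbone stays frozen.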

Our model is based on the [data2vec audio model](https://huggingface.co/docs/transformers/model_doc/data2vec#transformers.Data2VecAudioModel).
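
Note that the demo processor above is configured for 16 kHz input, while music files typically ship at 44.1 kHz. A minimal sketch of loading your own audio, assuming torchaudio and a hypothetical local file `song.wav`:

```python
import torchaudio
import torchaudio.transforms as T

# "song.wav" is a hypothetical example file
waveform, sr = torchaudio.load("song.wav")
waveform = waveform.mean(dim=0)  # downmix stereo to mono

# resample to the 16 kHz rate the processor expects
if sr != 16000:
    waveform = T.Resample(orig_freq=sr, new_freq=16000)(waveform)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
```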

# Citation

The paper can be found on [Zenodo](https://zenodo.org/record/7403084#.Y47u83ZBxPZ); a citation entry is TBD.

```shell
to be done
```