Natooz committed on
Commit 134b9ca · 1 Parent(s): c0a3a15

Upload 8 files

README.md ADDED
@@ -0,0 +1,118 @@
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model card

This is a generative model from the paper "*Byte Pair Encoding for Symbolic Music*" (EMNLP 2023). It was trained on the [Maestro dataset](https://magenta.tensorflow.org/datasets/maestro) to generate classical piano music, using the REMI tokenizer with a vocabulary learned with Byte Pair Encoding (BPE).

## Model Details

### Model Description

The model uses a vocabulary of 20k tokens, learned with [Byte Pair Encoding (BPE)](https://arxiv.org/abs/2301.11975) using [MidiTok](https://github.com/Natooz/MidiTok).

- **Developed and shared by:** [Nathan Fradet](https://twitter.com/NathanFradet)
- **Affiliations:** [Sorbonne University (LIP6 lab)](https://www.sorbonne-universite.fr/en) and [Aubay](https://aubay.com/en/)
- **Model type:** causal autoregressive Transformer
- **Backbone model:** [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)
- **Music genres:** Classical piano 🎹
- **License:** Apache 2.0

### Model Sources

- **Repository:** https://github.com/Natooz/BPE-Symbolic-Music
- **Paper:** https://arxiv.org/abs/2301.11975

## Uses

The model is designed for autoregressive music generation: given a music prompt, it generates a continuation.
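
The sketch below illustrates what "generating a continuation" means in practice. It is a minimal example of greedy autoregressive decoding: at each step the model scores every candidate next token, the best one is appended, and the extended sequence is fed back in. The score table `TOY_SCORES` and the helper names are hypothetical. The table stands in for the Transformer and only looks at the last token, whereas the real model conditions on the whole prefix. Its token IDs are arbitrary, not entries of the model's vocabulary.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": next-token scores conditioned only on the last token.
# The real model is a GPT-2 Transformer that conditions on the whole prefix.
TOY_SCORES: Dict[int, Dict[int, float]] = {
    0: {1: 0.9, 2: 0.1},
    1: {2: 0.7, 0: 0.3},
    2: {0: 0.6, 1: 0.4},
}


def toy_next_token_scores(prefix: List[int]) -> Dict[int, float]:
    """Return a score for every candidate next token given the prefix."""
    return TOY_SCORES[prefix[-1]]


def greedy_continue(
    prompt: List[int],
    next_token_scores: Callable[[List[int]], Dict[int, float]],
    max_new_tokens: int,
) -> List[int]:
    """Append the highest-scoring token one step at a time, feeding each output back in."""
    sequence = list(prompt)  # copy so the caller's prompt is not modified
    for _ in range(max_new_tokens):
        scores = next_token_scores(sequence)
        sequence.append(max(scores, key=scores.get))
    return sequence


print(greedy_continue([0], toy_next_token_scores, max_new_tokens=4))
# [0, 1, 2, 0, 1]
```

In the getting-started snippet, `model.generate` runs this kind of loop with the actual Transformer. Whether it picks tokens greedily or by sampling depends on the generation settings used.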

## How to Get Started with the Model

Use the code below to get started with the model.
You will need the `miditok`, `miditoolkit`, `transformers` and `torch` packages, which can be installed with pip.

You will also need to manually download the `tokenizer.conf` file from the [repo files](https://huggingface.co/Natooz/Maestro-REMI-bpe20k/tree/main).

```python
import torch
from transformers import AutoModelForCausalLM
from miditok import REMI
from miditoolkit import MidiFile

torch.set_default_device("cuda")  # requires PyTorch >= 2.0 and a CUDA GPU
model = AutoModelForCausalLM.from_pretrained("Natooz/Maestro-REMI-bpe20k", trust_remote_code=True, torch_dtype="auto")
tokenizer = REMI(params="tokenizer.conf")  # tokenizer.conf downloaded from the repo files
input_midi = MidiFile("path/to/file.mid")
input_tokens = tokenizer(input_midi)

# generate() expects a batch of token IDs as a tensor of shape (batch_size, seq_len)
input_ids = torch.tensor([input_tokens.ids])
# max_new_tokens counts only generated tokens; max_length would also count the prompt
generated_token_ids = model.generate(input_ids, max_new_tokens=200)
# Convert the single generated sequence (prompt + continuation) back to MIDI
generated_midi = tokenizer(generated_token_ids[0].tolist())
generated_midi.dump("path/to/continued.mid")
```

## Training Details

### Training Data

The model was trained on the [Maestro](https://magenta.tensorflow.org/datasets/maestro) dataset, which contains about 200 hours of classical piano music. The tokenizer was trained with Byte Pair Encoding (BPE) to build a vocabulary of 20k tokens.
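
Learning a BPE vocabulary means repeatedly finding the most frequent pair of adjacent tokens in the corpus and replacing it with a new token, until the vocabulary reaches its target size. The sketch below shows that learning loop on integer token IDs. It is a simplified illustration, not MidiTok's implementation: ties are broken by first occurrence and there are no token-type constraints. The toy corpus and the function names are made up.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def most_frequent_pair(corpus: List[List[int]]) -> Optional[Pair]:
    """Count adjacent token pairs over all sequences; return the most common, or None."""
    counts: Counter = Counter()
    for seq in corpus:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right, by `new_id`."""
    merged: List[int] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(
    corpus: List[List[int]], base_vocab_size: int, target_vocab_size: int
) -> Tuple[Dict[Pair, int], List[List[int]]]:
    """Learn merges until the vocabulary reaches `target_vocab_size` (20k for this model)."""
    merges: Dict[Pair, int] = {}
    next_id = base_vocab_size  # new tokens get IDs after the base vocabulary
    while next_id < target_vocab_size:
        pair = most_frequent_pair(corpus)
        if pair is None:  # every sequence is a single token: nothing left to merge
            break
        merges[pair] = next_id
        corpus = [merge_pair(seq, pair, next_id) for seq in corpus]
        next_id += 1
    return merges, corpus


# Toy corpus of base token IDs (made up for illustration)
print(learn_bpe([[1, 2, 3, 1, 2], [1, 2, 4]], base_vocab_size=5, target_vocab_size=6))
# ({(1, 2): 5}, [[5, 3, 5], [5, 4]])
```

Each merge shortens the sequences it touches. This is why a BPE vocabulary lets the model cover more music within the same number of tokens.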

### Training Procedure

- **Training regime:** fp16 mixed precision on V100 PCIE 32GB GPUs
- **Compute Region:** France

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 64
- eval_batch_size: 96
- seed: 444
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.3
- training_steps: 100000
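
The scheduler settings above determine the learning rate at every step. The sketch below reproduces, as we understand it, the formula of the `transformers` cosine-with-hard-restarts schedule with linear warmup. It rests on two assumptions that this card does not state. First, the schedule uses a single cycle (`num_cycles=1`, the library default), which reduces it to a plain cosine decay. Second, the number of warmup steps is `warmup_ratio × training_steps` rounded up, which gives 30,000 steps here. The function name `lr_at_step` is hypothetical.

```python
import math


def lr_at_step(
    step: int,
    base_lr: float = 1e-4,
    total_steps: int = 100_000,
    warmup_ratio: float = 0.3,
    num_cycles: int = 1,  # assumption: library default, not stated in the card
) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay with hard restarts."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 30_000 with the card's values
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # With num_cycles=1 the modulo is a no-op, so this is a single cosine decay to 0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))


for s in (0, 15_000, 30_000, 65_000, 100_000):
    print(s, lr_at_step(s))
```

With a warmup ratio of 0.3, the learning rate reaches 1e-4 only after 30% of training. It then decays to zero over the remaining 70,000 steps.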

### Environmental impact

We cannot reliably estimate the amount of CO2eq emitted, as we lack data on the exact power source used during training. However, the cluster we used is mostly powered by nuclear energy, a low-carbon energy source, which reduces the direct environmental impact of training.

## Citation

**BibTeX:**

```bibtex
@inproceedings{bpe-symbolic-music,
    title = "Byte Pair Encoding for Symbolic Music",
    author = "Fradet, Nathan and
      Gutowski, Nicolas and
      Chhel, Fabien and
      Briot, Jean-Pierre",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2301.11975",
}
```

all_results.json ADDED
@@ -0,0 +1,21 @@
{
    "epoch": 168.92,
    "test_loss": 10.012152671813965,
    "test_runtime": 1628.4473,
    "test_samples_per_second": 5.199,
    "test_steps_per_second": 0.028,
    "test_tse_ndup": 0.17967297977995939,
    "test_tse_nnof": 0.0,
    "test_tse_nnon": 0.0,
    "test_tse_time": 0.20978844964272034,
    "test_tse_type": 0.04320047391956446,
    "train_loss": 3.096821060371399,
    "train_runtime": 63835.4946,
    "train_samples_per_second": 200.515,
    "train_steps_per_second": 1.567,
    "tse_ndup": 0.003601619428289543,
    "tse_nnof": 0.0,
    "tse_nnon": 0.0,
    "tse_time": 0.052002719382478045,
    "tse_type": 0.0006408855316656025
}
events.out.tfevents.1686249408.star101.1938697.12 ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a8684fc756b1d913d2f16bc1cf1a830e02c58734754a310f2decb3285cbf6a58
size 857376
test_results.json ADDED
@@ -0,0 +1,7 @@
{
    "tse_ndup": 0.003601619428289543,
    "tse_nnof": 0.0,
    "tse_nnon": 0.0,
    "tse_time": 0.052002719382478045,
    "tse_type": 0.0006408855316656025
}
tokenizer.conf ADDED
The diff for this file is too large to render.
train_results.json ADDED
@@ -0,0 +1,7 @@
{
    "epoch": 168.92,
    "train_loss": 3.096821060371399,
    "train_runtime": 63835.4946,
    "train_samples_per_second": 200.515,
    "train_steps_per_second": 1.567
}
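
A little arithmetic makes these training metrics easier to read and shows that they agree with each other. The sketch below assumes that these files were written by the Hugging Face `Trainer`, as the file names suggest. It further assumes that `train_runtime` is in seconds and that `train_samples_per_second` counts samples across all devices and accumulation steps. Under these assumptions, 100,000 steps took about 17.7 hours and each optimizer step processed about 128 samples. That is twice the `train_batch_size` of 64 listed in the hyperparameters. The card does not say whether the factor of two comes from several GPUs or from gradient accumulation. The `summarize` helper is hypothetical.

```python
# Metrics copied from train_results.json above
TRAIN_RESULTS = {
    "epoch": 168.92,
    "train_loss": 3.096821060371399,
    "train_runtime": 63835.4946,
    "train_samples_per_second": 200.515,
    "train_steps_per_second": 1.567,
}
TRAINING_STEPS = 100_000  # training_steps from the README hyperparameters


def summarize(results: dict, training_steps: int) -> dict:
    """Derive readable figures, assuming runtime in seconds and samples counted over all devices."""
    runtime_s = results["train_runtime"]
    total_samples = results["train_samples_per_second"] * runtime_s
    return {
        "runtime_hours": runtime_s / 3600,
        "steps_per_second": training_steps / runtime_s,
        "total_samples": total_samples,
        "samples_per_step": total_samples / training_steps,
    }


for name, value in summarize(TRAIN_RESULTS, TRAINING_STEPS).items():
    print(f"{name}: {value:,.2f}")
```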
trainer_state.json ADDED
The diff for this file is too large to render.
training_args.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc88801538ed195ddf69ecfdff0b2e675920ad563c5303dbb220360620cf9515
size 5307