|
--- |
|
tags: |
|
- music-generation |
|
- transformer |
|
- pytorch |
|
- audio |
|
- music |
|
- piano |
|
license: mit |
|
--- |
|
# Compose & Embellish: Piano Performance Generation Pipeline |
|
Trained model weights and training datasets for the paper: |
|
* Shih-Lun Wu and Yi-Hsuan Yang, "[Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach](https://arxiv.org/abs/2209.08212)," _Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)_, 2023.
|
|
|
**Note:** The materials here should be used in conjunction with our [model implementation GitHub repo](https://github.com/slSeanWU/Compose_and_Embellish).
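
If you just want to sanity-check a downloaded checkpoint before wiring it into the repo's scripts, here is a minimal sketch that loads it as a raw PyTorch state dict and tallies its parameters. It assumes the `.bin` files are plain `torch.save`d state dicts (if a file instead wraps the weights in a larger training-checkpoint dict, look for a nested key and adjust accordingly):

```python
import torch

# Minimal sanity check: load the raw state dict and count parameters.
# Assumes the .bin file is a plain torch.save'd state dict.
ckpt_path = "embellish_model_gpt2_pop1k7_loss0.398.bin"
state_dict = torch.load(ckpt_path, map_location="cpu")

total = 0
for name, value in state_dict.items():
    if torch.is_tensor(value):          # skip any non-tensor metadata
        print(f"{name}: {tuple(value.shape)}")
        total += value.numel()

# Should land near the 38.2M figure quoted below for this checkpoint.
print(f"~{total / 1e6:.1f}M parameters")
```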
|
|
|
## Model characteristics |
|
### Stage 1: "Compose" model |
|
Generates a **melody and chord progression** from scratch.
|
|
|
- Model backbone: 12-layer Transformer w/ relative positional encoding |
|
- Num trainable params: 41.3M |
|
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications |
|
- Pretraining dataset: subset of [Lakh MIDI full](https://colinraffel.com/projects/lmd/) (**LMD-full**), 14,934 songs

  - melody extraction (and data filtering) done by **matching lyrics to tracks**: [create_dataset.py](https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py)

  - structural segmentation done with **A\* search**: [hierarchical-structure-analysis](https://github.com/Dsqvival/hierarchical-structure-analysis)
|
- Finetuning dataset: subset of [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1,591 songs

  - melody extraction done with the **skyline algorithm** (sketched after this list): [analyzer.py](https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py)

  - structural segmentation done in the same way as for the pretraining dataset
|
- Training sequence length: 2400 |
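
The skyline heuristic linked in the list above is simple enough to sketch in a few lines: at each onset, keep only the highest-pitched note, then truncate overlaps so the extracted line stays monophonic. The toy version below works on plain `(onset, offset, pitch)` tuples; the linked `analyzer.py` operates on real MIDI data and applies additional filtering:

```python
from typing import List, Tuple

Note = Tuple[float, float, int]  # (onset, offset, MIDI pitch)

def skyline(notes: List[Note]) -> List[Note]:
    """Toy skyline melody extraction: at every onset keep the
    highest-pitched note, then truncate to stay monophonic."""
    melody: List[Note] = []
    for onset in sorted({n[0] for n in notes}):
        starting = [n for n in notes if n[0] == onset]
        melody.append(max(starting, key=lambda n: n[2]))
    truncated: List[Note] = []
    for i, (onset, offset, pitch) in enumerate(melody):
        if i + 1 < len(melody):            # cut at the next melody onset
            offset = min(offset, melody[i + 1][0])
        truncated.append((onset, offset, pitch))
    return truncated
```

For example, `skyline([(0.0, 2.0, 60), (0.0, 1.0, 72), (1.0, 2.0, 76)])` keeps the two upper notes and returns `[(0.0, 1.0, 72), (1.0, 2.0, 76)]`.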
|
### Stage 2: "Embellish" model |
|
Generates **accompaniment, timing, and dynamics** conditioned on Stage 1 outputs (a schematic sketch of the two-stage flow follows the list below).
|
- `embellish_model_gpt2_pop1k7_loss0.398.bin` |
|
- Model backbone: 12-layer **GPT-2 Transformer** ([implementation](https://huggingface.co/docs/transformers/en/model_doc/gpt2)) |
|
- Num trainable params: 38.2M |
|
- `embellish_model_pop1k7_loss0.399.bin` (requires the `fast-transformers` package, which is outdated as of Jul. 2024)
|
- Model backbone: 12-layer **Performer** ([paper](https://arxiv.org/abs/2009.14794), [implementation](https://github.com/idiap/fast-transformers)) |
|
- Num trainable params: 38.2M |
|
- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications (an illustrative token fragment is shown after this list)
|
- Training dataset: [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1,747 songs
|
- Training sequence length: 3072 |
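
For intuition about the REMI-style vocabularies used by both stages, here is a hand-written event fragment in the spirit of the original REMI paper (bar, sub-beat position, chord, tempo, and note events). The event names and quantization grids are schematic, not this repo's exact modified vocabulary:

```python
# Schematic REMI-style token sequence for one bar (illustrative only).
events = [
    "Bar_None",         # bar line
    "Position_1/16",    # metrical position on a 16th-note grid
    "Chord_C_maj",      # chord annotation
    "Tempo_110",        # tempo event
    "Velocity_20",      # quantized dynamics bin
    "Note-On_60",       # MIDI pitch 60 (middle C)
    "Duration_8",       # note length in grid units
    "Position_9/16",
    "Velocity_18",
    "Note-On_64",
    "Duration_4",
]

# Models consume integer ids, so a vocabulary maps each event to an id.
vocab = {tok: i for i, tok in enumerate(sorted(set(events)))}
token_ids = [vocab[tok] for tok in events]
print(token_ids)
```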
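
The two-stage flow itself reduces to ordinary autoregressive sampling in which Stage 1's lead-sheet tokens seed the Stage 2 prompt. The sketch below is schematic: the model arguments stand in for the companion repo's actual classes, and the repo's real conditioning scheme arranges lead-sheet and performance tokens as described in the paper rather than by this naive prompt-then-continue loop:

```python
import torch

@torch.no_grad()
def sample(model, prompt_ids, max_len, temperature=1.1):
    """Generic autoregressive sampling loop (schematic).
    Assumes model(ids) returns logits of shape [batch, seq, vocab]."""
    ids = list(prompt_ids)
    while len(ids) < max_len:
        logits = model(torch.tensor([ids]))[0, -1]
        probs = torch.softmax(logits / temperature, dim=-1)
        ids.append(torch.multinomial(probs, 1).item())
    return ids

def two_stage_generate(compose_model, embellish_model, bos_id=0):
    # Stage 1 ("Compose"): a lead sheet from scratch, up to the
    # Stage 1 training length.
    lead_sheet = sample(compose_model, [bos_id], max_len=2400)
    # Stage 2 ("Embellish"): the lead sheet seeds the prompt; sampling
    # continues with performance events up to the Stage 2 length.
    return sample(embellish_model, lead_sheet, max_len=3072)
```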
|
|
|
## BibTeX
|
If you find the materials useful, please consider citing our work: |
|
```bibtex
|
@inproceedings{wu2023compembellish, |
|
title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach}, |
|
author={Wu, Shih-Lun and Yang, Yi-Hsuan}, |
|
booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, |
|
year={2023}, |
|
url={https://arxiv.org/pdf/2209.08212.pdf} |
|
} |
|
``` |