---
tags:
- gpt2
- text-generation
- music-modeling
- music-generation
widget:
- text: PIECE_START
- text: PIECE_START PIECE_START TRACK_START INST=34 DENSITY=8
- text: PIECE_START TRACK_START INST=1
---
# GPT-2 for Music
Language models such as GPT-2 can be used for music generation. The idea is to represent pieces of music as text, effectively reducing the task to language generation.
This model is a rather small instance of GPT-2 trained on the [Lakhclean dataset](https://colinraffel.com/projects/lmd/). It generates 4 bars at a time at 16th-note resolution in 4/4 meter.
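To make the time grid concrete: at 16th-note resolution in 4/4, each bar has 16 steps, so one 4-bar segment spans 64 steps. The short sketch below works this out. The 120 BPM tempo and the helper name `step_seconds` are illustrative assumptions and not part of the model.

```python
# Time-grid arithmetic for a 4-bar segment in 4/4 at 16th-note resolution.
BARS = 4
BEATS_PER_BAR = 4    # 4/4 meter
STEPS_PER_BEAT = 4   # four 16th notes per quarter-note beat

steps_per_bar = BEATS_PER_BAR * STEPS_PER_BEAT
total_steps = BARS * steps_per_bar


def step_seconds(bpm: float) -> float:
    """Duration of one 16th-note step at a given tempo (tempo is an assumption)."""
    return 60.0 / bpm / STEPS_PER_BEAT


print(steps_per_bar, total_steps)  # 16 64
print(step_seconds(120.0))         # 0.125
```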
If you want to contribute, if you want to say hello, if you want to know more, find me here:
- https://www.linkedin.com/in/dr-tristan-behrens-734967a2/
- https://www.youtube.com/@drtristanbehrens
- https://twitter.com/DrTBehrens
- https://github.com/AI-Guru
- https://huggingface.co/TristanBehrens
- https://huggingface.co/ai-guru
Run the model on Google Colab: https://colab.research.google.com/drive/1Mz-KJ8vX4Wylr4mzvgP-MclDwQJ06KSq?usp=sharing
## License
You are free to use this model in any open-source context without charge. If you do so, please credit me.
However, if you wish to use the model for commercial purposes, please contact me to discuss licensing terms. Depending on the specific use case, there may be fees associated with commercial use. I am open to negotiating the terms of the license to meet your needs and ensure that the model is used appropriately. Please feel free to reach out to me at your earliest convenience to discuss further.
## Model description
The model is a GPT-2 with 6 decoder blocks, each with 8 attention heads. The context length is 2048 tokens and the embedding dimension is 512.
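These numbers allow a rough size estimate. The sketch below derives the per-head dimension and counts the parameters outside the token embedding table. It assumes the usual GPT-2 block layout with an MLP inner width of 4 times the embedding dimension, which the description above does not confirm. The vocabulary size is not stated here, so the token embeddings are left out.

```python
# Rough size estimate from the numbers in the model description.
N_LAYER = 6    # decoder blocks
N_HEAD = 8     # attention heads per block
N_EMBD = 512   # embedding dimension
N_CTX = 2048   # context length

# Assumption: MLP inner width is 4 * N_EMBD (the common GPT-2 default).
MLP_RATIO = 4

head_dim = N_EMBD // N_HEAD


def block_params(d: int, mlp_ratio: int = MLP_RATIO) -> int:
    """Weights and biases in one GPT-2 style decoder block."""
    attention = (3 * d * d + 3 * d) + (d * d + d)   # QKV projection + output projection
    mlp = (mlp_ratio * d * d + mlp_ratio * d) + (mlp_ratio * d * d + d)
    layer_norms = 2 * (2 * d)                       # two LayerNorms, scale + bias each
    return attention + mlp + layer_norms


position_embeddings = N_CTX * N_EMBD
final_layer_norm = 2 * N_EMBD
params_without_token_embeddings = (
    N_LAYER * block_params(N_EMBD) + position_embeddings + final_layer_norm
)

print(head_dim)                          # 64
print(block_params(N_EMBD))              # 3152384
print(params_without_token_embeddings)   # 19963904
```

Under these assumptions the network has roughly 20M parameters before the token embedding table is added.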
## Model family
This model is part of a huge group of Transformers I have trained. Most of them are not publicly available.
If you are interested in using and/or licensing one of the models, please get in touch.
### Lakhclean
These models were trained on roughly 15K MIDI files from the Lakhclean dataset (the same data as the model you are viewing now).
- lakhclean_mmmbar_4bars_d-2048: 4 bars resolution, bar inpainting, note density conditioning
- lakhclean_mmmbar_8bars_d-2048: 8 bars resolution, bar inpainting, note density conditioning
- lakhclean_mmmtrack_4bars_chords: 4 bars resolution, chord conditioning
- lakhclean_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning (this model)
- lakhclean_mmmtrack_4bars_simple-2048: 4 bars resolution
- lakhclean_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### Lakhfull
These models were trained on roughly 175K MIDI files from the Lakh dataset.
- lakhfull_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning (the big brother of this model)
- lakhfull_mmmtrack_4bars_simple-2048: 4 bars resolution
### Metal
These models were trained on roughly 7K MIDI files from my own collections. They support genre conditioning.
- metal_mmmbar_4bars_d-2048: 4 bars resolution, bar inpainting, note density conditioning
- metal_mmmbar_8bars_d-2048: 8 bars resolution, bar inpainting, note density conditioning
- metal_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- metal_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### MetaMIDI Dataset genres
These models were trained on genre-specific subsets of the MetaMIDI dataset.
- mmd-baroque_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- mmd-baroque_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-classical_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-noncontemporary_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-pop_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-renaissance_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
### MetaMIDI Dataset full
These models were trained on roughly 400K MIDI files from the MetaMIDI dataset.
- mmd-full_mmmtrack_4bars_d-2048: 4 bars resolution, note density conditioning
- mmd-full_mmmtrack_8bars_d-2048: 8 bars resolution, note density conditioning
- mmd-full_mmmtrack_4bars_chords-d-2048: 4 bars resolution, note density conditioning, chord conditioning (most powerful model in the entire group)
## Intended uses & limitations
This model is just a proof of concept. It shows that Hugging Face can be used to compose music.
### How to use
There is a notebook in the repo that you can use to generate symbolic music and then render it.
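For orientation, here is a minimal sketch of what token generation could look like outside the notebook. It is not the notebook's code. The helper names `build_prompt` and `generate_tokens` are hypothetical, the prompt layout is inferred from the widget examples at the top of this card, and `model_id` must be set to this repository's id, which is deliberately left as a parameter. The output is a token string; rendering it to MIDI or audio is what the notebook covers.

```python
def build_prompt(instrument=None, density=None):
    """Build a priming string in the style of the widget examples (hypothetical helper)."""
    tokens = ["PIECE_START"]
    if instrument is not None:
        tokens += ["TRACK_START", f"INST={instrument}"]
        if density is not None:
            tokens.append(f"DENSITY={density}")
    return " ".join(tokens)


def generate_tokens(model_id, prompt, max_length=512, temperature=1.0):
    """Sample a continuation of the prompt (sketch; requires transformers and torch)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=max_length,      # keep this at or below the 2048-token context
        do_sample=True,
        temperature=temperature,
    )
    return tokenizer.decode(output_ids[0])


print(build_prompt())                # PIECE_START
print(build_prompt(34, 8))           # PIECE_START TRACK_START INST=34 DENSITY=8
```

A call such as `generate_tokens(model_id, build_prompt(instrument=1))` would then return the sampled token sequence as text, assuming the repository's tokenizer loads through `AutoTokenizer`.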
### Limitations and bias
Since this model was trained on a very small corpus of music, it overfits heavily.
### Acknowledgements
This model has been created with support from NVIDIA. I am very grateful for the GPU compute they provided!