English
music
music-captioning
File size: 2,063 Bytes
5492525
 
31b2e83
 
 
 
 
 
 
 
 
 
 
5492525
31b2e83
 
a96e517
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
---
license: mit
datasets:
- seungheondoh/LP-MusicCaps-MSD
- seungheondoh/LP-MusicCaps-MC
language:
- en
metrics:
- bleu
- bertscore
tags:
- music
- music-captioning
---

- **Repository:** [LP-MusicCaps repository](https://github.com/seungheondoh/lp-music-caps)
- **Paper:** [ArXiv](https://arxiv.org/abs/2307.16372)

# :sound: LP-MusicCaps: LLM-Based Pseudo Music Captioning

[![Demo Video](https://i.imgur.com/cgi8NsD.jpg)](https://youtu.be/ezwYVaiC-AM)

This is a implementation of [LP-MusicCaps: LLM-Based Pseudo Music Captioning](#). This project aims to generate captions for music. 1) Tag-to-Caption: Using existing tags, We leverage the power of OpenAI's GPT-3.5 Turbo API to generate high-quality and contextually relevant captions based on music tag. 2) Audio-to-Caption: Using music-audio and pseudo caption pairs, we train a cross-model encoder-decoder model for end-to-end music captioning

> [**LP-MusicCaps: LLM-Based Pseudo Music Captioning**](#)   
> SeungHeon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam   
> To appear ISMIR 2023   


## TL;DR


<p align = "center">
<img src = "https://i.imgur.com/2LC0nT1.png">
</p>

- **[1.Tag-to-Caption: LLM Captioning](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/llm_captioning)**: Generate caption from given tag input.
- **[2.Pretrain Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning)**: Generate pseudo caption from given audio.
- **[3.Transfer Music Captioning Model](https://github.com/seungheondoh/lp-music-caps/tree/main/lpmc/music_captioning/transfer.py)**: Generate human level caption from given audio.

## Open Source Material

- [pre-trained models](https://huggingface.co/seungheondoh/lp-music-caps) 
- [music-pseudo caption dataset](https://huggingface.co/datasets/seungheondoh/LP-MusicCaps-MSD)
- [demo](https://huggingface.co/spaces/seungheondoh/LP-Music-Caps-demo) 

are available online for future research. example of dataset in [notebook](https://github.com/seungheondoh/lp-music-caps/blob/main/notebook/Dataset.ipynb)