---
license: mit
language:
  - en
library_name: peft
---

# Master Thesis: High-Fidelity Video Background Music Generation using Transformers

This is the corresponding GitLab repository of my Master Thesis. The goal of this thesis is to generate video background music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to video as an additional input modality. This is accomplished by mapping video information into the T5 text embedding space in which MusicGen usually operates. To this end, a Transformer encoder network, called the Video Encoder, is trained for this task (a sketch of such an encoder follows the list below). Two options are foreseen within the training loop for the Video Encoder:

- freezing the weights of the MusicGen audio decoder
- adjusting the weights of the MusicGen audio decoder with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685)
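
As an illustration, here is a minimal sketch of what such a Video Encoder could look like in PyTorch. The input feature dimension, the use of per-frame features (e.g. CLIP embeddings), the 768-dimensional target space of T5-base, and all layer counts are illustrative assumptions, not the implementation from this repository:

```python
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Maps per-frame video features into the T5 text embedding space
    that MusicGen uses for conditioning (dimensions are assumptions)."""

    def __init__(self, video_feature_dim=512, t5_dim=768, num_layers=4, num_heads=8):
        super().__init__()
        # Project raw video features (e.g. CLIP frame embeddings) to the T5 width.
        self.input_proj = nn.Linear(video_feature_dim, t5_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=t5_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, video_features):
        # video_features: (batch, num_frames, video_feature_dim)
        x = self.input_proj(video_features)
        # The output has the shape of a T5 text embedding sequence, so it can
        # be handed to the MusicGen audio decoder in place of text conditioning.
        return self.encoder(x)

# Quick shape check: 2 videos, 16 frames each, 512-dim frame features.
out = VideoEncoder()(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 768])
```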

## Installation

- create a Python virtual environment with Python 3.11
- check https://pytorch.org/get-started/previous-versions/ to install PyTorch 2.1.0 with CUDA on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt` (a quick sanity check follows below)
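
Optionally, a short sanity check (a suggestion, not part of the original instructions) to confirm the environment is set up as expected:

```python
import torch
import audiocraft

# Verify the expected PyTorch version and that CUDA is visible.
print(torch.__version__)          # should start with "2.1.0"
print(torch.cuda.is_available())  # should print True on a CUDA machine
print(audiocraft.__version__)     # confirms the local fork is importable
```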

## Folder Structure

- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with minor changes to the generation method; see `code/code_adaptations_audiocraft` for details.
- `code` contains the code for model training and for the inference of video background music.
- `datasets` contains the code to create the datasets used for training (in `data_preparation`) and the video examples used for the evaluation (in `example_videos`).
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings.
- `gradio_app` contains the code for the web interface used to generate video background music.

## Training

To train the models, set the training parameters in `training/training_conf.yml` and start training with `python training/training.py`. The model weights will be stored under `training/models_audiocraft` or `training/models_peft`, respectively.
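
For the PEFT training option, the following sketch shows how LoRA adapters can be attached with the `peft` library. The rank, alpha, and target module names are illustrative assumptions rather than the values from `training/training_conf.yml`, and a tiny dummy module stands in for the MusicGen audio decoder so the example is self-contained:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Stand-in for the MusicGen audio decoder: a minimal module with named
# attention-style projections, used only to make this example runnable.
class DummyDecoderBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.q_proj(x) + self.v_proj(x)

# Hypothetical LoRA settings; r, alpha, dropout, and the target modules
# are assumptions, not the thesis configuration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(DummyDecoderBlock(), lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```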

## Inference

- start the user interface by running `python gradio_app/app.py`
- inside the interface, select a video and set the generation parameters
- click on "submit" to start the generation
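
For reference, the sketch below shows the standard audiocraft generation API that the local fork builds on. The checkpoint name and parameters are illustrative; plain text conditioning is shown here, whereas the thesis substitutes the Video Encoder output for the text embeddings:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen checkpoint (illustrative choice of model size).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # seconds of audio to generate

# Text conditioning shown for illustration; the fork in this repository
# conditions on video-derived embeddings instead.
wav = model.generate(['calm ambient background music'])

for idx, one_wav in enumerate(wav):
    # Writes output_0.wav etc., loudness-normalized.
    audio_write(f'output_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```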

## Contact

For any questions, contact me at niklas.schulte@rwth-aachen.de.