---
license: mit
language:
- en
library_name: peft
---

# Master Thesis: High-Fidelity Video Background Music Generation using Transformers

This is the corresponding GitLab repository of my Master Thesis. The goal of this thesis is to generate video background music by adapting MusicGen (https://arxiv.org/pdf/2306.05284.pdf) to accept video as an additional input modality. This is accomplished by mapping video information into the T5 text embedding space on which MusicGen usually operates. To this end, a Transformer Encoder network, called the Video Encoder, is trained to perform this mapping. Two options are foreseen within the training loop for the Video Encoder:

- freezing the weights of the MusicGen Audio Decoder
- adjusting the weights of the MusicGen Audio Decoder with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (https://arxiv.org/abs/2106.09685)

# Installation

- create a Python virtual environment with `Python 3.11`
- check https://pytorch.org/get-started/previous-versions/ to install `PyTorch 2.1.0` with `CUDA` on your machine
- install the local fork of audiocraft: `cd audiocraft; pip install -e .`
- install the other requirements: `pip install -r requirements.txt`

# Folder Structure

- `audiocraft` contains a local fork of the audiocraft library (https://github.com/facebookresearch/audiocraft) with minor changes to the generation method; further information can be found in `code/code_adaptations_audiocraft`.
- `code` contains the code for model `training` and `inference` of video background music
- `datasets` contains the code to create the datasets used for training within `data_preparation` and video examples used for the evaluation in `example_videos`
- `evaluation` contains the code used to evaluate the datasets and the created video embeddings
- `gradio_app` contains the code for the interface to generate video background music

# Training

To train the models, set the training parameters in `training/training_conf.yml` and start training with `python training/training.py`. The model weights will be stored under `training/models_audiocraft` or `training/models_peft`, respectively.

# Inference

- start the user interface by running `python gradio_app/app.py`
- inside the interface, select a video and the generation parameters
- click on "submit" to start the generation

# Contact

For any questions, contact me at [niklas.schulte@rwth-aachen.de](mailto:niklas.schulte@rwth-aachen.de)
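
# Appendix: Illustrative LoRA Setup with PEFT

As a rough sketch of the PEFT option described above, the snippet below shows how LoRA adapters could be attached to the MusicGen language model using the `peft` library. The checkpoint name, rank, scaling factor, and target module names are illustrative assumptions and do not necessarily match the configuration used in `training/training_conf.yml`.

```python
# Minimal sketch, not the exact training setup of this repository.
from peft import LoraConfig, get_peft_model
from audiocraft.models import MusicGen

# Load a pretrained MusicGen model (checkpoint name is an assumption).
model = MusicGen.get_pretrained("facebook/musicgen-small")

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (hypothetical value)
    lora_alpha=32,                        # scaling factor (hypothetical value)
    lora_dropout=0.05,
    target_modules=["out_proj", "linear1", "linear2"],  # assumed projection/FFN layer names
)

# Wrap the autoregressive language model inside MusicGen with LoRA adapters;
# only the adapter weights remain trainable, the base decoder stays frozen.
model.lm = get_peft_model(model.lm, lora_config)
model.lm.print_trainable_parameters()
```

In the actual training loop, the choice between this PEFT path and fully freezing the Audio Decoder is controlled by the parameters in `training/training_conf.yml`.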