---
license: cc-by-4.0
---

# VidMuse

## VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

[TL;DR]: VidMuse generates high-fidelity music aligned with video content using Long-Short-Term modeling. The paper has been accepted to CVPR 2025.

### Links

[![arXiv](https://img.shields.io/badge/arXiv-2406.04321-brightgreen.svg?style=flat-square)](https://arxiv.org/pdf/2406.04321) [![GitHub.io](https://img.shields.io/badge/GitHub.io-Project-blue?logo=Github&style=flat-square)](https://vidmuse.github.io/)

## Clone the repository

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Zeyue7/VidMuse
cd VidMuse
```

## Usage

1. Install the [`VidMuse` library](https://github.com/ZeyueT/VidMuse):

   ```bash
   conda create -n VidMuse python=3.9
   conda activate VidMuse
   pip install git+https://github.com/ZeyueT/VidMuse.git
   ```

2. Install ffmpeg:

   ```bash
   sudo apt-get install ffmpeg
   # Or, if you are using Anaconda or Miniconda:
   conda install "ffmpeg<5" -c conda-forge
   ```

3. Run the following Python code:

   ```py
   from video_processor import VideoProcessor, merge_video_audio
   from audiocraft.models import VidMuse
   import scipy.io.wavfile

   # Path to the input video
   video_path = 'sample.mp4'

   # Process the video into local and global tensors plus its duration
   processor = VideoProcessor()
   local_video_tensor, global_video_tensor, duration = processor.process(video_path)

   progress = True
   USE_DIFFUSION = False

   # Load the pre-trained VidMuse model
   MODEL = VidMuse.get_pretrained('Zeyue7/VidMuse')

   # Match the length of the generated music to the video duration
   MODEL.set_generation_params(duration=duration)

   try:
       # Generate music conditioned on the video tensors
       outputs = MODEL.generate([local_video_tensor, global_video_tensor],
                                progress=progress, return_tokens=USE_DIFFUSION)
   except RuntimeError as e:
       print(e)
       raise  # re-raise so `outputs` is never used uninitialized

   # Detach from the computation graph and move to a CPU float tensor
   outputs = outputs.detach().cpu().float()

   # Write the generated audio to a WAV file
   sampling_rate = 32000
   output_wav_path = "vidmuse_out.wav"
   scipy.io.wavfile.write(output_wav_path, rate=sampling_rate, data=outputs[0, 0].numpy())

   # Merge the original video with the generated music
   output_video_path = "output_video.mp4"
   merge_video_audio(video_path, output_wav_path, output_video_path)
   ```

## Citation

If you find our work useful, please consider citing:

```
@article{tian2024vidmuse,
  title={Vidmuse: A simple video-to-music generation framework with long-short-term modeling},
  author={Tian, Zeyue and Liu, Zhaoyang and Yuan, Ruibin and Pan, Jiahao and Liu, Qifeng and Tan, Xu and Chen, Qifeng and Xue, Wei and Guo, Yike},
  journal={arXiv preprint arXiv:2406.04321},
  year={2024}
}
```
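
## Scoring multiple videos

The model only needs to be loaded once per session, so several clips can be scored in a loop using the same API as the example above. The sketch below is a minimal, illustrative example: the `video_paths` list and the output naming scheme are assumptions, not part of the released scripts.

```py
from pathlib import Path

from video_processor import VideoProcessor, merge_video_audio
from audiocraft.models import VidMuse
import scipy.io.wavfile

video_paths = ["clip_a.mp4", "clip_b.mp4"]  # illustrative inputs; replace with your own clips

processor = VideoProcessor()
model = VidMuse.get_pretrained('Zeyue7/VidMuse')  # load once, reuse for every clip

for video_path in video_paths:
    # Per-clip preprocessing and length-matched generation, as in the example above
    local_t, global_t, duration = processor.process(video_path)
    model.set_generation_params(duration=duration)
    outputs = model.generate([local_t, global_t], progress=True, return_tokens=False)

    # Save the generated audio and mux it back into the clip
    wav = outputs.detach().cpu().float()[0, 0].numpy()
    stem = Path(video_path).stem
    wav_path = f"{stem}_vidmuse.wav"
    scipy.io.wavfile.write(wav_path, rate=32000, data=wav)
    merge_video_audio(video_path, wav_path, f"{stem}_with_music.mp4")
```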