STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
Code: https://github.com/NJU-PCALab/STAR
Paper: https://arxiv.org/abs/2501.02976
Project Page: https://nju-pcalab.github.io/projects/STAR
Demo Video: https://youtu.be/hx0zrql-SrU
Dependencies and Installation
```bash
## git clone this repository
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR

## create an environment
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6 -y
```
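Before running anything heavy, a quick sanity check (illustrative, not a script shipped with STAR) confirms that PyTorch sees a GPU and that ffmpeg is on the PATH:

```bash
# Environment sanity check (illustrative; not part of the STAR repo).
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
ffmpeg -version | head -n 1
```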
Inference
Model Weight
| Base Model | Type | URL |
|---|---|---|
| I2VGen-XL | Light Degradation | :link: |
| I2VGen-XL | Heavy Degradation | :link: |
| CogVideoX-5B | Heavy Degradation | :link: |
1. I2VGen-XL-based
Step 1: Download the pretrained STAR model from HuggingFace.
We provide two versions of the I2VGen-XL-based model: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (e.g., low-resolution videos downloaded from video websites). Put the weights into `pretrained_weight/`.
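If you prefer the command line, the weights can be fetched with the HuggingFace CLI. This is a sketch assuming the checkpoints sit as top-level files in the `SherryX/STAR` HuggingFace repo with the names above; verify the layout on the model page if a download fails:

```bash
# Sketch: download one checkpoint with the HuggingFace CLI.
# Assumes light_deg.pt sits at the root of the SherryX/STAR repo.
pip install -U "huggingface_hub[cli]"
huggingface-cli download SherryX/STAR light_deg.pt --local-dir pretrained_weight
```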
Step 2: Prepare testing data
Put the testing videos in `input/video/`.
As for the prompt, there are three options: 1. No prompt. 2. Automatically generate a prompt using Pllava. 3. Manually write the prompt. Put the txt file in `input/text/`; a minimal example layout follows below.
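For a concrete starting point, the layout might look like the following; the file names are illustrative, and the exact prompt-file convention is an assumption, so check how `txt_file_path` is consumed in `inference_sr.sh`:

```bash
# Minimal example layout (file names are illustrative).
mkdir -p input/video input/text
cp /path/to/my_clip.mp4 input/video/
# Option 3: a hand-written prompt describing the video content.
echo "A red car driving down a rain-soaked city street at night." > input/text/prompt.txt
```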
Step 3: Change the paths
Change the paths in `video_super_resolution/scripts/inference_sr.sh` to your local paths, including `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`.
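For illustration, the edited variables might look like this; the variable names come from the script, but the values are placeholders for your local setup:

```bash
# Example values inside video_super_resolution/scripts/inference_sr.sh
video_folder_path='./input/video'
txt_file_path='./input/text/prompt.txt'
model_path='./pretrained_weight/light_deg.pt'
save_dir='./results'
```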
Step 4: Run the inference command

```bash
bash video_super_resolution/scripts/inference_sr.sh
```

If you encounter an OOM problem, you can set a smaller `frame_length` in `inference_sr.sh`.
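For example, assuming `frame_length` is exposed as a plain shell variable in the script (the value below is illustrative, not the shipped default):

```bash
# Fewer frames per chunk lowers peak GPU memory; halve again if OOM persists.
frame_length=16
```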
2. CogVideoX-based
Refer to these instructions for inference with the CogVideoX-5B-based model.
Please note that the CogVideoX-5B-based model supports only 720x480 input.
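If your source video is at a different resolution, you can rescale it with ffmpeg before inference; this is a generic ffmpeg invocation, not a script shipped with STAR, and note that forcing 720x480 may change the aspect ratio:

```bash
# Rescale an input clip to the 720x480 that the CogVideoX-5B-based model expects.
ffmpeg -i input.mp4 -vf "scale=720:480" -c:a copy input_720x480.mp4
```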
Model weights are hosted at SherryX/STAR on HuggingFace; the CogVideoX-based checkpoint builds on the THUDM/CogVideoX-5b base model.