Commit dcd1471 · Parent(s): 1b5d3d5 · update
README.md
CHANGED
@@ -90,22 +90,41 @@ Then extract them into ./annotator/ckpts

</details>

-##
+## ⚡️ Prepare all the data

+### Provided data
+We have provided all the video data and layout masks used in VideoGrain at the following link. Please download and unzip the data, and put it in the `./data` root directory.
```
gdown https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link
tar -zxvf videograin_data.tar.gz
```
+### Customize your own data
+**prepare video frames**
+If the input video is an mp4 file, use the following command to split it into frames:
+```bash
+python image_util/sample_video2frames.py --video_path 'your video path' --output_dir './data/video_name'
+```
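Given the zero-padded naming scheme in `image_util/sample_video2frames.py` (added in this commit), the output directory is a flat folder of JPEG frames, for example:

```
data/video_name/
├── 00000.jpg
├── 00001.jpg
└── ...
```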
+**prepare layout masks**
+We segment videos with our ReLER lab's [SAM-Track](https://github.com/z-x-yang/Segment-and-Track-Anything). We suggest using `app.py` in SAM-Track's `gradio` mode to manually select the regions of the video you want to edit. We also provide a script, `image_util/process_webui_mask.py`, to convert masks from the SAM-Track output path to the VideoGrain path; see the example below.
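A minimal sketch of that conversion step, assuming SAM-Track wrote PNG masks for frames 0-15 to `./sam_track_result` and that the masks should land in `./data/video_name/layout_masks` (both paths are hypothetical); the flags are those defined in `image_util/process_webui_mask.py` added in this commit:

```bash
# Hypothetical paths; the flags match image_util/process_webui_mask.py below.
python image_util/process_webui_mask.py \
    --src_folder ./sam_track_result \
    --dest_folder ./data/video_name/layout_masks \
    --index_offset 0 --start_frame 0 --end_frame 15
```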
+

## 🔥 VideoGrain Editing

### Inference
-
+**prepare config**
+VideoGrain is a training-free framework. To run VideoGrain, prepare your config by following these steps (a config sketch follows the list):
+1. Replace the pretrained model path and ControlNet path in your config. You can set the `control_type` to `dwpose`, `depth_zoe`, or `depth` (MiDaS).
+2. Prepare your video frames and layout masks (the edit regions) with SAM-Track or SAM2, and reference them in the dataset config.
+3. Change the `prompt`, and extract each `local prompt` from the editing prompt. The local prompt order must match the layout mask order.
+4. You can change the flatten resolution: 1 -> 64, 2 -> 16, 4 -> 8. (Commonly, flattening at 64 works best.)
+5. To ensure temporal consistency, you can set `use_pnp: True` and `inject_step: 5-10`. (Note that pnp beyond 10 steps degrades multi-region editing.)
+6. If you want to visualize the cross-attention weights, set `vis_cross_attn: True`.
+7. If you want to cluster the DDIM inversion spatio-temporal video features, set `cluster_inversion_feature: True`.
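For illustration, a minimal sketch of how these options might sit in an editing config. `sample_seeds`, `num_inference_steps`, `blending_percentage`, `vis_cross_attn`, and `cluster_inversion_feature` appear in the config diff below; the prompt values, the `local_prompts` key name, and the placement of `control_type`, `use_pnp`, and `inject_step` are assumptions:

```yaml
# Hypothetical layout; option names follow the steps above, values are illustrative.
control_type: dwpose          # assumed top-level key; or depth_zoe / depth (MiDaS)
editing_config:
  prompt: "a man is running, wearing sunglasses"  # made-up editing prompt
  local_prompts: ["a man", "sunglasses"]          # assumed key; order must match the layout masks
  sample_seeds: [0]
  num_inference_steps: 50
  blending_percentage: 0
  use_pnp: True               # inject source features for temporal consistency
  inject_step: 5              # keep <= 10 for multi-region edits
  vis_cross_attn: False       # set True to visualize cross-attention weights
  cluster_inversion_feature: False  # set True to cluster DDIM inversion features
```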

```bash
bash test.sh
#or
-CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config
+CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config /path/to/the/config
```

<details><summary>The result is saved at `./result`. (Click for directory structure)</summary>
@@ -125,7 +144,10 @@ result
│ ├── sd_study # cluster inversion feature
```
</details>
+Editing a 16-frame video on a single L40 costs at most 23 GB of GPU memory; the RAM cost is very small, roughly 4 GB.
+

+## Instance-level Video Editing

## ✏️ Citation
If you think this project is helpful, please feel free to leave a star⭐️⭐️⭐️ and cite our paper:
config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml
CHANGED
@@ -34,7 +34,7 @@ editing_config:
  sample_seeds: [0]
  num_inference_steps: 50
  blending_percentage: 0
-  cluster_inversion_feature: True
+  #cluster_inversion_feature: True
  # vis_cross_attn: false

test_pipeline_config:
image_util/process_webui_mask.py
ADDED
@@ -0,0 +1,50 @@
import os
import cv2
import numpy as np
import argparse
from PIL import Image

def convert_and_copy_masks(src_folder, dest_folder, index_offset, start_frame, end_frame):
    # Ensure that the destination folder exists
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)

    # Iterate over all files in the source folder
    for filename in os.listdir(src_folder):
        if filename.endswith(".png"):  # Process only PNG files
            # Extract the numeric part of the file name
            file_index = int(filename.split('.')[0])

            # Check if the file index is within the specified frame range
            if start_frame <= file_index <= end_frame:
                # Calculate the new index
                new_index = file_index + index_offset
                # Construct the new file name
                new_filename = f"{new_index:05d}.png"

                # Read the original mask image (grayscale)
                src_path = os.path.join(src_folder, filename)
                mask = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)

                # Apply thresholding to convert the mask to 0 and 1
                _, mask = cv2.threshold(mask, 0.5, 1, cv2.THRESH_BINARY)

                # Convert the mask value range to 0-255
                mask = (mask * 255).astype(np.uint8)

                # Save the converted mask to the destination folder
                dest_path = os.path.join(dest_folder, new_filename)
                cv2.imwrite(dest_path, mask)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert and copy mask images with index adjustment")
    parser.add_argument('--src_folder', type=str, required=True, help="Path to the source folder containing mask images")
    parser.add_argument('--dest_folder', type=str, required=True, help="Path to the destination folder for converted masks")
    parser.add_argument('--index_offset', type=int, default=0, help="Index offset to apply to file names")
    parser.add_argument('--start_frame', type=int, default=0, help="Start frame number")
    parser.add_argument('--end_frame', type=int, required=True, help="End frame number")

    args = parser.parse_args()

    convert_and_copy_masks(args.src_folder, args.dest_folder, args.index_offset, args.start_frame, args.end_frame)
    print("Mask conversion and copying completed")
image_util/sample_frames2video.py
ADDED
@@ -0,0 +1,42 @@
import imageio
import os

def images_to_video(image_folder, output_video_name, start_frame=0, end_frame=None, sample_rate=1, fps=10):
    # Get all images and sort them by file name
    filenames = sorted([os.path.join(image_folder, image) for image in os.listdir(image_folder) if image.endswith(".png") or image.endswith(".jpg")])

    # Ensure that images were found
    if not filenames:
        raise ValueError("No images found in the specified directory!")

    # If end_frame is not specified, default to the last image
    if end_frame is None or end_frame > len(filenames):
        end_frame = len(filenames)

    # Select images based on start_frame, end_frame, and sample_rate
    selected_filenames = filenames[start_frame:end_frame:sample_rate]

    # Ensure that some images have been selected
    if not selected_filenames:
        raise ValueError("No images selected based on the provided range and sample rate!")

    # Read the selected images
    images = [imageio.imread(filename) for filename in selected_filenames]

    # Write the video file
    imageio.mimwrite(output_video_name, images, fps=fps)

    print(f"Video created successfully and saved to {output_video_name}")

if __name__ == "__main__":
    source_image_folder = ''  # Replace with the path to your original image folder
    output_video_name = ''    # The desired output video name

    # Specify the start frame, end frame, and sample rate
    start_frame = 0   # Starting frame
    end_frame = 15    # Ending frame, adjust as needed
    sample_rate = 1   # Frame sampling rate
    fps = 10          # Frames per second

    # Create the video
    images_to_video(source_image_folder, output_video_name, start_frame=start_frame, end_frame=end_frame, sample_rate=sample_rate, fps=fps)
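Unlike the two CLI utilities, this script has no argument parser; you fill in the placeholders under `__main__` before running it. A minimal usage sketch (the paths are hypothetical, the import assumes you run from the repo root with `image_util` importable, and writing mp4 output requires the imageio-ffmpeg backend):

```python
# Hypothetical usage of images_to_video from image_util/sample_frames2video.py above.
from image_util.sample_frames2video import images_to_video

images_to_video(
    image_folder='./data/video_name',         # frames produced by sample_video2frames.py
    output_video_name='./result/video.mp4',   # desired output video path
    start_frame=0, end_frame=16, sample_rate=1, fps=10,
)
```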
image_util/sample_video2frames.py
ADDED
@@ -0,0 +1,56 @@
import cv2
import os
import argparse

def extract_frames(video_path, output_dir):
    # Create the output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"Output directory {output_dir} created.")
    else:
        print(f"Output directory {output_dir} already exists.")

    # Open the video file
    cap = cv2.VideoCapture(video_path)

    # Check if the video opened successfully
    if not cap.isOpened():
        print(f"Error: The video file at {video_path} could not be opened.")
        exit()

    # Initialize frame count
    frame_count = 0

    # Read until the video is completed
    while cap.isOpened():
        # Capture frame-by-frame
        ret, frame = cap.read()

        # If the frame is read correctly, ret is True
        if not ret:
            print(f"Error reading frame {frame_count}. Stopping capture.")
            break

        # Check if the frame is None
        if frame is None:
            print(f"Frame {frame_count} is None. Stopping capture.")
            break

        # Save each frame to the output directory
        output_path = os.path.join(output_dir, f'{frame_count:05d}.jpg')
        cv2.imwrite(output_path, frame)
        print(f"Saved frame {frame_count} to {output_path}")

        frame_count += 1

    # When everything is done, release the video capture object
    cap.release()
    print(f"All frames ({frame_count} frames) are saved successfully in {output_dir}.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract frames from a video and save them as images.")
    parser.add_argument("--video_path", type=str, required=True, help="Path to the input video file")
    parser.add_argument("--output_dir", type=str, required=True, help="Directory where the extracted frames will be saved")

    args = parser.parse_args()
    extract_frames(args.video_path, args.output_dir)