XiangpengYang committed on
Commit
dcd1471
·
1 Parent(s): 1b5d3d5
README.md CHANGED
@@ -90,22 +90,41 @@ Then extract them into ./annotator/ckpts
 
 </details>
 
-## 🔛 Prepare all the data
+## ⚡️ Prepare all the data
 
+### Provided data
+We provide all the video data and layout masks used in VideoGrain at the following link. Please download and unzip the data, then put it in the `./data` root directory.
 ```
 gdown https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link
 tar -zxvf videograin_data.tar.gz
 ```
+### Customize your own data
+**Prepare video frames**
+If the input video is an mp4 file, use the following command to split it into frames:
+```bash
+python image_util/sample_video2frames.py --video_path 'your video path' --output_dir './data/video_name'
+```
+**Prepare layout masks**
+We segment videos with our ReLER lab's [SAM-Track](https://github.com/z-x-yang/Segment-and-Track-Anything). We suggest using `app.py` in SAM-Track in `gradio` mode to manually select the region of the video you want to edit. We also provide a script, `image_util/process_webui_mask.py`, to convert masks from the SAM-Track output path to the VideoGrain path.
+
 
 ## 🔥 VideoGrain Editing
 
 ### Inference
-VideoGrain is a training-free framework. To run the inference script, use the following command:
+**Prepare config**
+VideoGrain is a training-free framework. To run VideoGrain, prepare your config as follows:
+1. Replace the pretrained model path and ControlNet path in your config. You can set `control_type` to `dwpose`, `depth_zoe`, or `depth` (MiDaS).
+2. Prepare your video frames and layout masks (edit regions) with SAM-Track or SAM2 in the dataset config.
+3. Set the `prompt`, and extract each `local prompt` from the editing prompt. The local prompt order must match the layout mask order.
+4. You can change the flatten resolution with 1->64, 2->16, 4->8. (Commonly, flattening at 64 works best.)
+5. To ensure temporal consistency, you can set `use_pnp: True` and `inject_step: 5-10`. (Note that pnp injection beyond 10 steps hurts multi-region editing.)
+6. If you want to visualize the cross-attention weights, set `vis_cross_attn: True`.
+7. If you want to cluster the DDIM inversion spatio-temporal video features, set `cluster_inversion_feature: True`.
 
 ```bash
 bash test.sh
 #or
-CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml
+CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config /path/to/the/config
 ```
 
 <details><summary>The result is saved at `./result`. (Click for directory structure)</summary>
@@ -125,7 +144,10 @@ result
 │ ├── sd_study # cluster inversion feature
 ```
 </details>
+Editing a 16-frame video on a single L40 costs at most 23 GB of GPU memory; the RAM cost is small, roughly 4 GB.
+
 
+## Instance-level Video Editing
 
 ## ✏️ Citation
 If you think this project is helpful, please feel free to leave a star⭐️⭐️⭐️ and cite our paper:
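Assuming the config is a YAML file like `running_spider_polar_sunglass.yaml`, the fields touched by the config-preparation steps above might look like the following sketch. Only `control_type`, `editing_config`, `use_pnp`, `inject_step`, `vis_cross_attn`, and `cluster_inversion_feature` appear in this commit; the remaining field names and values are illustrative placeholders, not taken verbatim from the repo:

```yaml
# Illustrative sketch only -- check the shipped configs for the exact schema
pretrained_model_path: /path/to/stable-diffusion    # step 1: your pretrained model
control_type: dwpose                                # step 1: or depth_zoe / depth (MiDaS)
dataset_config:
  video_path: ./data/video_name                     # step 2: frames from sample_video2frames.py
  layout_mask_dir: ./data/video_name/layout_masks   # step 2: masks from SAM-Track or SAM2
editing_config:
  prompt: "..."                                     # step 3: local prompts ordered like the masks
  use_pnp: True                                     # step 5: plug-and-play injection
  inject_step: 5                                    # step 5: keep within 5-10
  vis_cross_attn: False                             # step 6
  cluster_inversion_feature: False                  # step 7
```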
config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml CHANGED
@@ -34,7 +34,7 @@ editing_config:
   sample_seeds: [0]
   num_inference_steps: 50
   blending_percentage: 0
-  cluster_inversion_feature: True
+  #cluster_inversion_feature: True
   # vis_cross_attn: false
 
 test_pipeline_config:
image_util/process_webui_mask.py ADDED
@@ -0,0 +1,50 @@
+import os
+import cv2
+import numpy as np
+import argparse
+from PIL import Image
+
+def convert_and_copy_masks(src_folder, dest_folder, index_offset, start_frame, end_frame):
+    # Ensure that the destination folder exists
+    if not os.path.exists(dest_folder):
+        os.makedirs(dest_folder)
+
+    # Iterate over all files in the source folder
+    for filename in os.listdir(src_folder):
+        if filename.endswith(".png"):  # Process only PNG files
+            # Extract the numeric part of the file name
+            file_index = int(filename.split('.')[0])
+
+            # Check if the file index is within the specified frame range
+            if start_frame <= file_index <= end_frame:
+                # Calculate the new index
+                new_index = file_index + index_offset
+                # Construct the new file name
+                new_filename = f"{new_index:05d}.png"
+
+                # Read the original mask image (grayscale)
+                src_path = os.path.join(src_folder, filename)
+                mask = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
+
+                # Apply thresholding to convert the mask to 0 and 1
+                _, mask = cv2.threshold(mask, 0.5, 1, cv2.THRESH_BINARY)
+
+                # Convert mask value range to 0-255
+                mask = (mask * 255).astype(np.uint8)
+
+                # Save the converted mask to the destination folder
+                dest_path = os.path.join(dest_folder, new_filename)
+                cv2.imwrite(dest_path, mask)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Convert and copy mask images with index adjustment")
+    parser.add_argument('--src_folder', type=str, required=True, help="Path to the source folder containing mask images")
+    parser.add_argument('--dest_folder', type=str, required=True, help="Path to the destination folder for converted masks")
+    parser.add_argument('--index_offset', type=int, default=0, help="Index offset to apply to file names")
+    parser.add_argument('--start_frame', type=int, default=0, help="Start frame number")
+    parser.add_argument('--end_frame', type=int, required=True, help="End frame number")
+
+    args = parser.parse_args()
+
+    convert_and_copy_masks(args.src_folder, args.dest_folder, args.index_offset, args.start_frame, args.end_frame)
+    print("Mask conversion and copying completed")
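As a minimal pure-Python sketch of the two rules `process_webui_mask.py` applies (the index-offset renaming and the binarize-then-rescale step), without the OpenCV dependency:

```python
def renamed(file_index: int, index_offset: int) -> str:
    # A SAM-Track mask named 00003.png with offset 2 becomes 00005.png
    return f"{file_index + index_offset:05d}.png"

def binarized(pixels):
    # Mirrors cv2.threshold(mask, 0.5, 1, cv2.THRESH_BINARY) followed by * 255:
    # any nonzero grayscale value maps to 255, zero stays 0.
    return [255 if p > 0.5 else 0 for p in pixels]

print(renamed(3, 2))                # 00005.png
print(binarized([0, 1, 128, 255]))  # [0, 255, 255, 255]
```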
image_util/sample_frames2video.py ADDED
@@ -0,0 +1,42 @@
+import imageio
+import os
+
+def images_to_video(image_folder, output_video_name, start_frame=0, end_frame=None, sample_rate=1, fps=10):
+    # Get all images and sort them by file name
+    filenames = sorted([os.path.join(image_folder, image) for image in os.listdir(image_folder) if image.endswith(".png") or image.endswith(".jpg")])
+
+    # Ensure that images were found
+    if not filenames:
+        raise ValueError("No images found in the specified directory!")
+
+    # If end_frame is not specified, default to the last image
+    if end_frame is None or end_frame > len(filenames):
+        end_frame = len(filenames)
+
+    # Select images based on start_frame, end_frame, and sample_rate
+    selected_filenames = filenames[start_frame:end_frame:sample_rate]
+
+    # Ensure that some images have been selected
+    if not selected_filenames:
+        raise ValueError("No images selected based on the provided range and sample rate!")
+
+    # Read the selected images
+    images = [imageio.imread(filename) for filename in selected_filenames]
+
+    # Write the video file
+    imageio.mimwrite(output_video_name, images, fps=fps)
+
+    print(f"Video created successfully and saved to {output_video_name}")
+
+if __name__ == "__main__":
+    source_image_folder = ''  # Replace with the path to your original image folder
+    output_video_name = ''  # The desired output video name
+
+    # Specify the start frame, end frame, and sample rate
+    start_frame = 0  # Starting frame
+    end_frame = 15  # Ending frame, adjust as needed
+    sample_rate = 1  # Frame sampling rate
+    fps = 10  # Frames per second
+
+    # Create the video
+    images_to_video(source_image_folder, output_video_name, start_frame=start_frame, end_frame=end_frame, sample_rate=sample_rate, fps=fps)
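The frame selection in `images_to_video` is a plain Python slice, `filenames[start_frame:end_frame:sample_rate]`; for example:

```python
filenames = [f"{i:05d}.png" for i in range(6)]
# start_frame=1, end_frame=5, sample_rate=2 picks every other frame in [1, 5)
selected = filenames[1:5:2]
print(selected)  # ['00001.png', '00003.png']
```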
image_util/sample_video2frames.py ADDED
@@ -0,0 +1,56 @@
+import cv2
+import os
+import argparse
+
+def extract_frames(video_path, output_dir):
+    # Create the output directory if it doesn't exist
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
+        print(f"Output directory {output_dir} created.")
+    else:
+        print(f"Output directory {output_dir} already exists.")
+
+    # Open the video file
+    cap = cv2.VideoCapture(video_path)
+
+    # Check if the video opened successfully
+    if not cap.isOpened():
+        print(f"Error: The video file at {video_path} could not be opened.")
+        exit()
+
+    # Initialize frame count
+    frame_count = 0
+
+    # Read until video is completed
+    while cap.isOpened():
+        # Capture frame-by-frame
+        ret, frame = cap.read()
+
+        # If frame is read correctly, ret is True
+        if not ret:
+            print(f"Error reading frame {frame_count}. Stopping capture.")
+            break
+
+        # Check if frame is None
+        if frame is None:
+            print(f"Frame {frame_count} is None. Stopping capture.")
+            break
+
+        # Save each frame to output directory
+        output_path = os.path.join(output_dir, f'{frame_count:05d}.jpg')
+        cv2.imwrite(output_path, frame)
+        print(f"Saved frame {frame_count} to {output_path}")
+
+        frame_count += 1
+
+    # When everything is done, release the video capture object
+    cap.release()
+    print(f"All frames ({frame_count} frames) are saved successfully in {output_dir}.")
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Extract frames from a video and save them as images.")
+    parser.add_argument("--video_path", type=str, required=True, help="Path to the input video file")
+    parser.add_argument("--output_dir", type=str, required=True, help="Directory where the extracted frames will be saved")
+
+    args = parser.parse_args()
+    extract_frames(args.video_path, args.output_dir)