Commit dcd1471 · Parent(s): 1b5d3d5 · update
README.md
CHANGED
@@ -90,22 +90,41 @@ Then extract them into ./annotator/ckpts

</details>

-##
+## ⚡️ Prepare all the data

+### Provided data
+We have provided all the video data and layout masks used in VideoGrain at the following link. Please download and unzip the data, and put it in the `./data` root directory.
```
gdown https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link
tar -zxvf videograin_data.tar.gz
```
+### Customize your own data
+**prepare video frames**
+If the input video is an mp4 file, use the following command to split it into frames:
+```bash
+python image_util/sample_video2frames.py --video_path 'your video path' --output_dir './data/video_name'
+```
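Given the zero-padded naming scheme in `image_util/sample_video2frames.py` (added in this commit), the output directory is a flat folder of JPEG frames, for example:

```
data/video_name/
├── 00000.jpg
├── 00001.jpg
└── ...
```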
+**prepare layout masks**
+We segment videos with our ReLER lab's [SAM-Track](https://github.com/z-x-yang/Segment-and-Track-Anything). We suggest using `app.py` in SAM-Track's `gradio` mode to manually select the regions of the video you want to edit. We also provide a script, `image_util/process_webui_mask.py`, to convert masks from the SAM-Track output path to the VideoGrain path; see the example below.
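A minimal sketch of that conversion step, assuming SAM-Track wrote PNG masks for frames 0-15 to `./sam_track_result` and that the masks should land in `./data/video_name/layout_masks` (both paths are hypothetical); the flags are those defined in `image_util/process_webui_mask.py` added in this commit:

```bash
# Hypothetical paths; the flags match image_util/process_webui_mask.py below.
python image_util/process_webui_mask.py \
    --src_folder ./sam_track_result \
    --dest_folder ./data/video_name/layout_masks \
    --index_offset 0 --start_frame 0 --end_frame 15
```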
+

## 🔥 VideoGrain Editing

### Inference
-
+**prepare config**
+VideoGrain is a training-free framework. To run VideoGrain, prepare your config by following these steps (a config sketch follows the list):
+1. Replace the pretrained model path and ControlNet path in your config. You can set the `control_type` to `dwpose`, `depth_zoe`, or `depth` (MiDaS).
+2. Prepare your video frames and layout masks (the edit regions) with SAM-Track or SAM2, and reference them in the dataset config.
+3. Change the `prompt`, and extract each `local prompt` from the editing prompt. The local prompt order must match the layout mask order.
+4. You can change the flatten resolution: 1 -> 64, 2 -> 16, 4 -> 8. (Commonly, flattening at 64 works best.)
+5. To ensure temporal consistency, you can set `use_pnp: True` and `inject_step: 5-10`. (Note that pnp beyond 10 steps degrades multi-region editing.)
+6. If you want to visualize the cross-attention weights, set `vis_cross_attn: True`.
+7. If you want to cluster the DDIM inversion spatio-temporal video features, set `cluster_inversion_feature: True`.
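For illustration, a minimal sketch of how these options might sit in an editing config. `sample_seeds`, `num_inference_steps`, `blending_percentage`, `vis_cross_attn`, and `cluster_inversion_feature` appear in the config diff below; the prompt values, the `local_prompts` key name, and the placement of `control_type`, `use_pnp`, and `inject_step` are assumptions:

```yaml
# Hypothetical layout; option names follow the steps above, values are illustrative.
control_type: dwpose          # assumed top-level key; or depth_zoe / depth (MiDaS)
editing_config:
  prompt: "a man is running, wearing sunglasses"  # made-up editing prompt
  local_prompts: ["a man", "sunglasses"]          # assumed key; order must match the layout masks
  sample_seeds: [0]
  num_inference_steps: 50
  blending_percentage: 0
  use_pnp: True               # inject source features for temporal consistency
  inject_step: 5              # keep <= 10 for multi-region edits
  vis_cross_attn: False       # set True to visualize cross-attention weights
  cluster_inversion_feature: False  # set True to cluster DDIM inversion features
```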

```bash
bash test.sh
#or
-CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config
+CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config /path/to/the/config
```

<details><summary>The result is saved at `./result`. (Click for directory structure)</summary>
@@ -125,7 +144,10 @@ result
│ ├── sd_study # cluster inversion feature
```
</details>
+Editing a 16-frame video on a single L40 costs at most 23 GB of GPU memory; the RAM cost is very small, roughly 4 GB.
+

+## Instance-level Video Editing

## ✏️ Citation
If you think this project is helpful, please feel free to leave a star⭐️⭐️⭐️ and cite our paper:
config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml
CHANGED
@@ -34,7 +34,7 @@ editing_config:
  sample_seeds: [0]
  num_inference_steps: 50
  blending_percentage: 0
-  cluster_inversion_feature: True
+  #cluster_inversion_feature: True
  # vis_cross_attn: false

test_pipeline_config:
image_util/process_webui_mask.py
ADDED
@@ -0,0 +1,50 @@
import os
import cv2
import numpy as np
import argparse
from PIL import Image

def convert_and_copy_masks(src_folder, dest_folder, index_offset, start_frame, end_frame):
    # Ensure that the destination folder exists
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)

    # Iterate over all files in the source folder
    for filename in os.listdir(src_folder):
        if filename.endswith(".png"):  # Process only PNG files
            # Extract the numeric part of the file name
            file_index = int(filename.split('.')[0])

            # Check if the file index is within the specified frame range
            if start_frame <= file_index <= end_frame:
                # Calculate the new index
                new_index = file_index + index_offset
                # Construct the new file name
                new_filename = f"{new_index:05d}.png"

                # Read the original mask image (grayscale)
                src_path = os.path.join(src_folder, filename)
                mask = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)

                # Apply thresholding to convert the mask to 0 and 1
                _, mask = cv2.threshold(mask, 0.5, 1, cv2.THRESH_BINARY)

                # Convert the mask value range to 0-255
                mask = (mask * 255).astype(np.uint8)

                # Save the converted mask to the destination folder
                dest_path = os.path.join(dest_folder, new_filename)
                cv2.imwrite(dest_path, mask)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert and copy mask images with index adjustment")
    parser.add_argument('--src_folder', type=str, required=True, help="Path to the source folder containing mask images")
    parser.add_argument('--dest_folder', type=str, required=True, help="Path to the destination folder for converted masks")
    parser.add_argument('--index_offset', type=int, default=0, help="Index offset to apply to file names")
    parser.add_argument('--start_frame', type=int, default=0, help="Start frame number")
    parser.add_argument('--end_frame', type=int, required=True, help="End frame number")

    args = parser.parse_args()

    convert_and_copy_masks(args.src_folder, args.dest_folder, args.index_offset, args.start_frame, args.end_frame)
    print("Mask conversion and copying completed")
image_util/sample_frames2video.py
ADDED
@@ -0,0 +1,42 @@
import imageio
import os

def images_to_video(image_folder, output_video_name, start_frame=0, end_frame=None, sample_rate=1, fps=10):
    # Get all images and sort them by file name
    filenames = sorted([os.path.join(image_folder, image) for image in os.listdir(image_folder) if image.endswith(".png") or image.endswith(".jpg")])

    # Ensure that images were found
    if not filenames:
        raise ValueError("No images found in the specified directory!")

    # If end_frame is not specified, default to the last image
    if end_frame is None or end_frame > len(filenames):
        end_frame = len(filenames)

    # Select images based on start_frame, end_frame, and sample_rate
    selected_filenames = filenames[start_frame:end_frame:sample_rate]

    # Ensure that some images have been selected
    if not selected_filenames:
        raise ValueError("No images selected based on the provided range and sample rate!")

    # Read the selected images
    images = [imageio.imread(filename) for filename in selected_filenames]

    # Write the video file
    imageio.mimwrite(output_video_name, images, fps=fps)

    print(f"Video created successfully and saved to {output_video_name}")

if __name__ == "__main__":
    source_image_folder = ''  # Replace with the path to your original image folder
    output_video_name = ''    # The desired output video name

    # Specify the start frame, end frame, and sample rate
    start_frame = 0   # Starting frame
    end_frame = 15    # Ending frame, adjust as needed
    sample_rate = 1   # Frame sampling rate
    fps = 10          # Frames per second

    # Create the video
    images_to_video(source_image_folder, output_video_name, start_frame=start_frame, end_frame=end_frame, sample_rate=sample_rate, fps=fps)
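Unlike the two CLI utilities, this script has no argument parser; you fill in the placeholders under `__main__` before running it. A minimal usage sketch (the paths are hypothetical, the import assumes you run from the repo root with `image_util` importable, and writing mp4 output requires the imageio-ffmpeg backend):

```python
# Hypothetical usage of images_to_video from image_util/sample_frames2video.py above.
from image_util.sample_frames2video import images_to_video

images_to_video(
    image_folder='./data/video_name',         # frames produced by sample_video2frames.py
    output_video_name='./result/video.mp4',   # desired output video path
    start_frame=0, end_frame=16, sample_rate=1, fps=10,
)
```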
image_util/sample_video2frames.py
ADDED
@@ -0,0 +1,56 @@
import cv2
import os
import argparse

def extract_frames(video_path, output_dir):
    # Create the output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"Output directory {output_dir} created.")
    else:
        print(f"Output directory {output_dir} already exists.")

    # Open the video file
    cap = cv2.VideoCapture(video_path)

    # Check if the video opened successfully
    if not cap.isOpened():
        print(f"Error: The video file at {video_path} could not be opened.")
        exit()

    # Initialize frame count
    frame_count = 0

    # Read until the video is completed
    while cap.isOpened():
        # Capture frame-by-frame
        ret, frame = cap.read()

        # If the frame is read correctly, ret is True
        if not ret:
            print(f"Error reading frame {frame_count}. Stopping capture.")
            break

        # Check if the frame is None
        if frame is None:
            print(f"Frame {frame_count} is None. Stopping capture.")
            break

        # Save each frame to the output directory
        output_path = os.path.join(output_dir, f'{frame_count:05d}.jpg')
        cv2.imwrite(output_path, frame)
        print(f"Saved frame {frame_count} to {output_path}")

        frame_count += 1

    # When everything is done, release the video capture object
    cap.release()
    print(f"All frames ({frame_count} frames) are saved successfully in {output_dir}.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract frames from a video and save them as images.")
    parser.add_argument("--video_path", type=str, required=True, help="Path to the input video file")
    parser.add_argument("--output_dir", type=str, required=True, help="Directory where the extracted frames will be saved")

    args = parser.parse_args()
    extract_frames(args.video_path, args.output_dir)