benjamin-paine committed on
Commit
03e71d4
·
verified ·
1 Parent(s): 5725a7f

Update README.md

Files changed (1)
  1. README.md +240 -32
README.md CHANGED
@@ -1,39 +1,35 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
- This repository contains a pruned and partially reorganized version of [CHAMP](https://fudan-generative-vision.github.io/champ/#/).
5
 
6
  ```
7
- @misc{zhu2024champ,
8
- title={Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance},
9
- author={Shenhao Zhu and Junming Leo Chen and Zuozhuo Dai and Yinghui Xu and Xun Cao and Yao Yao and Hao Zhu and Siyu Zhu},
10
  year={2024},
11
- eprint={2403.14781},
12
  archivePrefix={arXiv},
13
  primaryClass={cs.CV}
14
  }
15
  ```
16
 
17
- <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/64429aaf7feb866811b12f73/wZku1I_4L4VwWeXXKgXqb.mp4"></video>
18
-
19
- Video credit: [Polina Tankilevitch, Pexels](https://www.pexels.com/video/a-young-woman-dancing-hip-hop-3873100/)
20
-
21
- Image credit: [Andrea Piacquadio, Pexels](https://www.pexels.com/photo/man-in-black-jacket-wearing-black-headphones-3831645/)
22
-
23
  # Usage
24
 
25
- First, install the CHAMP package into your Python environment. If you're creating a new environment for CHAMP, be sure to also install a CUDA-enabled build of torch; otherwise the pipeline will run on CPU only.
 
 
26
 
27
  ```sh
28
- pip install git+https://github.com/painebenjamin/champ.git
29
  ```
30
 
31
  Now, you can create the pipeline, automatically pulling the weights from this repository, either as individual models:
32
 
33
  ```py
34
- from champ import CHAMPPipeline
35
- pipeline = CHAMPPipeline.from_pretrained(
36
- "benjamin-paine/champ",
37
  torch_dtype=torch.float16,
38
  variant="fp16",
39
  device="cuda"
@@ -43,30 +39,242 @@ pipeline = CHAMPPipeline.from_pretrained(
43
  Or, as a single file:
44
 
45
  ```py
46
- from champ import CHAMPPipeline
47
- pipeline = CHAMPPipeline.from_single_file(
48
- "benjamin-paine/champ",
49
  torch_dtype=torch.float16,
50
  variant="fp16",
51
  device="cuda"
52
  ).to("cuda", dtype=torch.float16)
53
  ```
54
 
55
- Follow this format for execution:
56
 
57
  ```py
58
- result = pipeline(
59
- reference: PIL.Image.Image,
60
- guidance: Dict[str, List[PIL.Image.Image]],
61
- width: int,
62
- height: int,
63
- video_length: int,
64
- num_inference_steps: int,
65
- guidance_scale: float
66
- ).videos
67
- # Result is a list of PIL Images
68
  ```
69
 
70
- Starting values for `num_inference_steps` and `guidance_scale` are `20` and `3.5`, respectively.
71
 
72
- Guidance keys include `depth`, `normal`, `dwpose` and `semantic_map` (densepose.) This guide does not provide details on how to obtain those samples, but examples are available in [the git repository.](https://github.com/painebenjamin/champ/tree/master/example)
1
  ---
2
  license: apache-2.0
3
  ---
4
+ This repository contains a pruned and partially reorganized version of [AniPortrait](https://github.com/Zejun-Yang/AniPortrait).
5
 
6
  ```
7
+ @misc{wei2024aniportrait,
8
+ title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations},
9
+ author={Huawei Wei and Zejun Yang and Zhisheng Wang},
10
  year={2024},
11
+ eprint={2403.17694},
12
  archivePrefix={arXiv},
13
  primaryClass={cs.CV}
14
  }
15
  ```
16
 
17
  # Usage
18
 
19
+ ## Installation
20
+
21
+ First, install the AniPortrait package into your Python environment. If you're creating a new environment for AniPortrait, be sure to also install a CUDA-enabled build of torch; otherwise the pipeline will run on CPU only.
22
 
23
  ```sh
24
+ pip install git+https://github.com/painebenjamin/aniportrait.git
25
  ```
26
 
27
  Now, you can create the pipeline, automatically pulling the weights from this repository, either as individual models:
28
 
29
  ```py
30
+ import torch
+ from aniportrait import AniPortraitPipeline
31
+ pipeline = AniPortraitPipeline.from_pretrained(
32
+ "benjamin-paine/aniportrait",
33
  torch_dtype=torch.float16,
34
  variant="fp16",
35
  device="cuda"
 
39
  Or, as a single file:
40
 
41
  ```py
42
+ import torch
+ from aniportrait import AniPortraitPipeline
43
+ pipeline = AniPortraitPipeline.from_single_file(
44
+ "benjamin-paine/aniportrait",
45
  torch_dtype=torch.float16,
46
  variant="fp16",
47
  device="cuda"
48
  ).to("cuda", dtype=torch.float16)
49
  ```
50
 
51
+ The `AniPortraitPipeline` is a mega pipeline, capable of instantiating and executing other pipelines. It provides the following functions:
52
+
53
+ ## Workflows
54
+
55
+ ### img2img
56
+
57
+ ```py
58
+ pipeline.img2img(
59
+ reference_image: PIL.Image.Image,
60
+ pose_reference_image: PIL.Image.Image,
61
+ num_inference_steps: int,
62
+ guidance_scale: float,
63
+ eta: float=0.0,
64
+ reference_pose_image: Optional[PIL.Image.Image]=None,
65
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
66
+ output_type: Optional[str]="pil",
67
+ return_dict: bool=True,
68
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
69
+ callback_steps: Optional[int]=None,
70
+ width: Optional[int]=None,
71
+ height: Optional[int]=None,
72
+ **kwargs: Any
73
+ ) -> Pose2VideoPipelineOutput
74
+ ```
75
+
76
+ Using a reference image (for structure) and a pose reference image (for pose), render an image of the former in the pose of the latter.
77
+ - The pose reference image here is an unprocessed image, from which the face pose will be extracted.
78
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
79
+
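As a rough illustration of the call shape, the sketch below wraps `img2img` in a small helper. The helper name `render_in_pose`, the stand-in `RecordingPipeline` class, and the starting values of `20` steps and a guidance scale of `3.5` are assumptions made for this example, not documented defaults; only the keyword names come from the signature above.

```python
from typing import Any


def render_in_pose(
    pipeline: Any,
    reference_image: Any,
    pose_reference_image: Any,
    num_inference_steps: int = 20,  # assumed starting value, tune as needed
    guidance_scale: float = 3.5,    # assumed starting value, tune as needed
    **kwargs: Any,
) -> Any:
    """Call pipeline.img2img with keyword arguments and return its output."""
    return pipeline.img2img(
        reference_image=reference_image,
        pose_reference_image=pose_reference_image,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        **kwargs,
    )


class RecordingPipeline:
    """Stand-in used only to show the call shape without loading weights."""

    def img2img(self, **kwargs: Any) -> dict:
        # Echo the received keyword arguments instead of rendering an image.
        return kwargs


if __name__ == "__main__":
    received = render_in_pose(RecordingPipeline(), "reference", "pose", width=512)
    print(sorted(received))
```

With a real `AniPortraitPipeline`, the two images would be `PIL.Image.Image` objects and the return value would be the pipeline's output object rather than a dictionary.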
80
+ ### vid2vid
81
+
82
+ ```py
83
+ pipeline.vid2vid(
84
+ reference_image: PIL.Image.Image,
85
+ pose_reference_images: List[PIL.Image.Image],
86
+ num_inference_steps: int,
87
+ guidance_scale: float,
88
+ eta: float=0.0,
89
+ reference_pose_image: Optional[PIL.Image.Image]=None,
90
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
91
+ output_type: Optional[str]="pil",
92
+ return_dict: bool=True,
93
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
94
+ callback_steps: Optional[int]=None,
95
+ width: Optional[int]=None,
96
+ height: Optional[int]=None,
97
+ video_length: Optional[int]=None,
98
+ context_schedule: str="uniform",
99
+ context_frames: int=16,
100
+ context_overlap: int=4,
101
+ context_batch_size: int=1,
102
+ interpolation_factor: int=1,
103
+ use_long_video: bool=True,
104
+ **kwargs: Any
105
+ ) -> Pose2VideoPipelineOutput
106
+ ```
107
+
108
+ Using a reference image (for structure) and a sequence of pose reference images (for pose), render a video of the former in the poses of the latter. When the pose sequence is longer than 16 frames, context windowing is used for long-video generation.
109
+ - Optionally pass `use_long_video=False` to disable using the long video pipeline.
110
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
111
+ - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose reference images.
112
+
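To give a feel for what `context_frames=16` and `context_overlap=4` mean, here is a minimal sketch that splits frame indices into overlapping windows. It is a simplified sliding window written for illustration, not the library's `uniform` schedule, which may order or stride its windows differently; the function name `sliding_context_windows` is hypothetical.

```python
from typing import List


def sliding_context_windows(
    num_frames: int,
    context_frames: int = 16,
    context_overlap: int = 4,
) -> List[List[int]]:
    """Split frame indices into overlapping windows (simplified illustration)."""
    if context_overlap >= context_frames:
        raise ValueError("context_overlap must be smaller than context_frames")
    if num_frames <= context_frames:
        # Short sequences fit in a single window, so no windowing is needed.
        return [list(range(num_frames))]

    stride = context_frames - context_overlap
    windows: List[List[int]] = []
    start = 0
    while True:
        end = min(start + context_frames, num_frames)
        # Shift the final window back so every window has context_frames frames.
        windows.append(list(range(end - context_frames, end)))
        if end == num_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    for window in sliding_context_windows(40):
        print(window[0], "to", window[-1])
```

Each window shares at least `context_overlap` frames with its neighbor, which is what lets results from adjacent windows be blended into one continuous video.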
113
+ ### audio2vid
114
+
115
+ ```py
116
+ pipeline.audio2vid(
117
+ audio: str,
118
+ reference_image: PIL.Image.Image,
119
+ num_inference_steps: int,
120
+ guidance_scale: float,
121
+ fps: int=30,
122
+ eta: float=0.0,
123
+ reference_pose_image: Optional[PIL.Image.Image]=None,
124
+ pose_reference_images: Optional[List[PIL.Image.Image]]=None,
125
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
126
+ output_type: Optional[str]="pil",
127
+ return_dict: bool=True,
128
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
129
+ callback_steps: Optional[int]=None,
130
+ width: Optional[int]=None,
131
+ height: Optional[int]=None,
132
+ video_length: Optional[int]=None,
133
+ context_schedule: str="uniform",
134
+ context_frames: int=16,
135
+ context_overlap: int=4,
136
+ context_batch_size: int=1,
137
+ interpolation_factor: int=1,
138
+ use_long_video: bool=True,
139
+ **kwargs: Any
140
+ ) -> Pose2VideoPipelineOutput
141
+ ```
142
+
143
+ Using an audio file, draw `fps` face pose images per second for the duration of the audio. Then, using those face pose images, render a video.
144
+ - Optionally pass `pose_reference_images`, a list of images whose poses are extracted and merged with the audio-generated poses (in essence, pass a video here to control non-speech motion). The default is a moderately active loop of head movement.
145
+ - Optionally pass `width`/`height` to change the output size. Defaults to the reference image size.
146
+ - Optionally pass `use_long_video=False` to disable using the long video pipeline.
147
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
148
+ - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose reference images.
149
+
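The relationship between audio duration, `fps`, and the number of pose frames can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, rounding up to a whole frame is a guess about the library's behavior, and the duration helper only handles uncompressed WAV files through the standard-library `wave` module.

```python
import math
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def pose_frame_count(duration_seconds: float, fps: int = 30) -> int:
    """Number of pose frames needed to cover the audio at the given fps.

    Rounding up is an assumption; the library may round differently.
    """
    return math.ceil(duration_seconds * fps)


if __name__ == "__main__":
    # A 2.5 second clip at the default 30 fps needs 75 pose frames.
    print(pose_frame_count(2.5, fps=30))
```

This is handy for estimating how long a generation will take before running it, since the frame count drives the amount of work.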
150
+ ## Internals/Helpers
151
+
152
+ ### img2pose
153
+
154
+ ```py
155
+ pipeline.img2pose(
156
+ reference_image: PIL.Image.Image,
157
+ width: Optional[int]=None,
158
+ height: Optional[int]=None
159
+ ) -> PIL.Image.Image
160
+ ```
161
+
162
+ Detects face landmarks in an image and draws a face pose image.
163
+ - Optionally pass `width` and `height` to resize from the original dimensions.
164
+
165
+ ### vid2pose
166
+
167
+ ```py
168
+ pipeline.vid2pose(
169
+ reference_image: PIL.Image.Image,
170
+ retarget_image: Optional[PIL.Image.Image],
171
+ width: Optional[int]=None,
172
+ height: Optional[int]=None
173
+ ) -> List[PIL.Image.Image]
174
+ ```
175
+
176
+ Detects face landmarks in a series of images and draws pose images.
177
+ - Optionally pass `width` and `height` to resize from the original dimensions.
178
+ - Optionally pass `retarget_image` to retarget the poses to a different face position, which is useful for video-to-video tasks.
179
+
180
+ ### audio2pose
181
 
182
  ```py
183
+ pipeline.audio2pose(
184
+ audio_path: str,
185
+ fps: int=30,
186
+ reference_image: Optional[PIL.Image.Image]=None,
187
+ pose_reference_images: Optional[List[PIL.Image.Image]]=None,
188
+ width: Optional[int]=None,
189
+ height: Optional[int]=None
190
+ ) -> List[PIL.Image.Image]
191
  ```
192
 
193
+ Using an audio file, draw `fps` face pose images per second for the duration of the audio.
194
+ - Optionally pass `reference_image` to extract the face shape and initial position from it. The default is a generic androgynous face shape.
195
+ - Optionally pass `pose_reference_images`, a list of images whose poses are extracted and merged with the audio-generated poses (in essence, pass a video here to control non-speech motion). The default is a moderately active loop of head movement.
196
+ - Optionally pass `width`/`height` to change the output size. Defaults to the reference image size, then the pose image sizes, then 256.
197
+
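The size fallback order described above (explicit value, then reference image, then pose images, then 256) can be sketched like this. The function name `resolve_size` is hypothetical, and two details are assumptions: that the first pose image's size is the one used, and that width and height fall back independently of each other.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height)


def resolve_size(
    width: Optional[int] = None,
    height: Optional[int] = None,
    reference_size: Optional[Size] = None,
    pose_sizes: Optional[List[Size]] = None,
    fallback: int = 256,
) -> Size:
    """Pick an output size using the documented fallback order."""
    if reference_size is not None:
        default_width, default_height = reference_size
    elif pose_sizes:
        # Assumption: the first pose image determines the default size.
        default_width, default_height = pose_sizes[0]
    else:
        default_width = default_height = fallback
    return (
        width if width is not None else default_width,
        height if height is not None else default_height,
    )


if __name__ == "__main__":
    print(resolve_size())                             # nothing provided
    print(resolve_size(reference_size=(512, 768)))    # reference image wins
    print(resolve_size(pose_sizes=[(320, 240)]))      # pose images next
```

With PIL images, the `reference_size` and `pose_sizes` values would come from each image's `size` attribute, which is already a `(width, height)` tuple.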
198
+ ### pose2img
199
+
200
+ ```py
201
+ pipeline.pose2img(
202
+ reference_image: PIL.Image.Image,
203
+ pose_image: PIL.Image.Image,
204
+ num_inference_steps: int,
205
+ guidance_scale: float,
206
+ eta: float=0.0,
207
+ reference_pose_image: Optional[PIL.Image.Image]=None,
208
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
209
+ output_type: Optional[str]="pil",
210
+ return_dict: bool=True,
211
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
212
+ callback_steps: Optional[int]=None,
213
+ width: Optional[int]=None,
214
+ height: Optional[int]=None,
215
+ **kwargs: Any
216
+ ) -> Pose2VideoPipelineOutput
217
+ ```
218
+
219
+ Using a reference image (for structure) and a pose image (for pose), render an image of the former in the pose of the latter.
220
+ - The pose image here is a processed face pose. To pass a non-processed face pose, see `img2img`.
221
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
222
+
223
+ ### pose2vid
224
+
225
+ ```py
226
+ pipeline.pose2vid(
227
+ reference_image: PIL.Image.Image,
228
+ pose_images: List[PIL.Image.Image],
229
+ num_inference_steps: int,
230
+ guidance_scale: float,
231
+ eta: float=0.0,
232
+ reference_pose_image: Optional[PIL.Image.Image]=None,
233
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
234
+ output_type: Optional[str]="pil",
235
+ return_dict: bool=True,
236
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
237
+ callback_steps: Optional[int]=None,
238
+ width: Optional[int]=None,
239
+ height: Optional[int]=None,
240
+ video_length: Optional[int]=None,
241
+ **kwargs: Any
242
+ ) -> Pose2VideoPipelineOutput
243
+ ```
244
+
245
+ Using a reference image (for structure) and pose images (for pose), render a video of the former in the poses of the latter.
246
+ - The pose images here are processed face poses. To pass non-processed face poses, see `vid2vid`.
247
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
248
+ - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose images.
249
+
250
+ ### pose2vid_long
251
+
252
+ ```py
253
+ pipeline.pose2vid_long(
254
+ reference_image: PIL.Image.Image,
255
+ pose_images: List[PIL.Image.Image],
256
+ num_inference_steps: int,
257
+ guidance_scale: float,
258
+ eta: float=0.0,
259
+ reference_pose_image: Optional[PIL.Image.Image]=None,
260
+ generation: Optional[Union[torch.Generator, List[torch.Generator]]]=None,
261
+ output_type: Optional[str]="pil",
262
+ return_dict: bool=True,
263
+ callback: Optional[Callable[[int, int, torch.FloatTensor], None]]=None,
264
+ callback_steps: Optional[int]=None,
265
+ width: Optional[int]=None,
266
+ height: Optional[int]=None,
267
+ video_length: Optional[int]=None,
268
+ context_schedule: str="uniform",
269
+ context_frames: int=16,
270
+ context_overlap: int=4,
271
+ context_batch_size: int=1,
272
+ interpolation_factor: int=1,
273
+ **kwargs: Any
274
+ ) -> Pose2VideoPipelineOutput
275
+ ```
276
 
277
+ Using a reference image (for structure) and pose images (for pose), render a video of the former in the poses of the latter, using context windowing for long-video generation.
278
+ - The pose images here are processed face poses. To pass non-processed face poses, see `vid2vid`.
279
+ - Optionally pass `reference_pose_image` to designate the pose of `reference_image`. When not passed, the pose of `reference_image` is automatically detected.
280
+ - Optionally pass `video_length` to use this many frames. Default is the same as the length of the pose images.