# svjack/GenshinImpact_XL_Base

This model is derived from [CivitAI](https://civitai.com/models/386505).

## Acknowledgments

Special thanks to [mobeimunan](https://civitai.com/user/mobeimunan) for their contributions to the development of this model.

<!--
from moviepy.editor import ImageClip, VideoFileClip, clips_array

# Paths to the images and the video
image1_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/3IkZz7uXW9kc-lTnKdQN8.png"
image2_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ckrKqytF5MhanjIc_Vn1q.png"
image3_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/vfffGerUQV9W1MHxc_rN_.png"
video_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/gBaodBk8z3aI69LiT36w2.mp4"

duration = 10

# Load the images and the video; resize takes (width, height)
image1 = ImageClip(image1_path).set_duration(duration).resize((512, 512))  # show each image for 10 seconds
image2 = ImageClip(image2_path).set_duration(duration).resize((1024, 512))
image3 = ImageClip(image3_path).set_duration(duration).resize((1024, 512))
video = VideoFileClip(video_path).resize((512, 512))

# Trim the video so its duration matches the images
video = video.subclip(0, duration)  # keep the first 10 seconds

# Arrange the images and the video in a 2x2 grid
final_clip = clips_array([[image1, image2], [video, image3]])

# Scale the final video
final_clip = final_clip.resize(width=1080)  # width 1080; height scales to keep the aspect ratio

# Export the final video
final_clip.write_videofile("zhongli_merge_im_output.mp4", codec="libx264")

VideoFileClip("zhongli_merge_im_output.mp4").set_duration(2.2).write_videofile("zhongli_merge_output_2_2.mp4", codec="libx264")
-->

<div style="display: flex; flex-direction: column; align-items: center;">
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/bsgBrDOOXYN-oBH95q5uK.mp4" style="width: 1024px; height: 768px;"></video>
    </div>
    <div style="margin-bottom: 10px;">
        <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/WypE04ag7_4Z1FKhzk475.png" width="1024" height="1024">
    </div>
    <div style="margin-bottom: 10px;">
        <h3>Zhongli Drinking Tea:</h3>
    </div>
    <div style="margin-bottom: 10px;">
        <img src="https://github.com/user-attachments/assets/00451728-f2d5-4009-afa8-23baaabdc223" style="width: 1024px; height: 256px;">
    </div>
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://github.com/user-attachments/assets/607e7eb7-d41c-4740-9c8a-8369c31487da" style="width: 1024px; height: 800px;"></video>
    </div>
    <div style="margin-bottom: 10px;">
        <h3>Kamisato Ayato Smiling:</h3>
    </div>
    <div style="margin-bottom: 10px;">
        <img src="https://github.com/user-attachments/assets/7a920f4c-8a3a-4387-98d6-381a798566ef" style="width: 1024px; height: 256px;">
    </div>
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://github.com/user-attachments/assets/aaa9849e-0c53-4012-b6c3-9ceb9910f2f8" style="width: 1024px; height: 800px;"></video>
    </div>
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/Kmh4NJ1AkfV5X3-kkilAK.mp4" style="width: 1024px; height: 1024px;"></video>
    </div>
</div>


## Supported Characters

The model currently supports the following 73 characters from Genshin Impact:

```python
name_dict = {
    '旅行者女': 'lumine',
    '旅行者男': 'aether',
    '派蒙': 'PAIMON',
    '迪奥娜': 'DIONA',
    '菲米尼': 'FREMINET',
    '甘雨': 'GANYU',
    '凯亚': 'KAEYA',
    '莱依拉': 'LAYLA',
    '罗莎莉亚': 'ROSARIA',
    '七七': 'QIQI',
    '申鹤': 'SHENHE',
    '神里绫华': 'KAMISATO AYAKA',
    '优菈': 'EULA',
    '重云': 'CHONGYUN',
    '夏洛蒂': 'charlotte',
    '莱欧斯利': 'WRIOTHESLEY',
    '艾尔海森': 'ALHAITHAM',
    '柯莱': 'COLLEI',
    '纳西妲': 'NAHIDA',
    '绮良良': 'KIRARA',
    '提纳里': 'TIGHNARI',
    '瑶瑶': 'YAOYAO',
    '珐露珊': 'FARUZAN',
    '枫原万叶': 'KAEDEHARA KAZUHA',
    '琳妮特': 'LYNETTE',
    '流浪者 散兵': 'scaramouche',
    '鹿野院平藏': 'SHIKANOIN HEIZOU',
    '琴': 'JEAN',
    '砂糖': 'SUCROSE',
    '温迪': 'VENTI',
    '魈': 'XIAO',
    '早柚': 'SAYU',
    '安柏': 'AMBER',
    '班尼特': 'BENNETT',
    '迪卢克': 'DILUC',
    '迪西娅': 'DEHYA',
    '胡桃': 'HU TAO',
    '可莉': 'KLEE',
    '林尼': 'LYNEY',
    '托马': 'THOMA',
    '香菱': 'XIANG LING',
    '宵宫': 'YOIMIYA',
    '辛焱': 'XINYAN',
    '烟绯': 'YANFEI',
    '八重神子': 'YAE MIKO',
    '北斗': 'BEIDOU',
    '菲谢尔': 'FISCHL',
    '九条裟罗': 'KUJO SARA',
    '久岐忍': 'KUKI SHINOBU',
    '刻晴': 'KEQING',
    '雷电将军': 'RAIDEN SHOGUN',
    '雷泽': 'RAZOR',
    '丽莎': 'LISA',
    '赛诺': 'CYNO',
    '芙宁娜': 'FURINA',
    '芭芭拉': 'BARBARA',
    '公子 达达利亚': 'TARTAGLIA',
    '坎蒂丝': 'CANDACE',
    '莫娜': 'MONA',
    '妮露': 'NILOU',
    '珊瑚宫心海': 'SANGONOMIYA KOKOMI',
    '神里绫人': 'KAMISATO AYATO',
    '行秋': 'XINGQIU',
    '夜兰': 'YELAN',
    '那维莱特': 'NEUVILLETTE',
    '娜维娅': 'NAVIA',
    '阿贝多': 'ALBEDO',
    '荒泷一斗': 'ARATAKI ITTO',
    '凝光': 'NING GUANG',
    '诺艾尔': 'NOELLE',
    '五郎': 'GOROU',
    '云堇': 'YUN JIN',
    '钟离': 'ZHONGLI'
}
```
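
The dictionary maps each character's Chinese name to the English tag the model recognizes. As a small convenience, the hypothetical helper below (not part of the model, and assuming the `name_dict` defined above is in scope) shows how an entry can be turned into a prompt in the style used throughout the examples that follow:

```python
# Hypothetical convenience helper: look up a character tag in name_dict and
# build a prompt in the "solo,<TAG>\(genshin impact\),..." style used below.
def build_prompt(cn_name: str, extra_tags: str = "1boy,portrait,highres,") -> str:
    tag = name_dict[cn_name]
    return f"solo,{tag}\\(genshin impact\\),{extra_tags}"

print(build_prompt('钟离'))
# -> solo,ZHONGLI\(genshin impact\),1boy,portrait,highres,
```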

## Installation

To use this model, you need to install the following dependencies:

```bash
sudo apt-get update && sudo apt-get install git-lfs ffmpeg cbm
pip install -U diffusers transformers sentencepiece peft controlnet-aux moviepy
```

## Example Usage

### Generating an Image of Zhongli

Here's an example of how to generate an image of Zhongli using this model:

```python
from diffusers import StableDiffusionXLPipeline
import torch

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
).to("cuda")

prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,"
negative_prompt = "nsfw,lowres,(bad),text,error,fewer,extra,missing,worst quality,jpeg artifacts,low quality,watermark,unfinished,displeasing,oldest,early,chromatic aberration,signature,extra digits,artistic error,username,scan,[abstract],"
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.manual_seed(0),
).images[0]
image  # displays inline in a notebook
image.save("zhongli_1024x1024.png")
```

<div style="display: flex; flex-direction: column; align-items: center;">
    <div style="margin-bottom: 10px;">
        <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/3IkZz7uXW9kc-lTnKdQN8.png" width="768" height="768">
        <p style="text-align: center;">Zhongli (钟离)</p>
    </div>
</div>

### Using Canny ControlNet to Restore 2D Images from 3D Toy Photos

Here's an example of how to use Canny ControlNet to restore 2D images from 3D toy photos:

#### Genshin Impact 3D Toys

<div style="display: flex; flex-direction: column; align-items: center;">
    <div style="margin-bottom: 10px;">
        <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/YNG9vRqZGvUSxb_UUrLE5.jpeg" width="512" height="768">
        <p style="text-align: center;">Zhongli (钟离)</p>
    </div>
    <div style="margin-bottom: 10px;">
        <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/1JfhfFi9qogHwB4M2S54m.jpeg" width="512" height="768">
        <p style="text-align: center;">Paimon (派蒙)</p>
    </div>
</div>

```python
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch
from PIL import Image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
# pipeline.enable_model_cpu_offload()  # optional: lowers VRAM use at some speed cost

canny = CannyDetector()
canny(Image.open("zhongli-cb.jpg")).save("zhongli-cb-canny.jpg")
canny_image = load_image(
    "zhongli-cb-canny.jpg"
)

controlnet_conditioning_scale = 0.5
generator = torch.Generator(device="cpu").manual_seed(1)
images = pipeline(
        prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,highres",
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        image=canny_image,
        num_inference_steps=50,
        guidance_scale=7.0,
        generator=generator,
    ).images
images[0]  # displays inline in a notebook
images[0].save("zhongli_trans.png")

canny = CannyDetector()
canny(Image.open("paimon-cb-crop.jpg")).save("paimon-cb-canny.jpg")
canny_image = load_image(
    "paimon-cb-canny.jpg"
)

controlnet_conditioning_scale = 0.7
generator = torch.Generator(device="cpu").manual_seed(3)
images = pipeline(
        prompt="solo,PAIMON\(genshin impact\),1girl,portrait,highres, bright, shiny, high detail, anime",
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        image=canny_image,
        num_inference_steps=50,
        guidance_scale=8.0,
        generator=generator,
    ).images
images[0]  # displays inline in a notebook
images[0].save("paimon_trans.png")
```

### Creating a Grid Image

You can also create a grid image from a list of PIL Image objects:

```python
from PIL import Image

def create_grid_image(image_list, rows, cols, cell_width, cell_height):
    """
    Create a grid image from a list of PIL Image objects.

    :param image_list: A list of PIL Image objects
    :param rows: Number of rows in the grid
    :param cols: Number of columns in the grid
    :param cell_width: Width of each cell in the grid
    :param cell_height: Height of each cell in the grid
    :return: The resulting grid image
    """
    total_width = cols * cell_width
    total_height = rows * cell_height

    grid_image = Image.new('RGB', (total_width, total_height))

    for i, img in enumerate(image_list):
        row = i // cols
        col = i % cols

        img = img.resize((cell_width, cell_height))

        x_offset = col * cell_width
        y_offset = row * cell_height

        grid_image.paste(img, (x_offset, y_offset))

    return grid_image

create_grid_image([Image.open("zhongli-cb.jpg") ,Image.open("zhongli-cb-canny.jpg"), Image.open("zhongli_trans.png")], 1, 3, 512, 768)

create_grid_image([Image.open("paimon-cb-crop.jpg") ,Image.open("paimon-cb-canny.jpg"), Image.open("paimon_trans.png")], 1, 3, 512, 768)
```

This will create a grid image showing the original, Canny edge detection, and transformed images side by side.

<div>
    <b><h3 style="text-align: center;">Image columns: Genshin Impact toy / Canny edges / restored 2D image</h3></b>
    <div style="display: flex; flex-direction: column; align-items: center;">
        <div style="margin-bottom: 10px;">
            <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ckrKqytF5MhanjIc_Vn1q.png" width="1536" height="768">
            <p style="text-align: center;">Zhongli (钟离)</p>
        </div>
        <div style="margin-bottom: 10px;">
            <img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/vfffGerUQV9W1MHxc_rN_.png" width="1536" height="768">
            <p style="text-align: center;">Paimon (派蒙)</p>
        </div>
    </div>
</div>

### Generating an Animation of Zhongli
Here's an example of how to generate an animation of Zhongli using the `AnimateDiffSDXLPipeline`:

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)

model_id = "svjack/GenshinImpact_XL_Base"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)

pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to("cuda")

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward.",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
    generator=torch.manual_seed(4),
)
frames = output.frames[0]
export_to_gif(frames, "zhongli_animation.gif")

from diffusers.utils import export_to_video
export_to_video(frames, "zhongli_animation.mp4")
from IPython import display
display.Video("zhongli_animation.mp4", width=512, height=512)
```

Use `AutoPipelineForImage2Image` to re-render each frame of the animation and enhance the output:

```python
from moviepy.editor import VideoFileClip
from PIL import Image
clip = VideoFileClip("zhongli_animation.mp4")
frames = list(map(Image.fromarray, clip.iter_frames()))

from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
)

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

from tqdm import tqdm
req = []
for init_image in tqdm(frames):
    prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward."
    image = pipeline(prompt, image=init_image, strength=0.8, guidance_scale=10.5).images[0]
    req.append(image)

from diffusers.utils import export_to_video
export_to_video(req, "zhongli_animation_im2im.mp4")
from IPython import display
display.Video("zhongli_animation_im2im.mp4", width=512, height=512)
```

##### Enhancing the Animation with RIFE
To smooth the animation, interpolate extra frames with RIFE (Real-Time Intermediate Flow Estimation). `--multi=128` inserts 128x as many frames; since `export_to_video` writes at 10 fps by default, this produces the `*_128X_1280fps.mp4` file used below:

```bash
git clone https://github.com/svjack/Practical-RIFE && cd Practical-RIFE && pip install -r requirements.txt
python inference_video.py --multi=128 --video=../zhongli_animation_im2im.mp4
```

```python
from moviepy.editor import VideoFileClip
clip = VideoFileClip("zhongli_animation_im2im_128X_1280fps.mp4")

def speed_change_video(video_clip, speed_factor, output_path):
    if speed_factor == 1:
        # A factor of 1 just re-encodes the original clip
        video_clip.write_videofile(output_path, codec="libx264")
    else:
        # Otherwise rescale the playback speed by the factor
        sped_up_clip = video_clip.speedx(speed_factor)
        sped_up_clip.write_videofile(output_path, codec="libx264")

speed_change_video(clip, 0.05, "zhongli_animation_im2im_128X_1280fps_wrt.mp4")

VideoFileClip("zhongli_animation_im2im_128X_1280fps_wrt.mp4").set_duration(10).write_videofile("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", codec="libx264")
from IPython import display
display.Video("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", width=512, height=512)
```

##### Merging Videos Horizontally
You can merge two videos horizontally using the following function:

```python
from moviepy.editor import VideoFileClip, CompositeVideoClip

def merge_videos_horizontally(video_path1, video_path2, output_video_path):
    clip1 = VideoFileClip(video_path1)
    clip2 = VideoFileClip(video_path2)

    max_duration = max(clip1.duration, clip2.duration)

    if clip1.duration < max_duration:
        clip1 = clip1.loop(duration=max_duration)
    if clip2.duration < max_duration:
        clip2 = clip2.loop(duration=max_duration)

    total_width = clip1.w + clip2.w
    total_height = max(clip1.h, clip2.h)

    final_clip = CompositeVideoClip([
        clip1.set_position(("left", "center")),
        clip2.set_position(("right", "center"))
    ], size=(total_width, total_height))

    final_clip.write_videofile(output_video_path, codec='libx264')

    print(f"Merged video saved to {output_video_path}")

# Example usage
video_path1 = "zhongli_animation.mp4"  # path to the first video
video_path2 = "zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4"  # path to the second video
output_video_path = "zhongli_inter_video_im2im_compare.mp4"  # path for the merged output
merge_videos_horizontally(video_path1, video_path2, output_video_path)
```



<div>
    <b><h3 style="text-align: center;">Left: zhongli_animation.mp4 (AnimateDiffSDXLPipeline). Right: zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4 (AutoPipelineForImage2Image + Practical-RIFE)</h3></b>
    <div style="display: flex; flex-direction: column; align-items: center;">
        <div style="margin-bottom: 10px;">
            <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/gBaodBk8z3aI69LiT36w2.mp4"></video>
            <p style="text-align: center;">Zhongli (钟离)</p>
        </div>
    </div>
</div>

# Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs - ICML 2024

This repository contains the implementation of a cutting-edge text-to-image diffusion model that leverages multimodal large language models (LLMs) for advanced image generation. The project focuses on recaptioning, planning, and generating high-quality images from textual descriptions, showcasing the capabilities of modern AI in creative content production.

## Installation

To get started with the project, follow these steps to set up the environment and install the necessary dependencies:

1. **Clone the Repository:**
   ```bash
   git clone https://github.com/svjack/RPG-DiffusionMaster
   cd RPG-DiffusionMaster
   ```

2. **Create and Activate Conda Environment:**
   ```bash
   conda create -n RPG python=3.9
   conda activate RPG
   ```

3. **Install Jupyter Kernel:**
   ```bash
   pip install ipykernel
   python -m ipykernel install --user --name RPG --display-name "RPG"
   ```

4. **Install Required Packages:**
   ```bash
   pip install -r requirements.txt
   ```

5. **Clone Diffusers Repository:**
   ```bash
   git clone https://github.com/huggingface/diffusers
   ```

## Demo

This section provides a quick demonstration of how to use the `RegionalDiffusionXLPipeline` to generate images based on textual prompts. The example below demonstrates the process of generating an image using a multimodal LLM to split and refine the prompt.

### Import Required Modules

```python
from RegionalDiffusion_base import RegionalDiffusionPipeline
from RegionalDiffusion_xl import RegionalDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers, DPMSolverMultistepScheduler
from mllm import local_llm, GPT4, DeepSeek
import torch
```

### Load the Model and Configure Scheduler

```python
pipe = RegionalDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/svjack/GenshinImpact_XL_Base/blob/main/sdxlBase_v10.safetensors",
    torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
pipe.enable_xformers_memory_efficient_attention()
```

### User Input and MLLM Processing

```python
# User input prompt
prompt = 'ZHONGLI(genshin impact) with NING GUANG(genshin impact) in red cheongsam in the bar.'

# Process the prompt using DeepSeek MLLM
para_dict = DeepSeek(prompt)

# Extract parameters for image generation
split_ratio = para_dict['Final split ratio']
regional_prompt = para_dict['Regional Prompt']
negative_prompt = ""  # Optional negative prompt
```
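
For reference, `para_dict` is expected to be a dictionary containing at least the two keys accessed above. The sketch below is illustrative only: the values are hypothetical placeholders, and the exact format (including how regions are encoded) is defined by the RPG-DiffusionMaster repository, not by this model card:

```python
# Illustrative shape of para_dict; the values are hypothetical, not actual
# DeepSeek output. See the RPG-DiffusionMaster repo for the real format.
para_dict_example = {
    'Final split ratio': '1,1',  # e.g. two equal side-by-side regions (hypothetical)
    'Regional Prompt': (
        'ZHONGLI(genshin impact), formal attire, standing at the bar BREAK '
        'NING GUANG(genshin impact), red cheongsam, elegant smile'
    ),
}
```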

### Generate and Save the Image

```python
image = pipe(
    prompt=regional_prompt,
    split_ratio=split_ratio,  # The ratio of the regional prompt
    batch_size=1,  # Batch size
    base_ratio=0.5,  # The ratio of the base prompt
    base_prompt=prompt,
    num_inference_steps=20,  # Sampling steps
    height=1024,
    negative_prompt=negative_prompt,  # Negative prompt
    width=1024,
    seed=0,  # Random seed
    guidance_scale=7.0
).images[0]

# Save the generated image
image.save("test_zhong_ning.png")
```

This demo showcases the power of combining text-to-image diffusion with multimodal LLMs to generate high-quality images from complex textual descriptions. The generated image is saved as `test_zhong_ning.png`.

---

Feel free to explore the repository and experiment with different prompts and configurations to see the full potential of this advanced text-to-image generation model.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ZJZkSaMOGRI7QM0uegeqS.png)

## MotionCtrl and MasaCtrl: Genshin Impact Character Synthesis
See https://github.com/svjack/MasaCtrl for examples of Genshin Impact character synthesis videos produced with MasaCtrl.

- **Zhongli Drinking Tea:**
  ```
  "solo,ZHONGLI(genshin impact),1boy,highres," -> "solo,ZHONGLI drink tea use chinese cup (genshin impact),1boy,highres,"
  ```
![Screenshot 2024-11-17 132742](https://github.com/user-attachments/assets/00451728-f2d5-4009-afa8-23baaabdc223)

- **Kamisato Ayato Smiling:**
  ```
  "solo,KAMISATO AYATO(genshin impact),1boy,highres," -> "solo,KAMISATO AYATO smiling (genshin impact),1boy,highres,"
  ```

![Screenshot 2024-11-17 133421](https://github.com/user-attachments/assets/7a920f4c-8a3a-4387-98d6-381a798566ef)

<div style="display: flex; flex-direction: column; align-items: center;">
    <div style="margin-bottom: 10px;">
        <h3>Zhongli Drinking Tea:</h3>
    </div>
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://github.com/user-attachments/assets/607e7eb7-d41c-4740-9c8a-8369c31487da" style="width: 1024px; height: 800px;"></video>
    </div>
    <div style="margin-bottom: 10px;">
        <h3>Kamisato Ayato Smiling:</h3>
    </div>
    <div style="margin-bottom: 10px;">
        <video controls autoplay src="https://github.com/user-attachments/assets/aaa9849e-0c53-4012-b6c3-9ceb9910f2f8" style="width: 1024px; height: 800px;"></video>
    </div>
</div>


## Perturbed-Attention-Guidance with Genshin Impact XL
Here's an example of how to enhance Genshin Impact XL with [Perturbed-Attention-Guidance](https://github.com/svjack/Perturbed-Attention-Guidance):

### Clone the Repository

First, clone the Space repository from Hugging Face:

```bash
git clone https://huggingface.co/spaces/svjack/perturbed-attention-guidance-genshin_impact_xl
```

### Navigate to the Repository Directory

Change into the cloned repository directory:

```bash
cd perturbed-attention-guidance-genshin_impact_xl
```

### Install Python Requirements

Install the required Python packages using `pip`:

```bash
pip install -r requirements.txt
```

### Run the Application

Finally, run the application:

```bash
python app.py
```
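
Alternatively, recent `diffusers` releases ship built-in PAG support, so the model can be used without the Space. The following is a minimal sketch, assuming a diffusers version new enough to accept `enable_pag` (roughly v0.30+); the layer choice and `pag_scale` value here are illustrative:

```python
from diffusers import AutoPipelineForText2Image
import torch

# Minimal PAG sketch; assumes a diffusers release with built-in PAG support.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    enable_pag=True,                # wrap the pipeline with PAG
    pag_applied_layers=["mid"],     # perturb attention in the mid block (illustrative)
    torch_dtype=torch.float16,
).to("cuda")

image = pipeline(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,",
    guidance_scale=7.0,
    pag_scale=3.0,                  # strength of the perturbed-attention guidance
    generator=torch.manual_seed(0),
).images[0]
image.save("zhongli_pag.png")
```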

<div>
    <b><h3 style="text-align: center;">Left: with Perturbed-Attention-Guidance; Right: original Genshin Impact XL</h3></b>
    <b><h4 style="text-align: center;">The left side looks noticeably better.</h4></b>
    <div style="display: flex; flex-direction: column; align-items: center;">
        <div style="margin-bottom: 10px;">
            <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/Kmh4NJ1AkfV5X3-kkilAK.mp4"></video>
        </div>
    </div>
</div>