|
# svjack/GenshinImpact_XL_Base |
|
|
|
This model is derived from [CivitAI](https://civitai.com/models/386505). |
|
|
|
## Acknowledgments |
|
|
|
Special thanks to [mobeimunan](https://civitai.com/user/mobeimunan) for their contributions to the development of this model. |
|
|
|
<!-- |
|
from moviepy.editor import ImageClip, VideoFileClip, clips_array

# Paths to the images and the video
image1_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/3IkZz7uXW9kc-lTnKdQN8.png"
image2_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ckrKqytF5MhanjIc_Vn1q.png"
image3_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/vfffGerUQV9W1MHxc_rN_.png"
video_path = "https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/gBaodBk8z3aI69LiT36w2.mp4"

duration = 10

# Load the images (each shown for 10 seconds) and the video
image1 = ImageClip(image1_path).set_duration(duration).resize((512, 512))
image2 = ImageClip(image2_path).set_duration(duration).resize((1024, 512))
image3 = ImageClip(image3_path).set_duration(duration).resize((1024, 512))
video = VideoFileClip(video_path).resize((512, 512))

# Trim the video to the first 10 seconds to match the images
video = video.subclip(0, duration)

# Arrange the images and the video in a 2x2 grid
final_clip = clips_array([[image1, image2], [video, image3]])

# Resize the final video to 1080 px wide; the height scales automatically
final_clip = final_clip.resize(width=1080)

# Export the final video
final_clip.write_videofile("zhongli_merge_im_output.mp4", codec="libx264")

VideoFileClip("zhongli_merge_im_output.mp4").set_duration(2.2).write_videofile("zhongli_merge_output_2_2.mp4", codec="libx264")
|
--> |
|
|
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/bsgBrDOOXYN-oBH95q5uK.mp4" style="width: 1024px; height: 768px;"></video>
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/WypE04ag7_4Z1FKhzk475.png" width="1024" height="1024"> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<h3>Zhongli Drinking Tea:</h3> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://github.com/user-attachments/assets/00451728-f2d5-4009-afa8-23baaabdc223" style="width: 1024px; height: 256px;"> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://github.com/user-attachments/assets/607e7eb7-d41c-4740-9c8a-8369c31487da" style="width: 1024px; height: 800px;"></video> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<h3>Kamisato Ayato Smiling:</h3> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://github.com/user-attachments/assets/7a920f4c-8a3a-4387-98d6-381a798566ef" style="width: 1024px; height: 256px;"> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://github.com/user-attachments/assets/aaa9849e-0c53-4012-b6c3-9ceb9910f2f8" style="width: 1024px; height: 800px;"></video> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/Kmh4NJ1AkfV5X3-kkilAK.mp4" style="width: 1024px; height: 1024px;"></video> |
|
</div> |
|
</div> |
|
|
|
|
|
## Supported Characters |
|
|
|
The model currently supports the following 73 characters from Genshin Impact: |
|
|
|
```python
name_dict = {
    '旅行者女': 'lumine',
    '旅行者男': 'aether',
    '派蒙': 'PAIMON',
    '迪奥娜': 'DIONA',
    '菲米尼': 'FREMINET',
    '甘雨': 'GANYU',
    '凯亚': 'KAEYA',
    '莱依拉': 'LAYLA',
    '罗莎莉亚': 'ROSARIA',
    '七七': 'QIQI',
    '申鹤': 'SHENHE',
    '神里绫华': 'KAMISATO AYAKA',
    '优菈': 'EULA',
    '重云': 'CHONGYUN',
    '夏洛蒂': 'charlotte',
    '莱欧斯利': 'WRIOTHESLEY',
    '艾尔海森': 'ALHAITHAM',
    '柯莱': 'COLLEI',
    '纳西妲': 'NAHIDA',
    '绮良良': 'KIRARA',
    '提纳里': 'TIGHNARI',
    '瑶瑶': 'YAOYAO',
    '珐露珊': 'FARUZAN',
    '枫原万叶': 'KAEDEHARA KAZUHA',
    '琳妮特': 'LYNETTE',
    '流浪者 散兵': 'scaramouche',
    '鹿野院平藏': 'SHIKANOIN HEIZOU',
    '琴': 'JEAN',
    '砂糖': 'SUCROSE',
    '温迪': 'VENTI',
    '魈': 'XIAO',
    '早柚': 'SAYU',
    '安柏': 'AMBER',
    '班尼特': 'BENNETT',
    '迪卢克': 'DILUC',
    '迪西娅': 'DEHYA',
    '胡桃': 'HU TAO',
    '可莉': 'KLEE',
    '林尼': 'LYNEY',
    '托马': 'THOMA',
    '香菱': 'XIANG LING',
    '宵宫': 'YOIMIYA',
    '辛焱': 'XINYAN',
    '烟绯': 'YANFEI',
    '八重神子': 'YAE MIKO',
    '北斗': 'BEIDOU',
    '菲谢尔': 'FISCHL',
    '九条裟罗': 'KUJO SARA',
    '久岐忍': 'KUKI SHINOBU',
    '刻晴': 'KEQING',
    '雷电将军': 'RAIDEN SHOGUN',
    '雷泽': 'RAZOR',
    '丽莎': 'LISA',
    '赛诺': 'CYNO',
    '芙宁娜': 'FURINA',
    '芭芭拉': 'BARBARA',
    '公子 达达利亚': 'TARTAGLIA',
    '坎蒂丝': 'CANDACE',
    '莫娜': 'MONA',
    '妮露': 'NILOU',
    '珊瑚宫心海': 'SANGONOMIYA KOKOMI',
    '神里绫人': 'KAMISATO AYATO',
    '行秋': 'XINGQIU',
    '夜兰': 'YELAN',
    '那维莱特': 'NEUVILLETTE',
    '娜维娅': 'NAVIA',
    '阿贝多': 'ALBEDO',
    '荒泷一斗': 'ARATAKI ITTO',
    '凝光': 'NING GUANG',
    '诺艾尔': 'NOELLE',
    '五郎': 'GOROU',
    '云堇': 'YUN JIN',
    '钟离': 'ZHONGLI'
}
```
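
The mapping can double as a small prompt helper. Here is a minimal sketch (the `character_prompt` function is illustrative, not part of this repository) that turns a Chinese character name into the tag format used in the examples below:

```python
# Hypothetical helper: look up the English tag for a Chinese character name
# and wrap it in the prompt format used throughout this card.
def character_prompt(cn_name: str) -> str:
    tag = name_dict[cn_name]
    return rf"solo,{tag}\(genshin impact\),highres,"

print(character_prompt('钟离'))  # solo,ZHONGLI\(genshin impact\),highres,
```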
|
|
|
## Installation |
|
|
|
To use this model, you need to install the following dependencies: |
|
|
|
```bash
sudo apt-get update && sudo apt-get install git-lfs ffmpeg cbm
pip install -U diffusers transformers sentencepiece peft controlnet-aux moviepy
```
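
Optionally, you can clone the model repository locally with git-lfs; otherwise `from_pretrained` downloads the weights on first use:

```bash
git lfs install
git clone https://huggingface.co/svjack/GenshinImpact_XL_Base
```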
|
|
|
## Example Usage |
|
|
|
### Generating an Image of Zhongli |
|
|
|
Here's an example of how to generate an image of Zhongli using this model: |
|
|
|
```python
from diffusers import StableDiffusionXLPipeline
import torch

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
).to("cuda")

prompt = r"solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,"
negative_prompt = r"nsfw,lowres,(bad),text,error,fewer,extra,missing,worst quality,jpeg artifacts,low quality,watermark,unfinished,displeasing,oldest,early,chromatic aberration,signature,extra digits,artistic error,username,scan,[abstract],"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.manual_seed(0),
).images[0]
image  # display in a notebook
image.save("zhongli_1024x1024.png")
```
|
|
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/3IkZz7uXW9kc-lTnKdQN8.png" width="768" height="768"> |
|
<p style="text-align: center;">钟离</p> |
|
</div> |
|
</div> |
|
|
|
### Using Canny ControlNet to Restore 2D Images from 3D Toy Photos |
|
|
|
Here's an example of how to use Canny ControlNet to restore 2D images from 3D toy photos: |
|
|
|
#### Genshin Impact 3D Toys |
|
|
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/YNG9vRqZGvUSxb_UUrLE5.jpeg" width="512" height="768"> |
|
<p style="text-align: center;">钟离</p> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/1JfhfFi9qogHwB4M2S54m.jpeg" width="512" height="768"> |
|
<p style="text-align: center;">派蒙</p> |
|
</div> |
|
</div> |
|
|
|
```python
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch
from PIL import Image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
#pipeline.enable_model_cpu_offload()

# "zhongli-cb.jpg" is a local photo of the 3D toy shown above
canny = CannyDetector()
canny(Image.open("zhongli-cb.jpg")).save("zhongli-cb-canny.jpg")
canny_image = load_image("zhongli-cb-canny.jpg")

controlnet_conditioning_scale = 0.5
generator = torch.Generator(device="cpu").manual_seed(1)
images = pipeline(
    prompt=r"solo,ZHONGLI\(genshin impact\),1boy,portrait,highres",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=7.0,
    generator=generator,
).images
images[0]  # display in a notebook
images[0].save("zhongli_trans.png")

# The same pipeline applied to the Paimon toy photo
canny = CannyDetector()
canny(Image.open("paimon-cb-crop.jpg")).save("paimon-cb-canny.jpg")
canny_image = load_image("paimon-cb-canny.jpg")

controlnet_conditioning_scale = 0.7
generator = torch.Generator(device="cpu").manual_seed(3)
images = pipeline(
    prompt=r"solo,PAIMON\(genshin impact\),1girl,portrait,highres, bright, shiny, high detail, anime",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=8.0,
    generator=generator,
).images
images[0]  # display in a notebook
images[0].save("paimon_trans.png")
```
|
|
|
### Creating a Grid Image |
|
|
|
You can also create a grid image from a list of PIL Image objects: |
|
|
|
```python
from PIL import Image

def create_grid_image(image_list, rows, cols, cell_width, cell_height):
    """
    Create a grid image from a list of PIL Image objects.

    :param image_list: A list of PIL Image objects
    :param rows: Number of rows in the grid
    :param cols: Number of columns in the grid
    :param cell_width: Width of each cell in the grid
    :param cell_height: Height of each cell in the grid
    :return: The resulting grid image
    """
    total_width = cols * cell_width
    total_height = rows * cell_height

    grid_image = Image.new('RGB', (total_width, total_height))

    for i, img in enumerate(image_list):
        row = i // cols
        col = i % cols

        img = img.resize((cell_width, cell_height))

        x_offset = col * cell_width
        y_offset = row * cell_height

        grid_image.paste(img, (x_offset, y_offset))

    return grid_image

create_grid_image([Image.open("zhongli-cb.jpg"), Image.open("zhongli-cb-canny.jpg"), Image.open("zhongli_trans.png")], 1, 3, 512, 768)

create_grid_image([Image.open("paimon-cb-crop.jpg"), Image.open("paimon-cb-canny.jpg"), Image.open("paimon_trans.png")], 1, 3, 512, 768)
```
|
|
|
This will create a grid image showing the original, Canny edge detection, and transformed images side by side. |
|
|
|
<div> |
|
<b><h3 style="text-align: center;">Each row below shows: Genshin Impact toy / Canny image / Genshin Impact restored 2D image</h3></b>
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ckrKqytF5MhanjIc_Vn1q.png" width="1536" height="768"> |
|
<p style="text-align: center;">钟离</p> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/vfffGerUQV9W1MHxc_rN_.png" width="1536" height="768"> |
|
<p style="text-align: center;">派蒙</p> |
|
</div> |
|
</div> |
|
</div> |
|
|
|
### Generating an Animation of Zhongli |
|
Here's an example of how to generate an animation of Zhongli using the `AnimateDiffSDXLPipeline`: |
|
|
|
```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)

model_id = "svjack/GenshinImpact_XL_Base"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)

pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to("cuda")

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt=r"solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward.",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
    generator=torch.manual_seed(4),
)
frames = output.frames[0]
export_to_gif(frames, "zhongli_animation.gif")

from diffusers.utils import export_to_video
export_to_video(frames, "zhongli_animation.mp4")
from IPython import display
display.Video("zhongli_animation.mp4", width=512, height=512)
```
|
|
|
Use `AutoPipelineForImage2Image` to enhance the output frame by frame:
|
|
|
```python
from moviepy.editor import VideoFileClip
from PIL import Image

clip = VideoFileClip("zhongli_animation.mp4")
frames = list(map(Image.fromarray, clip.iter_frames()))

from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
)

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

from tqdm import tqdm
req = []
for init_image in tqdm(frames):
    prompt = r"solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward."
    image = pipeline(prompt, image=init_image, strength=0.8, guidance_scale=10.5).images[0]
    req.append(image)

from diffusers.utils import export_to_video
export_to_video(req, "zhongli_animation_im2im.mp4")
from IPython import display
display.Video("zhongli_animation_im2im.mp4", width=512, height=512)
```
|
|
|
##### Enhancing Animation with RIFE |
|
To enhance the animation with RIFE (Real-Time Intermediate Flow Estimation), which interpolates additional frames to raise the frame rate:
|
|
|
```bash
git clone https://github.com/svjack/Practical-RIFE && cd Practical-RIFE && pip install -r requirements.txt
python inference_video.py --multi=128 --video=../zhongli_animation_im2im.mp4
```
|
|
|
```python
from moviepy.editor import VideoFileClip

clip = VideoFileClip("zhongli_animation_im2im_128X_1280fps.mp4")

def speed_change_video(video_clip, speed_factor, output_path):
    if speed_factor == 1:
        # A factor of 1 means no change: just re-encode the original clip
        video_clip.write_videofile(output_path, codec="libx264")
    else:
        # Otherwise re-time the clip by the given speed factor
        sped_up_clip = video_clip.speedx(speed_factor)
        sped_up_clip.write_videofile(output_path, codec="libx264")

# Slow the interpolated clip down to 5% speed
speed_change_video(clip, 0.05, "zhongli_animation_im2im_128X_1280fps_wrt.mp4")

VideoFileClip("zhongli_animation_im2im_128X_1280fps_wrt.mp4").set_duration(10).write_videofile("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", codec="libx264")
from IPython import display
display.Video("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", width=512, height=512)
```
|
|
|
##### Merging Videos Horizontally |
|
You can merge two videos horizontally using the following function: |
|
|
|
```python
from moviepy.editor import VideoFileClip, CompositeVideoClip

def merge_videos_horizontally(video_path1, video_path2, output_video_path):
    clip1 = VideoFileClip(video_path1)
    clip2 = VideoFileClip(video_path2)

    # Loop the shorter clip so both run for the same duration
    max_duration = max(clip1.duration, clip2.duration)
    if clip1.duration < max_duration:
        clip1 = clip1.loop(duration=max_duration)
    if clip2.duration < max_duration:
        clip2 = clip2.loop(duration=max_duration)

    total_width = clip1.w + clip2.w
    total_height = max(clip1.h, clip2.h)

    final_clip = CompositeVideoClip([
        clip1.set_position(("left", "center")),
        clip2.set_position(("right", "center"))
    ], size=(total_width, total_height))

    final_clip.write_videofile(output_video_path, codec='libx264')

    print(f"Merged video saved to {output_video_path}")

# Example usage
video_path1 = "zhongli_animation.mp4"  # path to the first video
video_path2 = "zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4"  # path to the second video
output_video_path = "zhongli_inter_video_im2im_compare.mp4"  # path for the merged output
merge_videos_horizontally(video_path1, video_path2, output_video_path)
```
|
|
|
|
|
|
|
<div> |
|
<b><h3 style="text-align: center;">Left: zhongli_animation.mp4 (AnimateDiffSDXLPipeline). Right: zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4 (AutoPipelineForImage2Image + Practical-RIFE)</h3></b>
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/gBaodBk8z3aI69LiT36w2.mp4"></video> |
|
<p style="text-align: center;">钟离</p> |
|
</div> |
|
</div> |
|
</div> |
|
|
|
# Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs - ICML 2024 |
|
|
|
This section uses [RPG-DiffusionMaster](https://github.com/svjack/RPG-DiffusionMaster), an implementation of a text-to-image diffusion framework that leverages multimodal large language models (LLMs) to recaption prompts, plan regional layouts, and generate high-quality images from complex textual descriptions.
|
|
|
## Installation |
|
|
|
To get started with the project, follow these steps to set up the environment and install the necessary dependencies: |
|
|
|
1. **Clone the Repository:** |
|
```bash
git clone https://github.com/svjack/RPG-DiffusionMaster
cd RPG-DiffusionMaster
```
|
|
|
2. **Create and Activate Conda Environment:** |
|
```bash
conda create -n RPG python=3.9
conda activate RPG
```
|
|
|
3. **Install Jupyter Kernel:** |
|
```bash
pip install ipykernel
python -m ipykernel install --user --name RPG --display-name "RPG"
```
|
|
|
4. **Install Required Packages:** |
|
```bash
pip install -r requirements.txt
```
|
|
|
5. **Clone Diffusers Repository:** |
|
```bash
git clone https://github.com/huggingface/diffusers
```
|
|
|
## Demo |
|
|
|
This section provides a quick demonstration of how to use the `RegionalDiffusionXLPipeline` to generate images based on textual prompts. The example below demonstrates the process of generating an image using a multimodal LLM to split and refine the prompt. |
|
|
|
### Import Required Modules |
|
|
|
```python
from RegionalDiffusion_base import RegionalDiffusionPipeline
from RegionalDiffusion_xl import RegionalDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers, DPMSolverMultistepScheduler
from mllm import local_llm, GPT4, DeepSeek
import torch
```
|
|
|
### Load the Model and Configure Scheduler |
|
|
|
```python
pipe = RegionalDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/svjack/GenshinImpact_XL_Base/blob/main/sdxlBase_v10.safetensors",
    torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
pipe.enable_xformers_memory_efficient_attention()
```
|
|
|
### User Input and MLLM Processing |
|
|
|
```python
# User input prompt
prompt = 'ZHONGLI(genshin impact) with NING GUANG(genshin impact) in red cheongsam in the bar.'

# Process the prompt using the DeepSeek MLLM
para_dict = DeepSeek(prompt)

# Extract parameters for image generation
split_ratio = para_dict['Final split ratio']
regional_prompt = para_dict['Regional Prompt']
negative_prompt = ""  # Optional negative prompt
```
|
|
|
### Generate and Save the Image |
|
|
|
```python
image = pipe(
    prompt=regional_prompt,
    split_ratio=split_ratio,  # The ratio of the regional prompt
    batch_size=1,  # Batch size
    base_ratio=0.5,  # The ratio of the base prompt
    base_prompt=prompt,
    num_inference_steps=20,  # Sampling steps
    height=1024,
    negative_prompt=negative_prompt,  # Negative prompt
    width=1024,
    seed=0,  # Random seed
    guidance_scale=7.0
).images[0]

# Save the generated image
image.save("test_zhong_ning.png")
```
|
|
|
This demo showcases the power of combining text-to-image diffusion with multimodal LLMs to generate high-quality images from complex textual descriptions. The generated image is saved as `test_zhong_ning.png`. |
|
|
|
--- |
|
|
|
Feel free to explore the repository and experiment with different prompts and configurations to see the full potential of this advanced text-to-image generation model. |
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ZJZkSaMOGRI7QM0uegeqS.png) |
|
|
|
## MotionCtrl and MasaCtrl: Genshin Impact Character Synthesis |
|
See https://github.com/svjack/MasaCtrl for examples of Genshin Impact character synthesis videos produced with MasaCtrl. Each example edits a source prompt into a target prompt:
|
|
|
- **Zhongli Drinking Tea:** |
|
```
"solo,ZHONGLI(genshin impact),1boy,highres," -> "solo,ZHONGLI drink tea use chinese cup (genshin impact),1boy,highres,"
```
|
![Screenshot 2024-11-17 132742](https://github.com/user-attachments/assets/00451728-f2d5-4009-afa8-23baaabdc223) |
|
|
|
- **Kamisato Ayato Smiling:** |
|
```
"solo,KAMISATO AYATO(genshin impact),1boy,highres," -> "solo,KAMISATO AYATO smiling (genshin impact),1boy,highres,"
```
|
|
|
![Screenshot 2024-11-17 133421](https://github.com/user-attachments/assets/7a920f4c-8a3a-4387-98d6-381a798566ef) |
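
MasaCtrl itself lives in the repository linked above. As a rough illustration of the prompt-pair idea only, here is a minimal sketch that generates the source and edited prompts from the same seed with the base pipeline; note it does not apply MasaCtrl's mutual self-attention control, so consistency between the two images is only approximate:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base", torch_dtype=torch.float16
).to("cuda")

source_prompt = "solo,ZHONGLI(genshin impact),1boy,highres,"
edited_prompt = "solo,ZHONGLI drink tea use chinese cup (genshin impact),1boy,highres,"

# Reuse the same seed so differences come mainly from the prompt edit
for name, prompt in [("source", source_prompt), ("edited", edited_prompt)]:
    image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
    image.save(f"zhongli_{name}.png")
```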
|
|
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<h3>Zhongli Drinking Tea:</h3> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://github.com/user-attachments/assets/607e7eb7-d41c-4740-9c8a-8369c31487da" style="width: 1024px; height: 800px;"></video> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<h3>Kamisato Ayato Smiling:</h3> |
|
</div> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://github.com/user-attachments/assets/aaa9849e-0c53-4012-b6c3-9ceb9910f2f8" style="width: 1024px; height: 800px;"></video> |
|
</div> |
|
</div> |
|
|
|
|
|
## Perturbed-Attention-Guidance with Genshin Impact XL |
|
Here's an example of how to enhance Genshin Impact XL with [Perturbed-Attention Guidance](https://github.com/svjack/Perturbed-Attention-Guidance):
|
|
|
### Clone the Repository |
|
|
|
First, clone the repository from Hugging Face:
|
|
|
```bash
git clone https://huggingface.co/spaces/svjack/perturbed-attention-guidance-genshin_impact_xl
```
|
|
|
### Navigate to the Repository Directory |
|
|
|
Change into the cloned repository directory: |
|
|
|
```bash
cd perturbed-attention-guidance-genshin_impact_xl
```
|
|
|
### Install Python Requirements |
|
|
|
Install the required Python packages using `pip`: |
|
|
|
```bash
pip install -r requirements.txt
```
|
|
|
### Run the Application |
|
|
|
Finally, run the application: |
|
|
|
```bash
python app.py
```
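
Alternatively, recent diffusers releases ship native PAG pipelines. A minimal sketch, assuming diffusers >= 0.30 (where the `enable_pag` and `pag_scale` options are available):

```python
import torch
from diffusers import AutoPipelineForText2Image

# enable_pag=True selects the PAG variant of the SDXL pipeline
pipe = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    enable_pag=True,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt=r"solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,",
    guidance_scale=7.0,
    pag_scale=3.0,  # strength of perturbed-attention guidance
    generator=torch.manual_seed(0),
).images[0]
image.save("zhongli_pag.png")
```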
|
|
|
<div> |
|
<b><h3 style="text-align: center;">Left: with Perturbed-Attention Guidance. Right: original Genshin Impact XL</h3></b>
<b><h4 style="text-align: center;">The left output looks noticeably better.</h4></b>
|
<div style="display: flex; flex-direction: column; align-items: center;"> |
|
<div style="margin-bottom: 10px;"> |
|
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/Kmh4NJ1AkfV5X3-kkilAK.mp4"></video> |
|
</div> |
|
</div> |
|
</div> |
|
|
|
|
|
|