svjack/GenshinImpact_XL_Base
This model is derived from a checkpoint published on CivitAI.
Acknowledgments
Special thanks to mobeimunan for their contributions to the development of this model.
Supported Characters
The model currently supports the following 73 characters from Genshin Impact:
name_dict = {
    '旅行者女': 'lumine',
    '旅行者男': 'aether',
    '派蒙': 'PAIMON',
    '迪奥娜': 'DIONA',
    '菲米尼': 'FREMINET',
    '甘雨': 'GANYU',
    '凯亚': 'KAEYA',
    '莱依拉': 'LAYLA',
    '罗莎莉亚': 'ROSARIA',
    '七七': 'QIQI',
    '申鹤': 'SHENHE',
    '神里绫华': 'KAMISATO AYAKA',
    '优菈': 'EULA',
    '重云': 'CHONGYUN',
    '夏洛蒂': 'charlotte',
    '莱欧斯利': 'WRIOTHESLEY',
    '艾尔海森': 'ALHAITHAM',
    '柯莱': 'COLLEI',
    '纳西妲': 'NAHIDA',
    '绮良良': 'KIRARA',
    '提纳里': 'TIGHNARI',
    '瑶瑶': 'YAOYAO',
    '珐露珊': 'FARUZAN',
    '枫原万叶': 'KAEDEHARA KAZUHA',
    '琳妮特': 'LYNETTE',
    '流浪者 散兵': 'scaramouche',
    '鹿野院平藏': 'SHIKANOIN HEIZOU',
    '琴': 'JEAN',
    '砂糖': 'SUCROSE',
    '温迪': 'VENTI',
    '魈': 'XIAO',
    '早柚': 'SAYU',
    '安柏': 'AMBER',
    '班尼特': 'BENNETT',
    '迪卢克': 'DILUC',
    '迪西娅': 'DEHYA',
    '胡桃': 'HU TAO',
    '可莉': 'KLEE',
    '林尼': 'LYNEY',
    '托马': 'THOMA',
    '香菱': 'XIANG LING',
    '宵宫': 'YOIMIYA',
    '辛焱': 'XINYAN',
    '烟绯': 'YANFEI',
    '八重神子': 'YAE MIKO',
    '北斗': 'BEIDOU',
    '菲谢尔': 'FISCHL',
    '九条裟罗': 'KUJO SARA',
    '久岐忍': 'KUKI SHINOBU',
    '刻晴': 'KEQING',
    '雷电将军': 'RAIDEN SHOGUN',
    '雷泽': 'RAZOR',
    '丽莎': 'LISA',
    '赛诺': 'CYNO',
    '芙宁娜': 'FURINA',
    '芭芭拉': 'BARBARA',
    '公子 达达利亚': 'TARTAGLIA',
    '坎蒂丝': 'CANDACE',
    '莫娜': 'MONA',
    '妮露': 'NILOU',
    '珊瑚宫心海': 'SANGONOMIYA KOKOMI',
    '神里绫人': 'KAMISATO AYATO',
    '行秋': 'XINGQIU',
    '夜兰': 'YELAN',
    '那维莱特': 'NEUVILLETTE',
    '娜维娅': 'NAVIA',
    '阿贝多': 'ALBEDO',
    '荒泷一斗': 'ARATAKI ITTO',
    '凝光': 'NING GUANG',
    '诺艾尔': 'NOELLE',
    '五郎': 'GOROU',
    '云堇': 'YUN JIN',
    '钟离': 'ZHONGLI'
}
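Each English value doubles as the trigger token used in the prompts shown below, in the form `NAME\(genshin impact\)`. A minimal, hypothetical helper (not part of the model) that composes such a prompt from the dictionary:

```python
# Hypothetical helper: build a trigger prompt from the name_dict mapping above.
# The "NAME\(genshin impact\)" token format follows the example prompts below.
def build_prompt(name_dict, cn_name, extra_tags=("1boy", "portrait", "highres")):
    """Return a prompt string using the character's English trigger token."""
    token = name_dict[cn_name]
    return "solo," + token + r"\(genshin impact\)," + ",".join(extra_tags) + ","

name_dict = {'钟离': 'ZHONGLI', '派蒙': 'PAIMON'}  # subset for illustration
print(build_prompt(name_dict, '钟离'))
# solo,ZHONGLI\(genshin impact\),1boy,portrait,highres,
```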
Installation
To use this model, you need to install the following dependencies:
sudo apt-get update && sudo apt-get install git-lfs ffmpeg cbm
pip install -U diffusers transformers sentencepiece peft controlnet-aux moviepy
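As an optional sanity check (not required by the model), you can verify that the Python dependencies above are importable before loading the pipeline:

```python
# Optional sanity check: list any of the required modules that are not importable.
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

deps = ["diffusers", "transformers", "sentencepiece", "peft", "controlnet_aux", "moviepy"]
print(missing_modules(deps))  # [] when everything is installed
```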
Example Usage
Generating an Image of Zhongli
Here's an example of how to generate an image of Zhongli using this model:
from diffusers import StableDiffusionXLPipeline
import torch
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
).to("cuda")
prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,"
negative_prompt = "nsfw,lowres,(bad),text,error,fewer,extra,missing,worst quality,jpeg artifacts,low quality,watermark,unfinished,displeasing,oldest,early,chromatic aberration,signature,extra digits,artistic error,username,scan,[abstract],"
image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.manual_seed(0),
).images[0]
image
image.save("zhongli_1024x1024.png")
Zhongli
Using Canny ControlNet to Restore 2D Images from 3D Toy Photos
Here's an example of how to use Canny ControlNet to restore 2D images from 3D toy photos:
Genshin Impact 3D Toys
Zhongli
Paimon
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch
from PIL import Image
from controlnet_aux import CannyDetector
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
#pipeline.enable_model_cpu_offload()
canny = CannyDetector()
canny(Image.open("zhongli-cb.jpg")).save("zhongli-cb-canny.jpg")
canny_image = load_image(
    "zhongli-cb-canny.jpg"
)
controlnet_conditioning_scale = 0.5
generator = torch.Generator(device="cpu").manual_seed(1)
images = pipeline(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,highres",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=7.0,
    generator=generator,
).images
images[0]
images[0].save("zhongli_trans.png")
canny = CannyDetector()
canny(Image.open("paimon-cb-crop.jpg")).save("paimon-cb-canny.jpg")
canny_image = load_image(
    "paimon-cb-canny.jpg"
)
controlnet_conditioning_scale = 0.7
generator = torch.Generator(device="cpu").manual_seed(3)
images = pipeline(
    prompt="solo,PAIMON\(genshin impact\),1girl,portrait,highres, bright, shiny, high detail, anime",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=8.0,
    generator=generator,
).images
images[0]
images[0].save("paimon_trans.png")
Creating a Grid Image
You can also create a grid image from a list of PIL Image objects:
from PIL import Image
def create_grid_image(image_list, rows, cols, cell_width, cell_height):
    """
    Create a grid image from a list of PIL Image objects.

    :param image_list: A list of PIL Image objects
    :param rows: Number of rows in the grid
    :param cols: Number of columns in the grid
    :param cell_width: Width of each cell in the grid
    :param cell_height: Height of each cell in the grid
    :return: The resulting grid image
    """
    total_width = cols * cell_width
    total_height = rows * cell_height
    grid_image = Image.new('RGB', (total_width, total_height))
    for i, img in enumerate(image_list):
        row = i // cols
        col = i % cols
        img = img.resize((cell_width, cell_height))
        x_offset = col * cell_width
        y_offset = row * cell_height
        grid_image.paste(img, (x_offset, y_offset))
    return grid_image
create_grid_image([Image.open("zhongli-cb.jpg") ,Image.open("zhongli-cb-canny.jpg"), Image.open("zhongli_trans.png")], 1, 3, 512, 768)
create_grid_image([Image.open("paimon-cb-crop.jpg") ,Image.open("paimon-cb-canny.jpg"), Image.open("paimon_trans.png")], 1, 3, 512, 768)
This will create a grid image showing the original, Canny edge detection, and transformed images side by side.
Each image row below shows: Genshin Impact toy photo / Canny edge image / restored 2D Genshin Impact image.
Zhongli
Paimon
Generating an Animation of Zhongli
Here's an example of how to generate an animation of Zhongli using the AnimateDiffSDXLPipeline:
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained(
    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16
)
model_id = "svjack/GenshinImpact_XL_Base"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to("cuda")
# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
output = pipe(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward.",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
    generator=torch.manual_seed(4),
)
frames = output.frames[0]
export_to_gif(frames, "zhongli_animation.gif")
from diffusers.utils import export_to_video
export_to_video(frames, "zhongli_animation.mp4")
from IPython import display
display.Video("zhongli_animation.mp4", width=512, height=512)
Use AutoPipelineForImage2Image to enhance the output:
from moviepy.editor import VideoFileClip
from PIL import Image
clip = VideoFileClip("zhongli_animation.mp4")
frames = list(map(Image.fromarray, clip.iter_frames()))
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
import torch
pipeline_text2image = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
)
# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")
from tqdm import tqdm
req = []
for init_image in tqdm(frames):
    prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward."
    image = pipeline(prompt, image=init_image, strength=0.8, guidance_scale=10.5).images[0]
    req.append(image)
from diffusers.utils import export_to_video
export_to_video(req, "zhongli_animation_im2im.mp4")
from IPython import display
display.Video("zhongli_animation_im2im.mp4", width=512, height=512)
Enhancing Animation with RIFE
To enhance the animation using RIFE (Real-Time Intermediate Flow Estimation):
git clone https://github.com/svjack/Practical-RIFE && cd Practical-RIFE && pip install -r requirements.txt
python inference_video.py --multi=128 --video=../zhongli_animation_im2im.mp4
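The `--multi` flag controls the interpolation factor: RIFE inserts intermediate frames between each pair of source frames, multiplying the frame rate. A rough sketch of the arithmetic (the source fps of 10 is an assumption consistent with the `1280fps` in the output filename):

```python
# Rough sketch: RIFE's --multi factor multiplies the effective frame rate.
# The output filename "zhongli_animation_im2im_128X_1280fps.mp4" is consistent
# with an assumed 10 fps source video interpolated 128x.
def interpolated_fps(source_fps, multi):
    """Frame rate after RIFE interpolation with the given --multi factor."""
    return source_fps * multi

print(interpolated_fps(10, 128))  # 1280
```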
from moviepy.editor import VideoFileClip
clip = VideoFileClip("zhongli_animation_im2im_128X_1280fps.mp4")
def speed_change_video(video_clip, speed_factor, output_path):
    if speed_factor == 1:
        # If the speed factor is 1, just write the original clip unchanged
        video_clip.write_videofile(output_path, codec="libx264")
    else:
        # Otherwise, rescale the clip's playback speed by the factor
        # (a factor below 1 slows the video down)
        sped_up_clip = video_clip.speedx(speed_factor)
        sped_up_clip.write_videofile(output_path, codec="libx264")
speed_change_video(clip, 0.05, "zhongli_animation_im2im_128X_1280fps_wrt.mp4")
VideoFileClip("zhongli_animation_im2im_128X_1280fps_wrt.mp4").set_duration(10).write_videofile("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", codec="libx264")
from IPython import display
display.Video("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", width=512, height=512)
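The speed change works by rescaling duration: `duration_out = duration_in / speed_factor`. A quick check of the values used above (the 1.6 s source duration is an assumption, from 16 frames at 10 fps):

```python
# Sketch of the duration math behind speed_change_video above.
# Assumption: the interpolated clip is 1.6 s long (16 frames at 10 fps).
def output_duration(input_duration, speed_factor):
    """Duration of the clip after playback speed is rescaled."""
    return input_duration / speed_factor

# A 1.6 s clip slowed by a factor of 0.05 becomes ~32 s,
# which is then trimmed to 10 s with set_duration(10).
print(round(output_duration(1.6, 0.05), 6))
```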
Merging Videos Horizontally
You can merge two videos horizontally using the following function:
from moviepy.editor import VideoFileClip, CompositeVideoClip
def merge_videos_horizontally(video_path1, video_path2, output_video_path):
    clip1 = VideoFileClip(video_path1)
    clip2 = VideoFileClip(video_path2)
    max_duration = max(clip1.duration, clip2.duration)
    if clip1.duration < max_duration:
        clip1 = clip1.loop(duration=max_duration)
    if clip2.duration < max_duration:
        clip2 = clip2.loop(duration=max_duration)
    total_width = clip1.w + clip2.w
    total_height = max(clip1.h, clip2.h)
    final_clip = CompositeVideoClip([
        clip1.set_position(("left", "center")),
        clip2.set_position(("right", "center"))
    ], size=(total_width, total_height))
    final_clip.write_videofile(output_video_path, codec='libx264')
    print(f"Merged video saved to {output_video_path}")
# Example usage
video_path1 = "zhongli_animation.mp4"  # path to the first video
video_path2 = "zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4"  # path to the second video
output_video_path = "zhongli_inter_video_im2im_compare.mp4"  # path for the output video
merge_videos_horizontally(video_path1, video_path2, output_video_path)
Left: zhongli_animation.mp4 (generated by AnimateDiffSDXLPipeline). Right: zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4 (generated by AutoPipelineForImage2Image + Practical-RIFE).
Zhongli
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs - ICML 2024
This repository contains the implementation of a cutting-edge text-to-image diffusion model that leverages multimodal large language models (LLMs) for advanced image generation. The project focuses on recaptioning, planning, and generating high-quality images from textual descriptions, showcasing the capabilities of modern AI in creative content production.
Installation
To get started with the project, follow these steps to set up the environment and install the necessary dependencies:
Clone the Repository:
git clone https://github.com/svjack/RPG-DiffusionMaster
cd RPG-DiffusionMaster
Create and Activate Conda Environment:
conda create -n RPG python==3.9
conda activate RPG
Install Jupyter Kernel:
pip install ipykernel
python -m ipykernel install --user --name RPG --display-name "RPG"
Install Required Packages:
pip install -r requirements.txt
Clone Diffusers Repository:
git clone https://github.com/huggingface/diffusers
Demo
This section provides a quick demonstration of how to use the RegionalDiffusionXLPipeline to generate images based on textual prompts. The example below demonstrates the process of generating an image using a multimodal LLM to split and refine the prompt.
Import Required Modules
from RegionalDiffusion_base import RegionalDiffusionPipeline
from RegionalDiffusion_xl import RegionalDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers, DPMSolverMultistepScheduler
from mllm import local_llm, GPT4, DeepSeek
import torch
Load the Model and Configure Scheduler
pipe = RegionalDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/svjack/GenshinImpact_XL_Base/blob/main/sdxlBase_v10.safetensors",
    torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
pipe.enable_xformers_memory_efficient_attention()
User Input and MLLM Processing
# User input prompt
prompt = 'ZHONGLI(genshin impact) with NING GUANG(genshin impact) in red cheongsam in the bar.'
# Process the prompt using DeepSeek MLLM
para_dict = DeepSeek(prompt)
# Extract parameters for image generation
split_ratio = para_dict['Final split ratio']
regional_prompt = para_dict['Regional Prompt']
negative_prompt = "" # Optional negative prompt
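For illustration, the `'Final split ratio'` returned by the MLLM can be read as relative region sizes. A hypothetical parser sketch (the exact string format is an assumption; check the RPG-DiffusionMaster documentation):

```python
# Hypothetical illustration (format assumed): interpret a comma-separated
# split-ratio string such as "1,1" as relative sizes of the regions the
# regional prompt is distributed across.
def parse_split_ratio(ratio_str):
    """Parse a comma-separated ratio string into normalized fractions."""
    parts = [float(p) for p in ratio_str.split(",")]
    total = sum(parts)
    return [p / total for p in parts]

print(parse_split_ratio("1,1"))  # two equal regions -> [0.5, 0.5]
```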
Generate and Save the Image
images = pipe(
    prompt=regional_prompt,
    split_ratio=split_ratio,  # The ratio of the regional prompt
    batch_size=1,  # Batch size
    base_ratio=0.5,  # The ratio of the base prompt
    base_prompt=prompt,
    num_inference_steps=20,  # Sampling steps
    height=1024,
    negative_prompt=negative_prompt,  # Negative prompt
    width=1024,
    seed=0,  # Random seed
    guidance_scale=7.0
).images[0]
# Save the generated image
images.save("test_zhong_ning.png")
This demo showcases the power of combining text-to-image diffusion with multimodal LLMs to generate high-quality images from complex textual descriptions. The generated image is saved as test_zhong_ning.png.
Feel free to explore the repository and experiment with different prompts and configurations to see the full potential of this advanced text-to-image generation model.
MotionCtrl and MasaCtrl: Genshin Impact Character Synthesis
See https://github.com/svjack/MasaCtrl for example videos of Genshin Impact character synthesis with MasaCtrl:
- Zhongli Drinking Tea:
"solo,ZHONGLI(genshin impact),1boy,highres," -> "solo,ZHONGLI drink tea use chinese cup (genshin impact),1boy,highres,"
- Kamisato Ayato Smiling:
"solo,KAMISATO AYATO(genshin impact),1boy,highres," -> "solo,KAMISATO AYATO smiling (genshin impact),1boy,highres,"
Zhongli Drinking Tea:
Kamisato Ayato Smiling:
Perturbed-Attention-Guidance with Genshin Impact XL
Here's an example of how to enhance Genshin Impact XL with Perturbed-Attention-Guidance (https://github.com/svjack/Perturbed-Attention-Guidance):
Clone the Repository
Clone the Space repository from Hugging Face:
git clone https://huggingface.co/spaces/svjack/perturbed-attention-guidance-genshin_impact_xl
Navigate to the Repository Directory
Change into the cloned repository directory:
cd perturbed-attention-guidance-genshin_impact_xl
Install Python Requirements
Install the required Python packages using pip:
pip install -r requirements.txt
Run the Application
Finally, run the application:
python app.py