# svjack/GenshinImpact_XL_Base

This model is derived from [CivitAI](https://civitai.com/models/386505).

## Acknowledgments

Special thanks to [mobeimunan](https://civitai.com/user/mobeimunan) for their contributions to the development of this model.

Zhongli Drinking Tea:

Kamisato Ayato Smiling:

## Supported Characters

The model currently supports the following 73 characters from Genshin Impact:

```python
name_dict = {
    '旅行者女': 'lumine', '旅行者男': 'aether', '派蒙': 'PAIMON', '迪奥娜': 'DIONA',
    '菲米尼': 'FREMINET', '甘雨': 'GANYU', '凯亚': 'KAEYA', '莱依拉': 'LAYLA',
    '罗莎莉亚': 'ROSARIA', '七七': 'QIQI', '申鹤': 'SHENHE', '神里绫华': 'KAMISATO AYAKA',
    '优菈': 'EULA', '重云': 'CHONGYUN', '夏洛蒂': 'charlotte', '莱欧斯利': 'WRIOTHESLEY',
    '艾尔海森': 'ALHAITHAM', '柯莱': 'COLLEI', '纳西妲': 'NAHIDA', '绮良良': 'KIRARA',
    '提纳里': 'TIGHNARI', '瑶瑶': 'YAOYAO', '珐露珊': 'FARUZAN', '枫原万叶': 'KAEDEHARA KAZUHA',
    '琳妮特': 'LYNETTE', '流浪者 散兵': 'scaramouche', '鹿野院平藏': 'SHIKANOIN HEIZOU', '琴': 'JEAN',
    '砂糖': 'SUCROSE', '温迪': 'VENTI', '魈': 'XIAO', '早柚': 'SAYU',
    '安柏': 'AMBER', '班尼特': 'BENNETT', '迪卢克': 'DILUC', '迪西娅': 'DEHYA',
    '胡桃': 'HU TAO', '可莉': 'KLEE', '林尼': 'LYNEY', '托马': 'THOMA',
    '香菱': 'XIANG LING', '宵宫': 'YOIMIYA', '辛焱': 'XINYAN', '烟绯': 'YANFEI',
    '八重神子': 'YAE MIKO', '北斗': 'BEIDOU', '菲谢尔': 'FISCHL', '九条裟罗': 'KUJO SARA',
    '久岐忍': 'KUKI SHINOBU', '刻晴': 'KEQING', '雷电将军': 'RAIDEN SHOGUN', '雷泽': 'RAZOR',
    '丽莎': 'LISA', '赛诺': 'CYNO', '芙宁娜': 'FURINA', '芭芭拉': 'BARBARA',
    '公子 达达利亚': 'TARTAGLIA', '坎蒂丝': 'CANDACE', '莫娜': 'MONA', '妮露': 'NILOU',
    '珊瑚宫心海': 'SANGONOMIYA KOKOMI', '神里绫人': 'KAMISATO AYATO', '行秋': 'XINGQIU', '夜兰': 'YELAN',
    '那维莱特': 'NEUVILLETTE', '娜维娅': 'NAVIA', '阿贝多': 'ALBEDO', '荒泷一斗': 'ARATAKI ITTO',
    '凝光': 'NING GUANG', '诺艾尔': 'NOELLE', '五郎': 'GOROU', '云堇': 'YUN JIN',
    '钟离': 'ZHONGLI'
}
```

## Installation

To use this model, install the following dependencies:

```bash
sudo apt-get update && sudo apt-get install git-lfs ffmpeg cbm
pip install -U diffusers transformers sentencepiece peft controlnet-aux moviepy
```

## Example Usage

### Generating an Image of Zhongli

Here's an example of how to generate an image of Zhongli using this model:

```python
from diffusers import StableDiffusionXLPipeline
import torch

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
).to("cuda")

prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres,"
negative_prompt = "nsfw,lowres,(bad),text,error,fewer,extra,missing,worst quality,jpeg artifacts,low quality,watermark,unfinished,displeasing,oldest,early,chromatic aberration,signature,extra digits,artistic error,username,scan,[abstract],"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.manual_seed(0),
).images[0]
image
image.save("zhongli_1024x1024.png")
```

Zhongli
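
The character tags in `name_dict` drop straight into the prompt. As an additional illustration (not part of the original card), here is a minimal sketch of batch-generating portraits for a few entries, assuming the same pipeline settings as the Zhongli example above; the output file names are arbitrary:

```python
from diffusers import StableDiffusionXLPipeline
import torch

pipeline = StableDiffusionXLPipeline.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
).to("cuda")

negative_prompt = "nsfw,lowres,(bad),text,error,fewer,extra,missing,worst quality,jpeg artifacts,low quality,watermark,unfinished,displeasing,oldest,early,chromatic aberration,signature,extra digits,artistic error,username,scan,[abstract],"

# A few tags taken from name_dict above (all female characters, hence "1girl").
for tag in ["GANYU", "HU TAO", "RAIDEN SHOGUN"]:
    prompt = f"solo,{tag}\\(genshin impact\\),1girl,portrait,upper_body,highres,"
    image = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=torch.manual_seed(0),
    ).images[0]
    # e.g. "hu_tao_1024x1024.png"
    image.save(f"{tag.lower().replace(' ', '_')}_1024x1024.png")
```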

### Using Canny ControlNet to Restore 2D Images from 3D Toy Photos

Here's an example of how to use Canny ControlNet to restore 2D images from 3D toy photos.

#### Genshin Impact 3D Toys

Zhongli

Paimon

```python
from diffusers import AutoPipelineForText2Image, ControlNetModel
from diffusers.utils import load_image
import torch
from PIL import Image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16
)
pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
#pipeline.enable_model_cpu_offload()

canny = CannyDetector()
canny(Image.open("zhongli-cb.jpg")).save("zhongli-cb-canny.jpg")

canny_image = load_image("zhongli-cb-canny.jpg")

controlnet_conditioning_scale = 0.5
generator = torch.Generator(device="cpu").manual_seed(1)
images = pipeline(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,highres",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=7.0,
    generator=generator,
).images
images[0]
images[0].save("zhongli_trans.png")

canny = CannyDetector()
canny(Image.open("paimon-cb-crop.jpg")).save("paimon-cb-canny.jpg")

canny_image = load_image("paimon-cb-canny.jpg")

controlnet_conditioning_scale = 0.7
generator = torch.Generator(device="cpu").manual_seed(3)
images = pipeline(
    prompt="solo,PAIMON\(genshin impact\),1girl,portrait,highres, bright, shiny, high detail, anime",
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    image=canny_image,
    num_inference_steps=50,
    guidance_scale=8.0,
    generator=generator,
).images
images[0]
images[0].save("paimon_trans.png")
```

### Creating a Grid Image

You can also create a grid image from a list of PIL Image objects:

```python
from PIL import Image

def create_grid_image(image_list, rows, cols, cell_width, cell_height):
    """
    Create a grid image from a list of PIL Image objects.

    :param image_list: A list of PIL Image objects
    :param rows: Number of rows in the grid
    :param cols: Number of columns in the grid
    :param cell_width: Width of each cell in the grid
    :param cell_height: Height of each cell in the grid
    :return: The resulting grid image
    """
    total_width = cols * cell_width
    total_height = rows * cell_height
    grid_image = Image.new('RGB', (total_width, total_height))

    for i, img in enumerate(image_list):
        row = i // cols
        col = i % cols
        img = img.resize((cell_width, cell_height))
        x_offset = col * cell_width
        y_offset = row * cell_height
        grid_image.paste(img, (x_offset, y_offset))

    return grid_image

create_grid_image([Image.open("zhongli-cb.jpg"), Image.open("zhongli-cb-canny.jpg"), Image.open("zhongli_trans.png")], 1, 3, 512, 768)

create_grid_image([Image.open("paimon-cb-crop.jpg"), Image.open("paimon-cb-canny.jpg"), Image.open("paimon_trans.png")], 1, 3, 512, 768)
```

This will create a grid image showing the original, Canny edge detection, and transformed images side by side.

Each image row below shows: Genshin Impact toy photo / Canny edge image / restored 2D Genshin Impact image.

Zhongli

Paimon

### Generating an Animation of Zhongli

Here's an example of how to generate an animation of Zhongli using the `AnimateDiffSDXLPipeline`:

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "a-r-r-o-w/animatediff-motion-adapter-sdxl-beta",
    torch_dtype=torch.float16
)

model_id = "svjack/GenshinImpact_XL_Base"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to("cuda")

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward.",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
    generator=torch.manual_seed(4),
)
frames = output.frames[0]
export_to_gif(frames, "zhongli_animation.gif")

from diffusers.utils import export_to_video
export_to_video(frames, "zhongli_animation.mp4")

from IPython import display
display.Video("zhongli_animation.mp4", width=512, height=512)
```

Use `AutoPipelineForImage2Image` to enhance the output:

```python
from moviepy.editor import VideoFileClip
from PIL import Image

clip = VideoFileClip("zhongli_animation.mp4")
frames = list(map(Image.fromarray, clip.iter_frames()))

from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    torch_dtype=torch.float16
)
# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

from tqdm import tqdm

req = []
for init_image in tqdm(frames):
    prompt = "solo,ZHONGLI\(genshin impact\),1boy,portrait,upper_body,highres, keep eyes forward."
    image = pipeline(prompt, image=init_image, strength=0.8, guidance_scale=10.5).images[0]
    req.append(image)

from diffusers.utils import export_to_video
export_to_video(req, "zhongli_animation_im2im.mp4")

from IPython import display
display.Video("zhongli_animation_im2im.mp4", width=512, height=512)
```

##### Enhancing Animation with RIFE

To enhance the animation using RIFE (Real-Time Intermediate Flow Estimation):

```bash
git clone https://github.com/svjack/Practical-RIFE && cd Practical-RIFE && pip install -r requirements.txt
python inference_video.py --multi=128 --video=../zhongli_animation_im2im.mp4
```

```python
from moviepy.editor import VideoFileClip

clip = VideoFileClip("zhongli_animation_im2im_128X_1280fps.mp4")

def speed_change_video(video_clip, speed_factor, output_path):
    if speed_factor == 1:
        # If the speed factor is 1, just write out the original video
        video_clip.write_videofile(output_path, codec="libx264")
    else:
        # Otherwise, adjust the video speed by the given factor
        sped_up_clip = video_clip.speedx(speed_factor)
        sped_up_clip.write_videofile(output_path, codec="libx264")

speed_change_video(clip, 0.05, "zhongli_animation_im2im_128X_1280fps_wrt.mp4")

VideoFileClip("zhongli_animation_im2im_128X_1280fps_wrt.mp4").set_duration(10).write_videofile(
    "zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", codec="libx264"
)

from IPython import display
display.Video("zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4", width=512, height=512)
```

##### Merging Videos Horizontally

You can merge two videos horizontally using the following function:

```python
from moviepy.editor import VideoFileClip, CompositeVideoClip

def merge_videos_horizontally(video_path1, video_path2, output_video_path):
    clip1 = VideoFileClip(video_path1)
    clip2 = VideoFileClip(video_path2)

    # Loop the shorter clip so both have the same duration
    max_duration = max(clip1.duration, clip2.duration)
    if clip1.duration < max_duration:
        clip1 = clip1.loop(duration=max_duration)
    if clip2.duration < max_duration:
        clip2 = clip2.loop(duration=max_duration)

    total_width = clip1.w + clip2.w
    total_height = max(clip1.h, clip2.h)

    final_clip = CompositeVideoClip([
        clip1.set_position(("left", "center")),
        clip2.set_position(("right", "center"))
    ], size=(total_width, total_height))

    final_clip.write_videofile(output_video_path, codec='libx264')
    print(f"Merged video saved to {output_video_path}")

# Example usage
video_path1 = "zhongli_animation.mp4"  # path to the first video
video_path2 = "zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4"  # path to the second video
output_video_path = "zhongli_inter_video_im2im_compare.mp4"  # path for the output video

merge_videos_horizontally(video_path1, video_path2, output_video_path)
```

Left: zhongli_animation.mp4 (by AnimateDiffSDXLPipeline). Right: zhongli_animation_im2im_128X_1280fps_wrt_10s.mp4 (by AutoPipelineForImage2Image + Practical-RIFE).

Zhongli

# Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs - ICML 2024

This repository contains the implementation of a cutting-edge text-to-image diffusion model that leverages multimodal large language models (LLMs) for advanced image generation. The project focuses on recaptioning, planning, and generating high-quality images from textual descriptions, showcasing the capabilities of modern AI in creative content production.

## Installation

To get started with the project, follow these steps to set up the environment and install the necessary dependencies:

1. **Clone the Repository:**
   ```bash
   git clone https://github.com/svjack/RPG-DiffusionMaster
   cd RPG-DiffusionMaster
   ```

2. **Create and Activate Conda Environment:**
   ```bash
   conda create -n RPG python==3.9
   conda activate RPG
   ```

3. **Install Jupyter Kernel:**
   ```bash
   pip install ipykernel
   python -m ipykernel install --user --name RPG --display-name "RPG"
   ```

4. **Install Required Packages:**
   ```bash
   pip install -r requirements.txt
   ```

5. **Clone Diffusers Repository:**
   ```bash
   git clone https://github.com/huggingface/diffusers
   ```

## Demo

This section provides a quick demonstration of how to use the `RegionalDiffusionXLPipeline` to generate images from textual prompts. The example below uses a multimodal LLM to split and refine the prompt.

### Import Required Modules

```python
from RegionalDiffusion_base import RegionalDiffusionPipeline
from RegionalDiffusion_xl import RegionalDiffusionXLPipeline
from diffusers.schedulers import KarrasDiffusionSchedulers, DPMSolverMultistepScheduler
from mllm import local_llm, GPT4, DeepSeek
import torch
```

### Load the Model and Configure Scheduler

```python
pipe = RegionalDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/svjack/GenshinImpact_XL_Base/blob/main/sdxlBase_v10.safetensors",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
pipe.enable_xformers_memory_efficient_attention()
```

### User Input and MLLM Processing

```python
# User input prompt
prompt = 'ZHONGLI(genshin impact) with NING GUANG(genshin impact) in red cheongsam in the bar.'

# Process the prompt using the DeepSeek MLLM
para_dict = DeepSeek(prompt)

# Extract parameters for image generation
split_ratio = para_dict['Final split ratio']
regional_prompt = para_dict['Regional Prompt']
negative_prompt = ""  # Optional negative prompt
```

### Generate and Save the Image

```python
images = pipe(
    prompt=regional_prompt,
    split_ratio=split_ratio,          # The ratio of the regional prompt
    batch_size=1,                     # Batch size
    base_ratio=0.5,                   # The ratio of the base prompt
    base_prompt=prompt,
    num_inference_steps=20,           # Sampling steps
    height=1024,
    negative_prompt=negative_prompt,  # Negative prompt
    width=1024,
    seed=0,                           # Random seed
    guidance_scale=7.0
).images[0]

# Save the generated image
images.save("test_zhong_ning.png")
```

This demo showcases the power of combining text-to-image diffusion with multimodal LLMs to generate high-quality images from complex textual descriptions. The generated image is saved as `test_zhong_ning.png`.

---

Feel free to explore the repository and experiment with different prompts and configurations to see the full potential of this advanced text-to-image generation model.
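
If you don't have MLLM API access, you can hand-craft the regional plan instead of calling `DeepSeek(prompt)`. The sketch below is only an illustration of the expected structure; the exact split-ratio string and the `BREAK`-separated regional prompt are assumptions based on RPG's regional-prompting convention, not output of the MLLM:

```python
# Hypothetical hand-crafted plan (an assumption, not produced by DeepSeek):
# two side-by-side columns, one region per character, regions separated by BREAK.
split_ratio = "1,1"  # assumed format: one row split into two equal columns
regional_prompt = (
    "ZHONGLI(genshin impact), 1boy, formal attire, standing at the bar, highres BREAK "
    "NING GUANG(genshin impact), 1girl, red cheongsam, standing at the bar, highres"
)
negative_prompt = ""
```

The subsequent call to `pipe(...)` stays the same as in the snippet above.
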
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634dffc49b777beec3bc6448/ZJZkSaMOGRI7QM0uegeqS.png)

## MotionCtrl and MasaCtrl: Genshin Impact Character Synthesis

See https://github.com/svjack/MasaCtrl for examples of Genshin Impact character synthesis videos produced with MasaCtrl.

- **Zhongli Drinking Tea:**
  ```
  "solo,ZHONGLI(genshin impact),1boy,highres," -> "solo,ZHONGLI drink tea use chinese cup (genshin impact),1boy,highres,"
  ```
  ![Screenshot 2024-11-17 132742](https://github.com/user-attachments/assets/00451728-f2d5-4009-afa8-23baaabdc223)

- **Kamisato Ayato Smiling:**
  ```
  "solo,KAMISATO AYATO(genshin impact),1boy,highres," -> "solo,KAMISATO AYATO smiling (genshin impact),1boy,highres,"
  ```
  ![Screenshot 2024-11-17 133421](https://github.com/user-attachments/assets/7a920f4c-8a3a-4387-98d6-381a798566ef)

Zhongli Drinking Tea:

Kamisato Ayato Smiling:

## Perturbed-Attention-Guidance with Genshin Impact XL

Here's an example of how to enhance Genshin Impact XL with [Perturbed-Attention-Guidance](https://github.com/svjack/Perturbed-Attention-Guidance):

### Clone the Repository

Clone the demo repository from Hugging Face Spaces:

```bash
git clone https://huggingface.co/spaces/svjack/perturbed-attention-guidance-genshin_impact_xl
```

### Navigate to the Repository Directory

Change into the cloned repository directory:

```bash
cd perturbed-attention-guidance-genshin_impact_xl
```

### Install Python Requirements

Install the required Python packages using `pip`:

```bash
pip install -r requirements.txt
```

### Run the Application

Finally, run the application:

```bash
python app.py
```

Left: with Perturbed-Attention-Guidance. Right: original Genshin Impact XL.

The left image looks noticeably better.
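
If you prefer to apply PAG directly through diffusers instead of running the Space above, recent diffusers releases expose it via `enable_pag`. A minimal sketch, assuming diffusers >= 0.30 and the same prompt style as the earlier examples:

```python
from diffusers import AutoPipelineForText2Image
import torch

# Assumption: diffusers >= 0.30, which supports enable_pag on AutoPipelineForText2Image.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "svjack/GenshinImpact_XL_Base",
    enable_pag=True,
    pag_applied_layers=["mid"],   # apply perturbed attention in the mid block
    torch_dtype=torch.float16,
).to("cuda")

image = pipeline(
    prompt="solo,ZHONGLI\\(genshin impact\\),1boy,portrait,upper_body,highres,",
    guidance_scale=7.0,
    pag_scale=3.0,                # strength of perturbed-attention guidance
    generator=torch.manual_seed(0),
).images[0]
image.save("zhongli_pag.png")
```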