---
license: openrail++
datasets:
- friedrichor/PhotoChat_120_square_HQ
language:
- en
tags:
- stable-diffusion
- text-to-image
---

`friedrichor/stable-diffusion-2-1-realistic` was fine-tuned from [stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1) on [friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ).

This model was not trained solely for text-to-image generation; it is also a component of the *Tiger* model (currently not open-source, under submission) for Multimodal Dialogue Response Generation.

# Model Details

- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([OpenCLIP-ViT/H](https://github.com/mlfoundations/open_clip)).

## Dataset

[friedrichor/PhotoChat_120_square_HQ](https://huggingface.co/datasets/friedrichor/PhotoChat_120_square_HQ) was used to fine-tune Stable Diffusion v2.1. It contains 120 image-text pairs: images were manually screened from the [PhotoChat](https://aclanthology.org/2021.acl-long.479/) dataset, cropped to square, and upscaled with `Gigapixel` to improve their quality. Image captions were generated by [BLIP-2](https://arxiv.org/abs/2301.12597).

## How to fine-tune

See [friedrichor/Text-to-Image-Summary/fine-tune/text2image](https://github.com/friedrichor/Text-to-Image-Summary/tree/main/fine-tune/text2image) or [Hugging Face diffusers](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image).

# Simple use example

Using the [🤗 Diffusers library](https://github.com/huggingface/diffusers):

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda:0"

# Load the fine-tuned pipeline and move it to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "friedrichor/stable-diffusion-2-1-realistic", torch_dtype=torch.float32
)
pipe.to(device)

prompt = "a woman in a red and gold costume with feathers on her head"
extra_prompt = ", facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography"
negative_prompt = "cartoon, anime, ugly, (aged, white beard, black skin, wrinkle:1.1), (bad proportions, unnatural feature, incongruous feature:1.4), (blurry, un-sharp, fuzzy, un-detailed skin:1.2), (facial contortion, poorly drawn face, deformed iris, deformed pupils:1.3), (mutated hands and fingers:1.5), disconnected hands, disconnected limbs"

# Fixed seed for reproducible generations
generator = torch.Generator(device=device).manual_seed(42)

image = pipe(
    prompt + extra_prompt,
    negative_prompt=negative_prompt,
    height=768,
    width=768,
    num_inference_steps=20,
    guidance_scale=7.5,
    generator=generator,
).images[0]

image.save("image.png")
```

## Prompt template

**Applying prompt templates is helpful for improving image quality.**

If you want to generate images of people in the real world, you can try the following prompt template:

`{{caption}}, facing the camera, photograph, highly detailed face, depth of field, moody light, style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, award winning photography`
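
As a minimal sketch of how the template can be applied in code (the caption string below is a made-up example, not taken from the dataset), you can fill the `{{caption}}` slot before passing the result to the pipeline:

```python
# Suffix taken from the prompt template above (everything after {{caption}})
TEMPLATE_SUFFIX = (
    ", facing the camera, photograph, highly detailed face, depth of field, moody light, "
    "style by Yasmin Albatoul, Harry Fayt, centered, extremely detailed, Nikon D850, "
    "award winning photography"
)

def apply_template(caption: str) -> str:
    """Fill the {{caption}} slot of the prompt template with a raw caption."""
    return caption + TEMPLATE_SUFFIX

# Hypothetical caption for illustration only
prompt = apply_template("a man walking his dog on a beach at sunset")

# `prompt` can now be passed to the pipeline exactly as in the example above, e.g.:
# image = pipe(prompt, negative_prompt=negative_prompt, height=768, width=768).images[0]
```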