Shekswess committed `eaa71a0` (1 parent: `5d8d3fe`): Create README.md
---
license: openrail++
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- stable-diffusion-xl
- stable-diffusion-xl-diffusers
- text-to-image
- diffusers
- inpainting
- neuron
inference: false
---

# SD-XL Inpainting 0.1 Model Card

![inpaint-example](inpaint-examples-min.png)
SD-XL Inpainting 0.1 is a latent text-to-image diffusion model that generates photorealistic images from any text input and can additionally inpaint an existing image, regenerating the regions selected by a mask.

SD-XL Inpainting 0.1 was initialized with the `stable-diffusion-xl-base-1.0` weights. The model was trained for 40k steps at a resolution of 1024x1024, with the text conditioning dropped 5% of the time to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked image and 1 for the mask itself), whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and, in 25% of cases, mask the entire image.
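The two training-time randomizations described above (5% text-conditioning dropout and 25% fully masked images) can be sketched as a per-example sampling step. This is a hypothetical illustration, not the actual training script: the card does not describe how its synthetic masks are generated, so `random_rectangle_mask` is a stand-in, and using an empty prompt as the "dropped" conditioning is an assumption (a common choice when training for classifier-free guidance).

```py
import random

TEXT_DROP_PROB = 0.05  # fraction of examples trained without the prompt
FULL_MASK_PROB = 0.25  # fraction of examples where the whole image is masked


def random_rectangle_mask(height, width, rng):
    """Hypothetical synthetic mask: one random rectangle of 1s (repaint) on 0s (keep)."""
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    return [[1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
            for y in range(height)]


def sample_training_conditioning(prompt, height, width, rng):
    """Pick the prompt and mask used for one training example."""
    if rng.random() < TEXT_DROP_PROB:
        prompt = ""  # unconditional example, used later by classifier-free guidance
    if rng.random() < FULL_MASK_PROB:
        mask = [[1] * width for _ in range(height)]  # mask everything
    else:
        mask = random_rectangle_mask(height, width, rng)
    return prompt, mask


rng = random.Random(0)
N = 10_000
samples = [sample_training_conditioning("a cat", 8, 8, rng) for _ in range(N)]
dropped = sum(p == "" for p, _ in samples) / N
full = sum(all(all(row) for row in m) for _, m in samples) / N
print(f"text dropped: {dropped:.3f}, fully masked: {full:.3f}")
```

The measured fractions land close to 0.05 and 0.25 (slightly above 0.25, since a random rectangle occasionally covers the whole image).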
## Usage

```py
from diffusers import DPMSolverMultistepScheduler
from diffusers.utils import load_image
from optimum.neuron import NeuronStableDiffusionXLInpaintPipeline

# Load the precompiled Neuron pipeline; device_ids selects the Neuron devices to use.
pipe = NeuronStableDiffusionXLInpaintPipeline.from_pretrained(
    "Shekswess/stable-diffusion-xl-1.0-inpainting-0.1-neuron", device_ids=[0, 1]
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = load_image(img_url)
mask_image = load_image(mask_url)  # white pixels are repainted, black pixels are kept

prompt = "a tiger sitting on a park bench"

image = pipe(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    guidance_scale=8.0,
    num_inference_steps=20,  # between 15 and 30 steps works well for us
    strength=0.99,  # keep `strength` below 1.0 (see Limitations)
).images[0]
image.save("output.png")  # save separately: `.save()` returns None
```
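To see what `strength` does, the sketch below reproduces, in simplified form, how the `diffusers` img2img and inpainting pipelines convert `strength` into the number of denoising steps that actually run (the truncation in their `get_timesteps` method, assuming a first-order scheduler such as `DPMSolverMultistepScheduler`). That the Neuron pipeline inherits this behavior is an assumption. With `strength=1.0` denoising starts from pure noise; any lower value starts from a noised version of the input image and skips the earliest steps.

```py
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Simplified mirror of the diffusers step truncation for img2img/inpainting."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


for strength in (1.0, 0.99, 0.75, 0.5):
    steps = effective_denoising_steps(20, strength)
    starts_from = "pure noise" if strength == 1.0 else "the noised input image"
    print(f"strength={strength}: {steps} of 20 steps, starting from {starts_from}")
```

With the settings from the example above (`num_inference_steps=20`, `strength=0.99`), 19 of the 20 scheduled steps run, which keeps the pipeline just outside the `strength=1.0` regime that the Limitations section flags.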
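`guidance_scale=8.0` sets the strength of classifier-free guidance, which the 5% text-conditioning dropout during training makes possible: at each step the UNet predicts the noise with and without the prompt, and the two predictions are extrapolated. The sketch below shows the standard combination rule on plain lists; the numbers are made up for illustration and are not real UNet outputs.

```py
def classifier_free_guidance(noise_uncond, noise_text, guidance_scale):
    """Element-wise eps = eps_uncond + s * (eps_text - eps_uncond)."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.10, -0.20, 0.05]  # illustrative prediction without the prompt
noise_text = [0.15, -0.10, 0.05]    # illustrative prediction with the prompt

# Scale 1.0 reduces to the text-conditioned prediction (up to float rounding).
print(classifier_free_guidance(noise_uncond, noise_text, 1.0))
# Scale 8.0 pushes the prediction much further in the direction of the prompt.
print(classifier_free_guidance(noise_uncond, noise_text, 8.0))
```

Components where the two predictions agree (the last one here) are unaffected by the scale; only the prompt-dependent difference is amplified.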
**How it works:**

`image` | `mask_image`
:-------------------------:|:-------------------------:
<img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" alt="drawing" width="300"/> | <img src="https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" alt="drawing" width="300"/>

`prompt` | `Output`
:-------------------------:|:-------------------------:
<span style="position: relative;bottom: 150px;">a tiger sitting on a park bench</span> | <img src="https://huggingface.co/datasets/valhalla/images/resolve/main/tiger.png" alt="drawing" width="300"/>
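In the example above, the white area of `mask_image` is the region that gets repainted and the black area is kept. The sketch below shows how a grayscale mask can be turned into a binary 0/1 mask and how the "masked image" input is formed from it. The 0.5 threshold and the blanking of repaint pixels follow, as we understand it, the preprocessing in the `diffusers` inpainting pipelines; here they are written in plain Python on a tiny hypothetical 3x3 image rather than with the real preprocessing code.

```py
def binarize_mask(gray, threshold=0.5):
    """Map grayscale mask values in [0, 1] to 1 (repaint) or 0 (keep)."""
    return [[1 if v >= threshold else 0 for v in row] for row in gray]


def make_masked_image(image, mask):
    """Blank out the pixels that will be repainted, keeping the rest of the image."""
    return [[px if m == 0 else 0.0 for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


# Hypothetical 3x3 grayscale image and mask, values in [0, 1].
image = [[0.2, 0.4, 0.6],
         [0.3, 0.5, 0.7],
         [0.1, 0.9, 0.8]]
gray_mask = [[0.0, 0.0, 0.0],
             [0.0, 1.0, 0.8],
             [0.0, 0.6, 0.3]]

mask = binarize_mask(gray_mask)
masked_image = make_masked_image(image, mask)
print(mask)          # [[0, 0, 0], [0, 1, 1], [0, 1, 0]]
print(masked_image)  # [[0.2, 0.4, 0.6], [0.3, 0.0, 0.0], [0.1, 0.0, 0.8]]
```

The binary mask and the encoded masked image are what feed the UNet's 5 extra input channels.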
## Model Description

- **Developed by:** The Diffusers team
- **Model type:** Diffusion-based text-to-image generative model
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).
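In SDXL-based models, the two text encoders are combined by concatenating their per-token hidden states along the feature axis before cross-attention. The arithmetic below assumes the hidden sizes of the SDXL base checkpoint as we understand them (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-G) and the standard 77-token CLIP context; check these values against the model's `config.json` files before relying on them.

```py
MAX_TOKENS = 77               # CLIP tokenizer context length
CLIP_VIT_L_HIDDEN = 768       # hidden size of the first text encoder (assumed)
OPENCLIP_VIT_G_HIDDEN = 1280  # hidden size of the second text encoder (assumed)


def prompt_embedding_shape(batch_size: int) -> tuple:
    """Shape of the concatenated per-token prompt embeddings fed to cross-attention."""
    return (batch_size, MAX_TOKENS, CLIP_VIT_L_HIDDEN + OPENCLIP_VIT_G_HIDDEN)


print(prompt_embedding_shape(1))  # (1, 77, 2048)
```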
## Uses

### Direct Use

The model is intended for research purposes only. Possible research areas and tasks include:

- Generation of artworks and use in design and other artistic processes.
- Applications in educational or creative tools.
- Research on generative models.
- Safe deployment of models that have the potential to generate harmful content.
- Probing and understanding the limitations and biases of generative models.

Excluded uses are described below.

### Out-of-Scope Use

The model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for the abilities of this model.

## Limitations and Bias

### Limitations

- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.
- The autoencoding part of the model is lossy.
- When the `strength` parameter is set to 1 (i.e., denoising starts from pure noise rather than from a noised version of the input image), image quality degrades: the model retains the non-masked contents of the image, but the images look less sharp. We're investigating this and working on the next version.
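The lossiness of the autoencoder follows from how strongly it compresses the image. Assuming the usual SDXL VAE settings (8x spatial downsampling to 4 latent channels), a 1024x1024 RGB image is represented by a 4x128x128 latent, i.e. 48 times fewer values, so fine detail can be lost when decoding. `latent_shape` and `compression_ratio` are illustrative helpers, not library functions.

```py
def latent_shape(height: int, width: int, scale_factor: int = 8, latent_channels: int = 4):
    """Latent tensor shape (channels, h, w), assuming SDXL-style VAE settings."""
    if height % scale_factor or width % scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // scale_factor, width // scale_factor)


def compression_ratio(height: int, width: int, image_channels: int = 3) -> float:
    """How many image values map onto each latent value."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(1024, 1024))       # (4, 128, 128)
print(compression_ratio(1024, 1024))  # 48.0
```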
### Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.