Spaces:
Running
on
Zero
adamelliotfields
committed on
ControlNet
- DOCS.md +28 -16
- README.md +5 -8
- app.css +5 -2
- app.py +133 -81
- lib/__init__.py +6 -0
- lib/annotators.py +25 -0
- lib/config.py +15 -2
- lib/inference.py +15 -1
- lib/loader.py +50 -6
- lib/pipelines.py +20 -1
- lib/utils.py +60 -0
- requirements.txt +3 -0
DOCS.md
CHANGED
@@ -1,8 +1,8 @@
|
|
1 |
-
|
2 |
|
3 |
TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.
|
4 |
|
5 |
-
|
6 |
|
7 |
Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.
|
8 |
|
@@ -19,13 +19,13 @@ This is the same syntax used in [InvokeAI](https://invoke-ai.github.io/InvokeAI/
|
|
19 |
| `(blue)1.2` | `(blue:1.2)` |
|
20 |
| `(blue)0.8` | `(blue:0.8)` |
|
21 |
|
22 |
-
|
23 |
|
24 |
Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).
|
25 |
|
26 |
> NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
|
27 |
|
28 |
-
|
29 |
|
30 |
Each model checkpoint has a different aesthetic:
|
31 |
|
@@ -38,7 +38,7 @@ Each model checkpoint has a different aesthetic:
|
|
38 |
* [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
|
39 |
* [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized
|
40 |
|
41 |
-
|
42 |
|
43 |
Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
|
44 |
|
@@ -47,7 +47,7 @@ Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
|
|
47 |
|
48 |
> NB: The trigger words are automatically appended to the positive prompt for you.
|
49 |
|
50 |
-
|
51 |
|
52 |
Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:
|
53 |
|
@@ -57,13 +57,13 @@ Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/
|
|
57 |
|
58 |
> NB: The trigger token is automatically appended to the negative prompt for you.
|
59 |
|
60 |
-
|
61 |
|
62 |
[Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.
|
63 |
|
64 |
Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.
|
65 |
|
66 |
-
|
67 |
|
68 |
The `Anime: *` styles work the best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:
|
69 |
|
@@ -73,13 +73,15 @@ The `Anime: *` styles work the best with Dreamshaper. When using the anime-speci
|
|
73 |
|
74 |
Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.
|
75 |
|
76 |
-
|
77 |
|
78 |
Rescale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever](ai-forever/Real-ESRGAN). Necessary for high-resolution images.
|
79 |
|
80 |
-
|
81 |
|
82 |
-
The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines.
|
|
|
|
|
83 |
|
84 |
Denoising strength is essentially how much the generation will differ from the input image. A value of `0` will be identical to the original, while `1` will be a completely new image. You may want to also increase the number of inference steps. Only applies to the image-to-image input.
|
85 |
|
@@ -89,9 +91,19 @@ In an image-to-image pipeline, the input image is used as the initial latent. Wi
|
|
89 |
|
90 |
For capturing faces, enable `IP-Adapter Face` to use the full-face model. You should use an input image that is mostly a face and it should be high quality. You can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
|
91 |
|
92 |
-
|
93 |
|
94 |
-
|
95 |
|
96 |
[DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:
|
97 |
* `1`: no caching (default)
|
@@ -99,14 +111,14 @@ For capturing faces, enable `IP-Adapter Face` to use the full-face model. You sh
|
|
99 |
* `3`: balanced
|
100 |
* `4`: more speed
|
101 |
|
102 |
-
|
103 |
|
104 |
[FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions sourced from the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
|
105 |
|
106 |
-
|
107 |
|
108 |
When enabled, the last CLIP layer is skipped. Can sometimes improve image quality.
|
109 |
|
110 |
-
|
111 |
|
112 |
Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
|
|
|
1 |
+
# Diffusion ZERO
|
2 |
|
3 |
TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.
|
4 |
|
5 |
+
## Prompting
|
6 |
|
7 |
Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.
|
8 |
|
|
|
19 |
| `(blue)1.2` | `(blue:1.2)` |
|
20 |
| `(blue)0.8` | `(blue:0.8)` |
|
21 |
|
22 |
+
### Arrays
|
23 |
|
24 |
Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).
|
25 |
|
26 |
> NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
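The array expansion described above can be sketched as follows. This is a minimal illustration, not the Space's actual implementation: the function name `expand_arrays` and the regular expression are assumptions, and nested arrays are not handled.

```python
import itertools
import re

# Matches one [[a,b,c]] group; the name and pattern are illustrative assumptions.
ARRAY_PATTERN = re.compile(r"\[\[([^\[\]]*)\]\]")


def expand_arrays(prompt: str) -> list[str]:
    """Expand every [[...]] group into one prompt per combination of options."""
    # Collect the comma-separated options of each group, in order of appearance.
    groups = [
        [option.strip() for option in match.group(1).split(",")]
        for match in ARRAY_PATTERN.finditer(prompt)
    ]
    if not groups:
        return [prompt]

    prompts = []
    for combo in itertools.product(*groups):
        options = iter(combo)
        # Replace the groups left to right with the options of this combination.
        prompts.append(ARRAY_PATTERN.sub(lambda _: next(options), prompt))
    return prompts


expanded = expand_arrays("an adult [[blonde,brunette]] [[man,woman]]")
```

With two groups of two options each, `expanded` holds four prompts, which is why `Images` needs to be set to 4 to see all of them.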
|
27 |
|
28 |
+
## Models
|
29 |
|
30 |
Each model checkpoint has a different aesthetic:
|
31 |
|
|
|
38 |
* [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
|
39 |
* [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized
|
40 |
|
41 |
+
## LoRA
|
42 |
|
43 |
Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:
|
44 |
|
|
|
47 |
|
48 |
> NB: The trigger words are automatically appended to the positive prompt for you.
|
49 |
|
50 |
+
## Embeddings
|
51 |
|
52 |
Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:
|
53 |
|
|
|
57 |
|
58 |
> NB: The trigger token is automatically appended to the negative prompt for you.
|
59 |
|
60 |
+
## Styles
|
61 |
|
62 |
[Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.
|
63 |
|
64 |
Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.
|
65 |
|
66 |
+
### Anime
|
67 |
|
68 |
The `Anime: *` styles work the best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:
|
69 |
|
|
|
73 |
|
74 |
Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.
|
75 |
|
76 |
+
## Scale
|
77 |
|
78 |
Rescale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever](ai-forever/Real-ESRGAN). Necessary for high-resolution images.
|
79 |
|
80 |
+
## Image-to-Image
|
81 |
|
82 |
+
The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines.
|
83 |
+
|
84 |
+
### Strength
|
85 |
|
86 |
Denoising strength is essentially how much the generation will differ from the input image. A value of `0` will be identical to the original, while `1` will be a completely new image. You may want to also increase the number of inference steps. Only applies to the image-to-image input.
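One reason to raise the step count: as far as I understand the Diffusers img2img pipelines, only a `strength`-sized fraction of the schedule is actually run. The sketch below illustrates that arithmetic; the helper name `effective_steps` is hypothetical and this is not code from the Space.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps that actually run in img2img."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last `strength` fraction of the schedule is denoised.
    return min(int(num_inference_steps * strength), num_inference_steps)


half = effective_steps(30, 0.5)
```

Under this assumption, 30 steps at a strength of `0.5` run roughly 15 denoising steps, so doubling the step count restores the original amount of work.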
|
87 |
|
|
|
91 |
|
92 |
For capturing faces, enable `IP-Adapter Face` to use the full-face model. You should use an input image that is mostly a face and it should be high quality. You can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
|
93 |
|
94 |
+
## ControlNet
|
95 |
+
|
96 |
+
The `🎮 Control` tab enables the [ControlNet](https://github.com/lllyasviel/ControlNet) pipelines. Read the [Diffusers docs](https://huggingface.co/docs/diffusers/using-diffusers/controlnet) to learn more.
|
97 |
+
|
98 |
+
### Annotators
|
99 |
+
|
100 |
+
In ControlNet, the input image is a feature map produced by an _annotator_. These are computer vision models used for tasks like edge detection and pose estimation. ControlNet models are trained to understand these feature maps.
|
101 |
+
|
102 |
+
> NB: Control images will be automatically resized to the nearest multiple of 64 (e.g., 513 -> 512).
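The resizing rule in the note above can be sketched like this. The helper names are hypothetical (the real logic lives in `lib/utils.py`, which is not shown here), and rounding ties upward is an assumption; the documented example of 513 -> 512 is also consistent with simply rounding down.

```python
def nearest_multiple(value: int, multiple: int = 64) -> int:
    """Round `value` to the closest multiple, never going below one multiple."""
    # Adding half a multiple before flooring rounds to nearest (ties go up).
    rounded = (value + multiple // 2) // multiple * multiple
    return max(multiple, rounded)


def valid_size(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Apply the rounding to both dimensions of an image."""
    return nearest_multiple(width, multiple), nearest_multiple(height, multiple)


size = valid_size(513, 767)
```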
|
103 |
+
|
104 |
+
## Advanced
|
105 |
|
106 |
+
### DeepCache
|
107 |
|
108 |
[DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:
|
109 |
* `1`: no caching (default)
|
|
|
111 |
* `3`: balanced
|
112 |
* `4`: more speed
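To make the `Interval` trade-off concrete, here is a simplified model of uniform caching: a full UNet pass runs on every `Interval`-th step and the cached features are reused in between. This is an illustrative assumption about the schedule, not DeepCache's actual code, and `full_compute_steps` is a hypothetical name.

```python
def full_compute_steps(num_steps: int, interval: int) -> list[int]:
    """Return the step indices that would run a full (uncached) UNet pass."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    # With interval 1 every step is a full pass, i.e. no caching.
    return [step for step in range(num_steps) if step % interval == 0]


no_cache = full_compute_steps(6, 1)
balanced = full_compute_steps(6, 3)
```

In this model an interval of `3` performs a full pass on only a third of the steps, which is where the speedup (and the quality loss) comes from.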
|
113 |
|
114 |
+
### FreeU
|
115 |
|
116 |
[FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions sourced from the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
|
117 |
|
118 |
+
### Clip Skip
|
119 |
|
120 |
When enabled, the last CLIP layer is skipped. Can sometimes improve image quality.
|
121 |
|
122 |
+
### Tiny VAE
|
123 |
|
124 |
Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
|
README.md
CHANGED
@@ -6,7 +6,7 @@ emoji: 🧨
|
|
6 |
colorFrom: purple
|
7 |
colorTo: blue
|
8 |
sdk: gradio
|
9 |
-
sdk_version: 4.
|
10 |
python_version: 3.11.9
|
11 |
app_file: app.py
|
12 |
fullWidth: false
|
@@ -25,9 +25,6 @@ models:
|
|
25 |
- SG161222/Realistic_Vision_V5.1_noVAE
|
26 |
- XpucT/Deliberate
|
27 |
preload_from_hub: # up to 10
|
28 |
-
- >-
|
29 |
-
ai-forever/Real-ESRGAN
|
30 |
-
RealESRGAN_x2.pth,RealESRGAN_x4.pth
|
31 |
- >-
|
32 |
Comfy-Org/stable-diffusion-v1-5-archive
|
33 |
v1-5-pruned-emaonly-fp16.safetensors
|
@@ -43,6 +40,9 @@ preload_from_hub: # up to 10
|
|
43 |
- >-
|
44 |
Linaqruf/anything-v3-1
|
45 |
anything-v3-2.safetensors
|
|
|
|
|
|
|
46 |
- >-
|
47 |
Lykon/dreamshaper-8
|
48 |
feature_extractor/preprocessor_config.json,safety_checker/config.json,scheduler/scheduler_config.json,text_encoder/config.json,text_encoder/model.fp16.safetensors,tokenizer/merges.txt,tokenizer/special_tokens_map.json,tokenizer/tokenizer_config.json,tokenizer/vocab.json,unet/config.json,unet/diffusion_pytorch_model.fp16.safetensors,vae/config.json,vae/diffusion_pytorch_model.fp16.safetensors,model_index.json
|
@@ -62,6 +62,7 @@ preload_from_hub: # up to 10
|
|
62 |
Gradio app for Stable Diffusion 1.5 featuring:
|
63 |
* txt2img and img2img pipelines with IP-Adapter
|
64 |
* Curated models, LoRAs, and TI embeddings
|
|
|
65 |
* Compel prompt weighting
|
66 |
* dozens of styles and starter prompts
|
67 |
* Multiple samplers with Karras scheduling
|
@@ -69,12 +70,8 @@ Gradio app for Stable Diffusion 1.5 featuring:
|
|
69 |
* Real-ESRGAN upscaling
|
70 |
* Optional tiny autoencoder
|
71 |
|
72 |
-
There's also a [CLI](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/cli.py).
|
73 |
-
|
74 |
## Motivation
|
75 |
|
76 |
-
I want to:
|
77 |
-
|
78 |
* host a free and easy-to-use Stable Diffusion UI on ZeroGPU
|
79 |
* provide the necessary tools for common workflows
|
80 |
* curate useful models, adapters, and embeddings
|
|
|
6 |
colorFrom: purple
|
7 |
colorTo: blue
|
8 |
sdk: gradio
|
9 |
+
sdk_version: 4.44.0
|
10 |
python_version: 3.11.9
|
11 |
app_file: app.py
|
12 |
fullWidth: false
|
|
|
25 |
- SG161222/Realistic_Vision_V5.1_noVAE
|
26 |
- XpucT/Deliberate
|
27 |
preload_from_hub: # up to 10
|
|
|
|
|
|
|
28 |
- >-
|
29 |
Comfy-Org/stable-diffusion-v1-5-archive
|
30 |
v1-5-pruned-emaonly-fp16.safetensors
|
|
|
40 |
- >-
|
41 |
Linaqruf/anything-v3-1
|
42 |
anything-v3-2.safetensors
|
43 |
+
- >-
|
44 |
+
lllyasviel/control_v11p_sd15_canny
|
45 |
+
diffusion_pytorch_model.fp16.safetensors
|
46 |
- >-
|
47 |
Lykon/dreamshaper-8
|
48 |
feature_extractor/preprocessor_config.json,safety_checker/config.json,scheduler/scheduler_config.json,text_encoder/config.json,text_encoder/model.fp16.safetensors,tokenizer/merges.txt,tokenizer/special_tokens_map.json,tokenizer/tokenizer_config.json,tokenizer/vocab.json,unet/config.json,unet/diffusion_pytorch_model.fp16.safetensors,vae/config.json,vae/diffusion_pytorch_model.fp16.safetensors,model_index.json
|
|
|
62 |
Gradio app for Stable Diffusion 1.5 featuring:
|
63 |
* txt2img and img2img pipelines with IP-Adapter
|
64 |
* Curated models, LoRAs, and TI embeddings
|
65 |
+
* ControlNet with annotators
|
66 |
* Compel prompt weighting
|
67 |
* dozens of styles and starter prompts
|
68 |
* Multiple samplers with Karras scheduling
|
|
|
70 |
* Real-ESRGAN upscaling
|
71 |
* Optional tiny autoencoder
|
72 |
|
|
|
|
|
73 |
## Motivation
|
74 |
|
|
|
|
|
75 |
* host a free and easy-to-use Stable Diffusion UI on ZeroGPU
|
76 |
* provide the necessary tools for common workflows
|
77 |
* curate useful models, adapters, and embeddings
|
app.css
CHANGED
@@ -30,7 +30,7 @@
|
|
30 |
overflow-y: auto;
|
31 |
}
|
32 |
.gallery, .gallery .grid-wrap {
|
33 |
-
height: calc(100vh -
|
34 |
max-height: none;
|
35 |
}
|
36 |
|
@@ -108,7 +108,10 @@
|
|
108 |
content: 'Random prompt';
|
109 |
}
|
110 |
.popover#clear:hover::after {
|
111 |
-
content: 'Clear
|
|
|
|
|
|
|
112 |
}
|
113 |
.popover#refresh:hover::after {
|
114 |
content: var(--seed, "-1");
|
|
|
30 |
overflow-y: auto;
|
31 |
}
|
32 |
.gallery, .gallery .grid-wrap {
|
33 |
+
height: calc(100vh - 430px);
|
34 |
max-height: none;
|
35 |
}
|
36 |
|
|
|
108 |
content: 'Random prompt';
|
109 |
}
|
110 |
.popover#clear:hover::after {
|
111 |
+
content: 'Clear';
|
112 |
+
}
|
113 |
+
.popover#clear-control:hover::after {
|
114 |
+
content: 'Clear';
|
115 |
}
|
116 |
.popover#refresh:hover::after {
|
117 |
content: var(--seed, "-1");
|
app.py
CHANGED
@@ -6,13 +6,16 @@ import random
|
|
6 |
import gradio as gr
|
7 |
|
8 |
from lib import (
|
|
|
9 |
Config,
|
10 |
async_call,
|
11 |
disable_progress_bars,
|
12 |
download_civit_file,
|
13 |
download_repo_files,
|
14 |
generate,
|
|
|
15 |
read_file,
|
|
|
16 |
)
|
17 |
|
18 |
# the CSS `content` attribute expects a string so we need to wrap the number in quotes
|
@@ -84,6 +87,15 @@ async def random_fn():
|
|
84 |
return gr.Textbox(value=random.choice(prompts))
|
85 |
|
86 |
|
87 |
async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
|
88 |
if len(args) > 0:
|
89 |
prompt = args[0]
|
@@ -92,6 +104,7 @@ async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
|
|
92 |
if prompt is None or prompt.strip() == "":
|
93 |
raise gr.Error("You must enter a prompt")
|
94 |
|
|
|
95 |
DISABLE_IMAGE_PROMPT, DISABLE_IP_IMAGE_PROMPT = args[-2:]
|
96 |
gen_args = list(args[:-2])
|
97 |
if DISABLE_IMAGE_PROMPT:
|
@@ -148,25 +161,24 @@ with gr.Blocks(
|
|
148 |
with gr.Tabs():
|
149 |
with gr.TabItem("🏠 Text"):
|
150 |
with gr.Column():
|
151 |
-
|
152 |
-
|
153 |
-
|
154 |
-
|
155 |
-
|
156 |
-
|
157 |
-
|
158 |
-
|
159 |
-
|
160 |
-
|
161 |
-
|
162 |
-
|
163 |
-
|
164 |
-
|
165 |
-
|
166 |
-
|
167 |
-
|
168 |
-
|
169 |
-
)
|
170 |
|
171 |
# Buttons
|
172 |
with gr.Row():
|
@@ -196,72 +208,104 @@ with gr.Blocks(
|
|
196 |
|
197 |
# img2img tab
|
198 |
with gr.TabItem("🖼️ Image"):
|
199 |
-
with gr.
|
200 |
-
|
201 |
-
|
202 |
-
|
203 |
-
|
204 |
-
|
205 |
-
|
206 |
-
|
207 |
-
|
208 |
-
|
209 |
-
|
210 |
-
|
211 |
-
|
212 |
-
|
213 |
-
|
214 |
-
)
|
215 |
|
216 |
-
|
217 |
-
|
218 |
-
|
219 |
-
|
220 |
-
|
221 |
-
|
222 |
-
|
223 |
-
|
224 |
-
|
225 |
-
|
226 |
-
|
227 |
-
|
228 |
-
|
229 |
-
|
230 |
-
|
231 |
-
|
232 |
-
|
233 |
|
234 |
-
|
235 |
-
|
236 |
-
|
237 |
-
|
238 |
-
|
239 |
-
|
240 |
-
|
241 |
-
|
242 |
|
243 |
-
|
244 |
-
|
245 |
-
|
246 |
-
|
247 |
-
|
248 |
-
|
249 |
-
|
250 |
-
|
251 |
-
|
252 |
-
|
253 |
-
|
254 |
-
|
255 |
-
|
256 |
-
|
257 |
-
|
258 |
-
|
259 |
|
260 |
-
#
|
261 |
with gr.TabItem("🎮 Control"):
|
262 |
-
gr.
|
263 |
-
|
264 |
-
|
265 |
|
266 |
with gr.TabItem("⚙️ Menu"):
|
267 |
with gr.Group():
|
@@ -445,6 +489,12 @@ with gr.Blocks(
|
|
445 |
value=False,
|
446 |
)
|
447 |
|
448 |
random_btn.click(random_fn, inputs=[], outputs=[prompt], show_api=False)
|
449 |
|
450 |
refresh_btn.click(None, inputs=[], outputs=[seed], js=refresh_seed_js)
|
@@ -530,7 +580,7 @@ with gr.Blocks(
|
|
530 |
negative_prompt,
|
531 |
image_prompt,
|
532 |
ip_image_prompt,
|
533 |
-
|
534 |
lora_1,
|
535 |
lora_1_weight,
|
536 |
lora_2,
|
@@ -540,6 +590,7 @@ with gr.Blocks(
|
|
540 |
seed,
|
541 |
model,
|
542 |
scheduler,
|
|
|
543 |
width,
|
544 |
height,
|
545 |
guidance_scale,
|
@@ -552,6 +603,7 @@ with gr.Blocks(
|
|
552 |
use_taesd,
|
553 |
use_freeu,
|
554 |
use_clip_skip,
|
|
|
555 |
DISABLE_IMAGE_PROMPT,
|
556 |
DISABLE_IP_IMAGE_PROMPT,
|
557 |
],
|
|
|
6 |
import gradio as gr
|
7 |
|
8 |
from lib import (
|
9 |
+
CannyAnnotator,
|
10 |
Config,
|
11 |
async_call,
|
12 |
disable_progress_bars,
|
13 |
download_civit_file,
|
14 |
download_repo_files,
|
15 |
generate,
|
16 |
+
get_valid_size,
|
17 |
read_file,
|
18 |
+
resize_image,
|
19 |
)
|
20 |
|
21 |
# the CSS `content` attribute expects a string so we need to wrap the number in quotes
|
|
|
87 |
return gr.Textbox(value=random.choice(prompts))
|
88 |
|
89 |
|
90 |
+
# TODO: move this to another file once more annotators are added; will need @GPU decorator
|
91 |
+
async def annotate_fn(image, annotator):
|
92 |
+
size = get_valid_size(image)
|
93 |
+
image = resize_image(image, size)
|
94 |
+
if annotator == "canny":
|
95 |
+
canny = CannyAnnotator()
|
96 |
+
return canny(image, size)
|
97 |
+
|
98 |
+
|
99 |
async def generate_fn(*args, progress=gr.Progress(track_tqdm=True)):
|
100 |
if len(args) > 0:
|
101 |
prompt = args[0]
|
|
|
104 |
if prompt is None or prompt.strip() == "":
|
105 |
raise gr.Error("You must enter a prompt")
|
106 |
|
107 |
+
# always the last arguments
|
108 |
DISABLE_IMAGE_PROMPT, DISABLE_IP_IMAGE_PROMPT = args[-2:]
|
109 |
gen_args = list(args[:-2])
|
110 |
if DISABLE_IMAGE_PROMPT:
|
|
|
161 |
with gr.Tabs():
|
162 |
with gr.TabItem("🏠 Text"):
|
163 |
with gr.Column():
|
164 |
+
output_images = gr.Gallery(
|
165 |
+
elem_classes=["gallery"],
|
166 |
+
show_share_button=False,
|
167 |
+
object_fit="cover",
|
168 |
+
interactive=False,
|
169 |
+
show_label=False,
|
170 |
+
label="Output",
|
171 |
+
format="png",
|
172 |
+
columns=2,
|
173 |
+
)
|
174 |
+
prompt = gr.Textbox(
|
175 |
+
placeholder="What do you want to see?",
|
176 |
+
autoscroll=False,
|
177 |
+
show_label=False,
|
178 |
+
label="Prompt",
|
179 |
+
max_lines=3,
|
180 |
+
lines=3,
|
181 |
+
)
|
|
|
182 |
|
183 |
# Buttons
|
184 |
with gr.Row():
|
|
|
208 |
|
209 |
# img2img tab
|
210 |
with gr.TabItem("🖼️ Image"):
|
211 |
+
with gr.Row():
|
212 |
+
image_prompt = gr.Image(
|
213 |
+
show_share_button=False,
|
214 |
+
label="Initial Image",
|
215 |
+
min_width=320,
|
216 |
+
format="png",
|
217 |
+
type="pil",
|
218 |
+
)
|
219 |
+
ip_image_prompt = gr.Image(
|
220 |
+
show_share_button=False,
|
221 |
+
label="IP-Adapter Image",
|
222 |
+
min_width=320,
|
223 |
+
format="png",
|
224 |
+
type="pil",
|
225 |
+
)
|
|
|
226 |
|
227 |
+
with gr.Row():
|
228 |
+
image_select = gr.Dropdown(
|
229 |
+
info="Use an initial image from the gallery",
|
230 |
+
choices=[("None", -1)],
|
231 |
+
label="Gallery Image",
|
232 |
+
interactive=True,
|
233 |
+
filterable=False,
|
234 |
+
value=-1,
|
235 |
+
)
|
236 |
+
ip_image_select = gr.Dropdown(
|
237 |
+
info="Use an IP-Adapter image from the gallery",
|
238 |
+
label="Gallery Image",
|
239 |
+
choices=[("None", -1)],
|
240 |
+
interactive=True,
|
241 |
+
filterable=False,
|
242 |
+
value=-1,
|
243 |
+
)
|
244 |
|
245 |
+
with gr.Row():
|
246 |
+
denoising_strength = gr.Slider(
|
247 |
+
value=Config.DENOISING_STRENGTH,
|
248 |
+
label="Denoising Strength",
|
249 |
+
minimum=0.0,
|
250 |
+
maximum=1.0,
|
251 |
+
step=0.1,
|
252 |
+
)
|
253 |
|
254 |
+
with gr.Row():
|
255 |
+
disable_image = gr.Checkbox(
|
256 |
+
elem_classes=["checkbox"],
|
257 |
+
label="Disable Initial Image",
|
258 |
+
value=False,
|
259 |
+
)
|
260 |
+
disable_ip_image = gr.Checkbox(
|
261 |
+
elem_classes=["checkbox"],
|
262 |
+
label="Disable IP-Adapter Image",
|
263 |
+
value=False,
|
264 |
+
)
|
265 |
+
use_ip_face = gr.Checkbox(
|
266 |
+
elem_classes=["checkbox"],
|
267 |
+
label="Use IP-Adapter Face",
|
268 |
+
value=False,
|
269 |
+
)
|
270 |
|
271 |
+
# controlnet tab
|
272 |
with gr.TabItem("🎮 Control"):
|
273 |
+
with gr.Row():
|
274 |
+
control_image_input = gr.Image(
|
275 |
+
show_share_button=False,
|
276 |
+
label="Control Image",
|
277 |
+
min_width=320,
|
278 |
+
format="png",
|
279 |
+
type="pil",
|
280 |
+
)
|
281 |
+
control_image_prompt = gr.Image(
|
282 |
+
interactive=False,
|
283 |
+
show_share_button=False,
|
284 |
+
label="Control Image Output",
|
285 |
+
show_label=False,
|
286 |
+
min_width=320,
|
287 |
+
format="png",
|
288 |
+
type="pil",
|
289 |
+
)
|
290 |
+
|
291 |
+
with gr.Row():
|
292 |
+
control_annotator = gr.Dropdown(
|
293 |
+
choices=[("Canny", "canny")],
|
294 |
+
label="Annotator",
|
295 |
+
filterable=False,
|
296 |
+
value="canny",
|
297 |
+
)
|
298 |
+
|
299 |
+
with gr.Row():
|
300 |
+
annotate_btn = gr.Button("Annotate", variant="primary")
|
301 |
+
clear_control_btn = gr.ClearButton(
|
302 |
+
elem_classes=["icon-button", "popover"],
|
303 |
+
components=[control_image_prompt],
|
304 |
+
variant="secondary",
|
305 |
+
elem_id="clear-control",
|
306 |
+
min_width=0,
|
307 |
+
value="🗑️",
|
308 |
+
)
|
309 |
|
310 |
with gr.TabItem("⚙️ Menu"):
|
311 |
with gr.Group():
|
|
|
489 |
value=False,
|
490 |
)
|
491 |
|
492 |
+
annotate_btn.click(
|
493 |
+
annotate_fn,
|
494 |
+
inputs=[control_image_input, control_annotator],
|
495 |
+
outputs=[control_image_prompt],
|
496 |
+
)
|
497 |
+
|
498 |
random_btn.click(random_fn, inputs=[], outputs=[prompt], show_api=False)
|
499 |
|
500 |
refresh_btn.click(None, inputs=[], outputs=[seed], js=refresh_seed_js)
|
|
|
580 |
negative_prompt,
|
581 |
image_prompt,
|
582 |
ip_image_prompt,
|
583 |
+
control_image_prompt,
|
584 |
lora_1,
|
585 |
lora_1_weight,
|
586 |
lora_2,
|
|
|
590 |
seed,
|
591 |
model,
|
592 |
scheduler,
|
593 |
+
control_annotator,
|
594 |
width,
|
595 |
height,
|
596 |
guidance_scale,
|
|
|
603 |
use_taesd,
|
604 |
use_freeu,
|
605 |
use_clip_skip,
|
606 |
+
use_ip_face,
|
607 |
DISABLE_IMAGE_PROMPT,
|
608 |
DISABLE_IP_IMAGE_PROMPT,
|
609 |
],
|
lib/__init__.py
CHANGED
@@ -1,3 +1,4 @@
|
|
|
|
1 |
from .config import Config
|
2 |
from .inference import generate
|
3 |
from .loader import Loader
|
@@ -9,13 +10,16 @@ from .utils import (
|
|
9 |
download_civit_file,
|
10 |
download_repo_files,
|
11 |
enable_progress_bars,
|
|
|
12 |
load_json,
|
13 |
read_file,
|
|
|
14 |
safe_progress,
|
15 |
timer,
|
16 |
)
|
17 |
|
18 |
__all__ = [
|
|
|
19 |
"Config",
|
20 |
"Loader",
|
21 |
"Logger",
|
@@ -26,8 +30,10 @@ __all__ = [
|
|
26 |
"download_repo_files",
|
27 |
"enable_progress_bars",
|
28 |
"generate",
|
|
|
29 |
"load_json",
|
30 |
"read_file",
|
|
|
31 |
"safe_progress",
|
32 |
"timer",
|
33 |
]
|
|
|
1 |
+
from .annotators import CannyAnnotator
|
2 |
from .config import Config
|
3 |
from .inference import generate
|
4 |
from .loader import Loader
|
|
|
10 |
download_civit_file,
|
11 |
download_repo_files,
|
12 |
enable_progress_bars,
|
13 |
+
get_valid_size,
|
14 |
load_json,
|
15 |
read_file,
|
16 |
+
resize_image,
|
17 |
safe_progress,
|
18 |
timer,
|
19 |
)
|
20 |
|
21 |
__all__ = [
|
22 |
+
"CannyAnnotator",
|
23 |
"Config",
|
24 |
"Loader",
|
25 |
"Logger",
|
|
|
30 |
"download_repo_files",
|
31 |
"enable_progress_bars",
|
32 |
"generate",
|
33 |
+
"get_valid_size",
|
34 |
"load_json",
|
35 |
"read_file",
|
36 |
+
"resize_image",
|
37 |
"safe_progress",
|
38 |
"timer",
|
39 |
]
|
lib/annotators.py
ADDED
@@ -0,0 +1,25 @@
|
1 |
+
from threading import Lock
|
2 |
+
|
3 |
+
from controlnet_aux import CannyDetector
|
4 |
+
|
5 |
+
|
6 |
+
class CannyAnnotator:
|
7 |
+
_instance = None
|
8 |
+
_lock = Lock()
|
9 |
+
|
10 |
+
def __new__(cls):
|
11 |
+
with cls._lock:
|
12 |
+
if cls._instance is None:
|
13 |
+
cls._instance = super().__new__(cls)
|
14 |
+
cls._instance.model = CannyDetector()
|
15 |
+
return cls._instance
|
16 |
+
|
17 |
+
def __call__(self, img, size):
|
18 |
+
resolution = min(*size)
|
19 |
+
return self.model(
|
20 |
+
img,
|
21 |
+
low_threshold=100,
|
22 |
+
high_threshold=200,
|
23 |
+
detect_resolution=resolution,
|
24 |
+
image_resolution=resolution,
|
25 |
+
)
|
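The `__new__`-based singleton used by `CannyAnnotator` above can be exercised in isolation. The sketch below swaps the real `CannyDetector` for a hypothetical `FakeDetector` so it runs without `controlnet_aux`; the class names are illustrative only.

```python
from threading import Lock


class FakeDetector:
    """Hypothetical stand-in for an expensive model such as CannyDetector."""

    instances = 0

    def __init__(self):
        FakeDetector.instances += 1


class SingletonAnnotator:
    _instance = None
    _lock = Lock()

    def __new__(cls):
        # The lock prevents two threads from both creating the instance.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance.model = FakeDetector()
        return cls._instance


first = SingletonAnnotator()
second = SingletonAnnotator()
```

Both names refer to the same object, so the underlying model is constructed once no matter how often the annotate handler instantiates the class.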
lib/config.py
CHANGED
@@ -16,7 +16,12 @@ from diffusers import (
|
|
16 |
from diffusers.utils import logging as diffusers_logging
|
17 |
from transformers import logging as transformers_logging
|
18 |
|
19 |
-
from .pipelines import
|
20 |
|
21 |
# improved GPU handling and progress bars; set before importing spaces
|
22 |
os.environ["ZEROGPU_V2"] = "1"
|
@@ -53,11 +58,14 @@ Config = SimpleNamespace(
|
|
53 |
ZERO_GPU=import_module("spaces").config.Config.zero_gpu,
|
54 |
HF_MODELS={
|
55 |
# downloaded on startup
|
56 |
-
"
|
57 |
"Comfy-Org/stable-diffusion-v1-5-archive": ["v1-5-pruned-emaonly-fp16.safetensors"],
|
58 |
"cyberdelia/CyberRealistic": ["CyberRealistic_V5_FP16.safetensors"],
|
59 |
"fluently/Fluently-v4": ["Fluently-v4.safetensors"],
|
60 |
"Linaqruf/anything-v3-1": ["anything-v3-2.safetensors"],
|
|
|
|
|
|
|
61 |
"prompthero/openjourney-v4": ["openjourney-v4.ckpt"],
|
62 |
"SG161222/Realistic_Vision_V5.1_noVAE": ["Realistic_Vision_V5.1_fp16-no-ema.safetensors"],
|
63 |
"XpucT/Deliberate": ["Deliberate_v6.safetensors"],
|
@@ -89,6 +97,8 @@ Config = SimpleNamespace(
|
|
89 |
PIPELINES={
|
90 |
"txt2img": CustomStableDiffusionPipeline,
|
91 |
"img2img": CustomStableDiffusionImg2ImgPipeline,
|
|
|
|
|
92 |
},
|
93 |
MODEL="Lykon/dreamshaper-8",
|
94 |
MODELS=[
|
@@ -121,6 +131,9 @@ Config = SimpleNamespace(
|
|
121 |
"PNDM": PNDMScheduler,
|
122 |
"UniPC 2M": UniPCMultistepScheduler,
|
123 |
},
|
|
|
|
|
|
|
124 |
EMBEDDING="fast_negative",
|
125 |
EMBEDDINGS=[
|
126 |
"cyberrealistic_negative",
|
|
|
16 |
from diffusers.utils import logging as diffusers_logging
|
17 |
from transformers import logging as transformers_logging
|
18 |
|
19 |
+
from .pipelines import (
|
20 |
+
CustomStableDiffusionControlNetImg2ImgPipeline,
|
21 |
+
CustomStableDiffusionControlNetPipeline,
|
22 |
+
CustomStableDiffusionImg2ImgPipeline,
|
23 |
+
CustomStableDiffusionPipeline,
|
24 |
+
)
|
25 |
|
26 |
# improved GPU handling and progress bars; set before importing spaces
|
27 |
os.environ["ZEROGPU_V2"] = "1"
|
|
|
58 |
ZERO_GPU=import_module("spaces").config.Config.zero_gpu,
|
59 |
HF_MODELS={
|
60 |
# downloaded on startup
|
61 |
+
"ai-forever/Real-ESRGAN": ["RealESRGAN_x2.pth", "RealESRGAN_x4.pth"],
|
62 |
"Comfy-Org/stable-diffusion-v1-5-archive": ["v1-5-pruned-emaonly-fp16.safetensors"],
|
63 |
"cyberdelia/CyberRealistic": ["CyberRealistic_V5_FP16.safetensors"],
|
64 |
"fluently/Fluently-v4": ["Fluently-v4.safetensors"],
|
65 |
"Linaqruf/anything-v3-1": ["anything-v3-2.safetensors"],
|
66 |
+
"lllyasviel/control_v11p_sd15_canny": ["diffusion_pytorch_model.fp16.safetensors"],
|
67 |
+
"Lykon/dreamshaper-8": [*_sd_files],
|
68 |
+
"madebyollin/taesd": ["diffusion_pytorch_model.safetensors"],
|
69 |
"prompthero/openjourney-v4": ["openjourney-v4.ckpt"],
|
70 |
"SG161222/Realistic_Vision_V5.1_noVAE": ["Realistic_Vision_V5.1_fp16-no-ema.safetensors"],
|
71 |
"XpucT/Deliberate": ["Deliberate_v6.safetensors"],
|
|
|
97 |
PIPELINES={
|
98 |
"txt2img": CustomStableDiffusionPipeline,
|
99 |
"img2img": CustomStableDiffusionImg2ImgPipeline,
|
100 |
+
"controlnet_txt2img": CustomStableDiffusionControlNetPipeline,
|
101 |
+
"controlnet_img2img": CustomStableDiffusionControlNetImg2ImgPipeline,
|
102 |
},
|
103 |
MODEL="Lykon/dreamshaper-8",
|
104 |
MODELS=[
|
|
|
131 |
"PNDM": PNDMScheduler,
|
132 |
"UniPC 2M": UniPCMultistepScheduler,
|
133 |
},
|
134 |
+
ANNOTATORS={
|
135 |
+
"canny": "lllyasviel/control_v11p_sd15_canny",
|
136 |
+
},
|
137 |
EMBEDDING="fast_negative",
|
138 |
EMBEDDINGS=[
|
139 |
"cyberrealistic_negative",
|
lib/inference.py
CHANGED
@@ -98,7 +98,7 @@ def generate(
|
|
98 |
negative_prompt="",
|
99 |
image_prompt=None,
|
100 |
ip_image_prompt=None,
|
101 |
-
|
102 |
lora_1=None,
|
103 |
lora_1_weight=0.0,
|
104 |
lora_2=None,
|
@@ -108,6 +108,7 @@ def generate(
|
|
108 |
seed=None,
|
109 |
model="Lykon/dreamshaper-8",
|
110 |
scheduler="DDIM",
|
|
|
111 |
width=512,
|
112 |
height=512,
|
113 |
guidance_scale=7.5,
|
@@ -120,6 +121,7 @@ def generate(
|
|
120 |
taesd=False,
|
121 |
freeu=False,
|
122 |
clip_skip=False,
|
|
|
123 |
Error=Exception,
|
124 |
Info=None,
|
125 |
progress=None,
|
@@ -142,6 +144,10 @@ def generate(
|
|
142 |
CURRENT_IMAGE = 1
|
143 |
|
144 |
KIND = "img2img" if image_prompt is not None else "txt2img"
|
|
|
|
|
|
|
|
|
145 |
|
146 |
EMBEDDINGS_TYPE = (
|
147 |
ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED
|
@@ -174,6 +180,7 @@ def generate(
|
|
174 |
IP_ADAPTER,
|
175 |
model,
|
176 |
scheduler,
|
|
|
177 |
deepcache,
|
178 |
scale,
|
179 |
karras,
|
@@ -293,6 +300,13 @@ def generate(
|
|
293 |
kwargs["strength"] = denoising_strength
|
294 |
kwargs["image"] = prepare_image(image_prompt, (width, height))
|
295 |
|
296 |
if IP_ADAPTER:
|
297 |
# don't resize full-face images since they are usually square crops
|
298 |
size = None if ip_face else (width, height)
|
|
|
98 |
negative_prompt="",
|
99 |
image_prompt=None,
|
100 |
ip_image_prompt=None,
|
101 |
+
control_image_prompt=None,
|
102 |
lora_1=None,
|
103 |
lora_1_weight=0.0,
|
104 |
lora_2=None,
|
|
|
108 |
seed=None,
|
109 |
model="Lykon/dreamshaper-8",
|
110 |
scheduler="DDIM",
|
111 |
+
annotator="canny",
|
112 |
width=512,
|
113 |
height=512,
|
114 |
guidance_scale=7.5,
|
|
|
121 |
taesd=False,
|
122 |
freeu=False,
|
123 |
clip_skip=False,
|
124 |
+
ip_face=False,
|
125 |
Error=Exception,
|
126 |
Info=None,
|
127 |
progress=None,
|
|
|
144 |
CURRENT_IMAGE = 1
|
145 |
|
146 |
KIND = "img2img" if image_prompt is not None else "txt2img"
|
147 |
+
KIND = f"controlnet_{KIND}" if control_image_prompt is not None else KIND
|
148 |
+
|
149 |
+
if KIND.startswith("controlnet_") and annotator.lower() not in Config.ANNOTATORS.keys():
|
150 |
+
raise Error(f"Invalid annotator: {annotator}")
|
151 |
|
152 |
EMBEDDINGS_TYPE = (
|
153 |
ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED
|
|
|
180 |
IP_ADAPTER,
|
181 |
model,
|
182 |
scheduler,
|
183 |
+
annotator,
|
184 |
deepcache,
|
185 |
scale,
|
186 |
karras,
|
|
|
300 |
kwargs["strength"] = denoising_strength
|
301 |
kwargs["image"] = prepare_image(image_prompt, (width, height))
|
302 |
|
303 |
+
if KIND == "controlnet_txt2img":
|
304 |
+
# don't resize controlnet images
|
305 |
+
kwargs["image"] = prepare_image(control_image_prompt, None)
|
306 |
+
|
307 |
+
if KIND == "controlnet_img2img":
|
308 |
+
kwargs["control_image"] = prepare_image(control_image_prompt, None)
|
309 |
+
|
310 |
if IP_ADAPTER:
|
311 |
# don't resize full-face images since they are usually square crops
|
312 |
size = None if ip_face else (width, height)
|
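The `KIND` selection added to `generate()` can be tried in isolation. The following is a minimal sketch, not the project's code: `select_kind` is a hypothetical helper name, `ANNOTATORS` is a stand-in for `Config.ANNOTATORS` (whose real contents live in `lib/config.py` and are not shown in this diff), and `ValueError` stands in for the `Error` parameter.

```python
# Stand-in for Config.ANNOTATORS; the key is taken from the default
# annotator="canny", the value is an invented placeholder.
ANNOTATORS = {"canny": "placeholder/controlnet-repo"}


def select_kind(image_prompt=None, control_image_prompt=None, annotator="canny"):
    """Derive the pipeline kind the way the diff derives KIND."""
    # Base kind depends only on whether an init image was supplied.
    kind = "img2img" if image_prompt is not None else "txt2img"
    # A control image upgrades either base kind to its ControlNet variant.
    if control_image_prompt is not None:
        kind = f"controlnet_{kind}"
    # The annotator is only validated for ControlNet kinds.
    if kind.startswith("controlnet_") and annotator.lower() not in ANNOTATORS:
        raise ValueError(f"Invalid annotator: {annotator}")
    return kind


if __name__ == "__main__":
    print(select_kind())
    print(select_kind(control_image_prompt="edges.png"))
    print(select_kind(image_prompt="init.png", control_image_prompt="edges.png"))
```

Note that an unknown annotator is accepted as long as no control image is passed, because the check is gated on the `controlnet_` prefix.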
lib/loader.py
CHANGED
@@ -3,6 +3,7 @@ from threading import Lock

 import torch
 from DeepCache import DeepCacheSDHelper
+from diffusers import ControlNetModel
 from diffusers.models import AutoencoderKL, AutoencoderTiny
 from diffusers.models.attention_processor import AttnProcessor2_0, IPAdapterAttnProcessor2_0

@@ -23,6 +24,7 @@ class Loader:
             cls._instance.pipe = None
             cls._instance.model = None
             cls._instance.upscaler = None
+            cls._instance.controlnet = None
             cls._instance.ip_adapter = None
             cls._instance.log = Logger("Loader")
         return cls._instance
@@ -75,15 +77,36 @@ class Loader:
             return True
         return False

-    def
+    def _should_unload_controlnet(self, kind="", controlnet=""):
+        if self.controlnet is None:
+            return False
+        if self.controlnet.lower() != controlnet.lower():
+            return True
+        if not kind.startswith("controlnet_"):
+            return True
+        return False
+
+    def _should_unload_pipeline(self, kind="", model="", controlnet=""):
         if self.pipe is None:
             return False
         if self.model.lower() != model.lower():
             return True
         if kind == "txt2img" and not isinstance(self.pipe, Config.PIPELINES["txt2img"]):
-            return True
+            return True
         if kind == "img2img" and not isinstance(self.pipe, Config.PIPELINES["img2img"]):
-            return True
+            return True
+        if kind == "controlnet_txt2img" and not isinstance(
+            self.pipe,
+            Config.PIPELINES["controlnet_txt2img"],
+        ):
+            return True
+        if kind == "controlnet_img2img" and not isinstance(
+            self.pipe,
+            Config.PIPELINES["controlnet_img2img"],
+        ):
+            return True
+        if self._should_unload_controlnet(kind, controlnet):
+            return True
         return False

     def _unload_upscaler(self):
@@ -128,7 +151,16 @@ class Loader:
         with timer(f"Unloading {self.model}", logger=self.log.info):
             self.pipe.to("cpu")

-    def _unload(
+    def _unload(
+        self,
+        kind="",
+        model="",
+        controlnet="",
+        ip_adapter="",
+        deepcache=1,
+        scale=1,
+        freeu=False,
+    ):
         to_unload = []
         if self._should_unload_deepcache(deepcache):  # remove deepcache first
             self._unload_deepcache()
@@ -144,7 +176,10 @@ class Loader:
             self._unload_ip_adapter()
             to_unload.append("ip_adapter")

-        if self.
+        if self._should_unload_controlnet(kind, controlnet):
+            to_unload.append("controlnet")
+
+        if self._should_unload_pipeline(kind, model, controlnet):
             self._unload_pipeline()
             to_unload.append("model")
             to_unload.append("pipe")
@@ -288,6 +323,7 @@ class Loader:
         ip_adapter,
         model,
         scheduler,
+        annotator,
         deepcache,
         scale,
         karras,
@@ -336,7 +372,15 @@ class Loader:
             # defaults to float32
             pipe_kwargs["torch_dtype"] = torch.float16

-
+        if kind.startswith("controlnet_"):
+            pipe_kwargs["controlnet"] = ControlNetModel.from_pretrained(
+                Config.ANNOTATORS[annotator],
+                torch_dtype=torch.float16,
+                variant="fp16",
+            )
+            self.controlnet = annotator
+
+        self._unload(kind, model, annotator, ip_adapter, deepcache, scale, freeu)
         self._load_pipeline(kind, model, progress, **pipe_kwargs)

         # error loading model
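The new `_should_unload_controlnet` check is easier to reason about as a pure function. The sketch below copies its three conditions; `should_unload_controlnet` and its `loaded` argument (standing in for `self.controlnet`) are hypothetical names, and nothing here touches the real `Loader` class.

```python
def should_unload_controlnet(loaded, kind, requested):
    """Pure-function mirror of the predicate added in this diff.

    loaded    -- name of the ControlNet currently held, or None
    kind      -- requested pipeline kind, e.g. "controlnet_txt2img"
    requested -- annotator name requested for this run
    """
    # Nothing loaded, so there is nothing to unload.
    if loaded is None:
        return False
    # A different annotator was requested (case-insensitive comparison).
    if loaded.lower() != requested.lower():
        return True
    # The new pipeline kind does not use a ControlNet at all.
    if not kind.startswith("controlnet_"):
        return True
    return False


if __name__ == "__main__":
    print(should_unload_controlnet(None, "controlnet_txt2img", "canny"))
    print(should_unload_controlnet("canny", "controlnet_txt2img", "Canny"))
    print(should_unload_controlnet("canny", "txt2img", "canny"))
```

One point worth checking against the full file: in the last hunk, `self.controlnet = annotator` is assigned before `self._unload(...)` runs, so for ControlNet kinds the name comparison appears to see equal values by the time it is evaluated. Whether that matters depends on code outside the lines shown here.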
lib/pipelines.py
CHANGED
@@ -1,7 +1,12 @@
 import os
 from importlib import import_module

-from diffusers import
+from diffusers import (
+    StableDiffusionControlNetImg2ImgPipeline,
+    StableDiffusionControlNetPipeline,
+    StableDiffusionImg2ImgPipeline,
+    StableDiffusionPipeline,
+)
 from diffusers.loaders.single_file import (
     SINGLE_FILE_OPTIONAL_COMPONENTS,
     load_single_file_sub_model,
@@ -220,3 +225,17 @@ class CustomStableDiffusionPipeline(CustomDiffusionMixin, StableDiffusionPipelin

 class CustomStableDiffusionImg2ImgPipeline(CustomDiffusionMixin, StableDiffusionImg2ImgPipeline):
     pass
+
+
+class CustomStableDiffusionControlNetPipeline(
+    CustomDiffusionMixin,
+    StableDiffusionControlNetPipeline,
+):
+    pass
+
+
+class CustomStableDiffusionControlNetImg2ImgPipeline(
+    CustomDiffusionMixin,
+    StableDiffusionControlNetImg2ImgPipeline,
+):
+    pass
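The two new classes have empty bodies because all behaviour comes from the base classes, with the mixin listed first so that its methods win in the method resolution order. A toy illustration of that pattern follows; every class and the `describe` method are invented stand-ins, since the body of `CustomDiffusionMixin` is not part of this diff.

```python
class BasePipeline:
    """Stand-in for a diffusers pipeline base class."""

    def describe(self):
        return "base"


class ControlNetPipeline(BasePipeline):
    """Stand-in for StableDiffusionControlNetPipeline."""

    def describe(self):
        return "controlnet"


class CustomMixin:
    """Stand-in for CustomDiffusionMixin; this override is hypothetical."""

    def describe(self):
        # super() resolves to the next class in the MRO of the concrete type.
        return "custom:" + super().describe()


class CustomControlNetPipeline(CustomMixin, ControlNetPipeline):
    # Empty body: the mixin comes first, so its methods take precedence.
    pass


if __name__ == "__main__":
    print([c.__name__ for c in CustomControlNetPipeline.__mro__])
    print(CustomControlNetPipeline().describe())
```

Listing the mixin before the pipeline class is what lets one mixin customise several pipeline variants without repeating code.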
lib/utils.py
CHANGED
@@ -7,11 +7,14 @@ from contextlib import contextmanager
 from typing import Callable, TypeVar

 import anyio
+import cv2
 import httpx
+import numpy as np
 from anyio import Semaphore
 from diffusers.utils import logging as diffusers_logging
 from huggingface_hub._snapshot_download import snapshot_download
 from huggingface_hub.utils import are_progress_bars_disabled
+from PIL import Image
 from transformers import logging as transformers_logging
 from typing_extensions import ParamSpec

@@ -107,6 +110,63 @@ def download_civit_file(lora_id, version_id, file_path=".", token=None):
         log.error(f"RequestError: {e}")


+# resize an image while preserving the aspect ratio (size is width-first)
+def resize_image(image, size):
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+
+    H, W, _ = image.shape
+    W = float(W)
+    H = float(H)
+    target_W, target_H = size
+
+    # Use the smaller scaling factor to maintain the aspect ratio.
+    k_w = float(target_W) / W
+    k_h = float(target_H) / H
+    k = min(k_w, k_h)
+
+    new_W = int(np.round(W * k / 64.0)) * 64
+    new_H = int(np.round(H * k / 64.0)) * 64
+    img = cv2.resize(
+        image,
+        (new_W, new_H),
+        interpolation=cv2.INTER_LANCZOS4 if k > 1 else cv2.INTER_AREA,
+    )
+    return img
+
+
+# ensure image is within bounds
+def get_valid_size(image, step=64, low=512, high=4096):
+    def round_down(x, step=step):
+        return int((x // step) * step)
+
+    def clamp_range(x, low=low, high=high):
+        return max(low, min(x, high))
+
+    if isinstance(image, Image.Image):
+        image = np.array(image)
+
+    H, W = image.shape[:2]
+    ar = W / H
+
+    # try width first
+    if W > H:
+        new_W = round_down(clamp_range(W))
+        new_H = round_down(new_W / ar)
+    else:
+        new_H = round_down(clamp_range(H))
+        new_W = round_down(new_H * ar)
+
+    # if the new size is out of bounds, try the other dimension
+    if new_W < low or new_W > high:
+        new_W = round_down(clamp_range(W))
+        new_H = round_down(new_W / ar)
+    if new_H < low or new_H > high:
+        new_H = round_down(clamp_range(H))
+        new_W = round_down(new_H * ar)
+    return (new_W, new_H)
+
+
 # like the original but supports args and kwargs instead of a dict
 # https://github.com/huggingface/huggingface-inference-toolkit/blob/0.2.0/src/huggingface_inference_toolkit/async_utils.py
 async def async_call(fn: Callable[P, T], *args: P.args, **kwargs: P.kwargs) -> T:
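The size arithmetic in `resize_image` and `get_valid_size` can be checked without OpenCV or an actual image. The sketch below reproduces only the number handling from the two helpers; `target_size` and `valid_size` are hypothetical names that take plain width/height integers instead of an image, and no resizing is performed.

```python
def target_size(width, height, target_w, target_h, step=64):
    """Dimensions resize_image would hand to cv2.resize, as (width, height)."""
    # The smaller scale factor keeps the whole image inside the target box.
    k = min(target_w / width, target_h / height)
    # Snap each side to the nearest multiple of `step`.
    new_w = int(round(width * k / step)) * step
    new_h = int(round(height * k / step)) * step
    return new_w, new_h


def valid_size(width, height, step=64, low=512, high=4096):
    """Same clamping and rounding as get_valid_size, on plain integers."""

    def round_down(x):
        return int((x // step) * step)

    def clamp(x):
        return max(low, min(x, high))

    ar = width / height

    # Clamp the longer side first, then derive the other from the ratio.
    if width > height:
        new_w = round_down(clamp(width))
        new_h = round_down(new_w / ar)
    else:
        new_h = round_down(clamp(height))
        new_w = round_down(new_h * ar)

    # If a derived side fell out of bounds, recompute from that side.
    if new_w < low or new_w > high:
        new_w = round_down(clamp(width))
        new_h = round_down(new_w / ar)
    if new_h < low or new_h > high:
        new_h = round_down(clamp(height))
        new_w = round_down(new_h * ar)
    return new_w, new_h


if __name__ == "__main__":
    print(target_size(1024, 768, 512, 512))  # (512, 384)
    print(valid_size(300, 300))              # (512, 512)
    print(valid_size(5000, 5000))            # (4096, 4096)
```

Both helpers return sizes that are multiples of 64, but `resize_image` rounds to the nearest multiple while `get_valid_size` always rounds down.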
requirements.txt
CHANGED
@@ -1,5 +1,6 @@
 anyio==4.6.0
 compel==2.0.3
+controlnet-aux==0.0.9
 deepcache==0.1.1
 diffusers==0.30.3
 einops==0.8.0
@@ -7,7 +8,9 @@ gradio==4.44.0
 h2
 hf-transfer
 httpx
+mediapipe
 numpy==1.26.4
+opencv-contrib-python
 peft
 ruff==0.6.7
 spaces==0.30.2