README.md CHANGED
@@ -1,19 +1,16 @@
  ---
- language:
- - en
- thumbnail: ''
+ language:
+ - en
+ thumbnail: ""
  tags:
  - controlnet
  - laion
  - face
  - mediapipe
- - image-to-image
- license: openrail
- base_model: stabilityai/stable-diffusion-2-1-base
+ license: "openrail"
  datasets:
  - LAION-Face
  - LAION
- pipeline_tag: image-to-image
  ---

  # ControlNet LAION Face Dataset
@@ -107,58 +104,12 @@ python ./train_laion_face_sd15.py
  We have provided `gradio_face2image.py`. Update the following two lines to point them to your trained model.

  ```
- model = create_model('./models/cldm_v21.yaml').cpu() # If you fine-tune on SD2.1 base, this does not need to change.
+ model = create_model('./models/cldm_v21.yaml').cpu() # If you fine-tuned on SD2.1 base, this does not need to change.
  model.load_state_dict(load_state_dict('./models/control_sd21_openpose.pth', location='cuda'))
  ```

  The model has some limitations: while it is empirically better at tracking gaze and mouth poses than previous attempts, it may still ignore controls. Adding details to the prompt like, "looking right" can abate bad behavior.

- ## 🧨 Diffusers
-
- It is recommended to use the checkpoint with [Stable Diffusion 2.1 - Base](stabilityai/stable-diffusion-2-1-base) as the checkpoint has been trained on it.
- Experimentally, the checkpoint can be used with other diffusion models such as dreamboothed stable diffusion.
-
- To use with Stable Diffusion 1.5, insert `subfolder="diffusion_sd15"` into the from_pretrained arguments. A v1.5 half-precision variant is provided but untested.
-
- 1. Install `diffusers` and related packages:
- ```
- $ pip install diffusers transformers accelerate
- ```
-
- 2. Run code:
- ```py
- from PIL import Image
- import numpy as np
- import torch
- from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
- from diffusers.utils import load_image
-
- image = load_image(
-     "https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/resolve/main/samples_laion_face_dataset/family_annotation.png"
- )
-
- # Stable Diffusion 2.1-base:
- controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", torch_dtype=torch.float16, variant="fp16")
- pipe = StableDiffusionControlNetPipeline.from_pretrained(
-     "stabilityai/stable-diffusion-2-1-base", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
- )
- # OR
- # Stable Diffusion 1.5:
- controlnet = ControlNetModel.from_pretrained("CrucibleAI/ControlNetMediaPipeFace", subfolder="diffusion_sd15")
- pipe = StableDiffusionControlNetPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None)
-
- pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
-
- # Remove if you do not have xformers installed
- # see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers
- # for installation instructions
- pipe.enable_xformers_memory_efficient_attention()
- pipe.enable_model_cpu_offload()
-
- image = pipe("a happy family at a dentist advertisement", image=image, num_inference_steps=30).images[0]
- image.save('./images.png')
- ```
-

  # License:

@@ -209,4 +160,4 @@ Sample images for this document were obtained from Unsplash and are CC0.
  }
  ```

- This project was made possible by Crucible AI.
+ This project was made possible by Crucible AI.
config.json DELETED
@@ -1,47 +0,0 @@
- {
-   "_class_name": "ControlNetModel",
-   "_diffusers_version": "0.15.0.dev0",
-   "_name_or_path": "/home/patrick_huggingface_co/temp_control",
-   "act_fn": "silu",
-   "attention_head_dim": [
-     5,
-     10,
-     20,
-     20
-   ],
-   "block_out_channels": [
-     320,
-     640,
-     1280,
-     1280
-   ],
-   "class_embed_type": null,
-   "conditioning_embedding_out_channels": [
-     16,
-     32,
-     96,
-     256
-   ],
-   "controlnet_conditioning_channel_order": "rgb",
-   "cross_attention_dim": 1024,
-   "down_block_types": [
-     "CrossAttnDownBlock2D",
-     "CrossAttnDownBlock2D",
-     "CrossAttnDownBlock2D",
-     "DownBlock2D"
-   ],
-   "downsample_padding": 1,
-   "flip_sin_to_cos": true,
-   "freq_shift": 0,
-   "in_channels": 4,
-   "layers_per_block": 2,
-   "mid_block_scale_factor": 1,
-   "norm_eps": 1e-05,
-   "norm_num_groups": 32,
-   "num_class_embeds": null,
-   "only_cross_attention": false,
-   "projection_class_embeddings_input_dim": null,
-   "resnet_time_scale_shift": "default",
-   "upcast_attention": false,
-   "use_linear_projection": true
- }
control_v2p_sd15_mediapipe_face.full.ckpt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:a2a71953d7372d5585899b44693a7532ebbf80c091108ae2b8987ca93cc2dac2
- size 8601300183
control_v2p_sd15_mediapipe_face.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:2f2ccead3a8c0b9fbf9cad7b8eaa29834983ced916c766a92fb84db34ff29e43
- size 1445239863
control_v2p_sd15_mediapipe_face.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:5be501156709895f0b14a7ec76faae7cf0a105f76895252a2c69db541629628f
- size 1445154814
control_v2p_sd15_mediapipe_face.yaml DELETED
@@ -1,79 +0,0 @@
- model:
-   target: cldm.cldm.ControlLDM
-   params:
-     linear_start: 0.00085
-     linear_end: 0.0120
-     num_timesteps_cond: 1
-     log_every_t: 200
-     timesteps: 1000
-     first_stage_key: "jpg"
-     cond_stage_key: "txt"
-     control_key: "hint"
-     image_size: 64
-     channels: 4
-     cond_stage_trainable: false
-     conditioning_key: crossattn
-     monitor: val/loss_simple_ema
-     scale_factor: 0.18215
-     use_ema: False
-     only_mid_control: False
-
-     control_stage_config:
-       target: cldm.cldm.ControlNet
-       params:
-         image_size: 32 # unused
-         in_channels: 4
-         hint_channels: 3
-         model_channels: 320
-         attention_resolutions: [ 4, 2, 1 ]
-         num_res_blocks: 2
-         channel_mult: [ 1, 2, 4, 4 ]
-         num_heads: 8
-         use_spatial_transformer: True
-         transformer_depth: 1
-         context_dim: 768
-         use_checkpoint: True
-         legacy: False
-
-     unet_config:
-       target: cldm.cldm.ControlledUnetModel
-       params:
-         image_size: 32 # unused
-         in_channels: 4
-         out_channels: 4
-         model_channels: 320
-         attention_resolutions: [ 4, 2, 1 ]
-         num_res_blocks: 2
-         channel_mult: [ 1, 2, 4, 4 ]
-         num_heads: 8
-         use_spatial_transformer: True
-         transformer_depth: 1
-         context_dim: 768
-         use_checkpoint: True
-         legacy: False
-
-     first_stage_config:
-       target: ldm.models.autoencoder.AutoencoderKL
-       params:
-         embed_dim: 4
-         monitor: val/rec_loss
-         ddconfig:
-           double_z: true
-           z_channels: 4
-           resolution: 256
-           in_channels: 3
-           out_ch: 3
-           ch: 128
-           ch_mult:
-           - 1
-           - 2
-           - 4
-           - 4
-           num_res_blocks: 2
-           attn_resolutions: []
-           dropout: 0.0
-         lossconfig:
-           target: torch.nn.Identity
-
-     cond_stage_config:
-       target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
diffusion_pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:36dcd318d499df44b35432599a1b70f598e7bb42b479e4e67d4adf7b7e87e87d
- size 1457051321
diffusion_pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:7f70c38860e0d1fcd0f5ed38bc34e61c7337b9001bed57f7bff6eba6471406f0
- size 728596455
diffusion_pytorch_model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:02b3a8e04154b4c3d11f5210217f0dbf3fac8612d62d015cd059f2b9fe4c3364
- size 728496846
diffusion_pytorch_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:a683e98e2427fd6242edc9af6620708f2f8fc84bfc049fafe549e350f8d42d73
- size 1456953564
diffusion_sd15/config.json DELETED
@@ -1,42 +0,0 @@
- {
-   "_class_name": "ControlNetModel",
-   "_diffusers_version": "0.15.0.dev0",
-   "_name_or_path": "/home/josephcatrambone/ControlNet/models",
-   "act_fn": "silu",
-   "attention_head_dim": 8,
-   "block_out_channels": [
-     320,
-     640,
-     1280,
-     1280
-   ],
-   "class_embed_type": null,
-   "conditioning_embedding_out_channels": [
-     16,
-     32,
-     96,
-     256
-   ],
-   "controlnet_conditioning_channel_order": "rgb",
-   "cross_attention_dim": 768,
-   "down_block_types": [
-     "CrossAttnDownBlock2D",
-     "CrossAttnDownBlock2D",
-     "CrossAttnDownBlock2D",
-     "DownBlock2D"
-   ],
-   "downsample_padding": 1,
-   "flip_sin_to_cos": true,
-   "freq_shift": 0,
-   "in_channels": 4,
-   "layers_per_block": 2,
-   "mid_block_scale_factor": 1,
-   "norm_eps": 1e-05,
-   "norm_num_groups": 32,
-   "num_class_embeds": null,
-   "only_cross_attention": false,
-   "projection_class_embeddings_input_dim": null,
-   "resnet_time_scale_shift": "default",
-   "upcast_attention": null,
-   "use_linear_projection": false
- }
diffusion_sd15/diffusion_pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:f63de389f776b75bb11f10487a187573aea84f9a51debd08f314bd084e7fb362
- size 1445254969
diffusion_sd15/diffusion_pytorch_model.fp16.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0c37b3dd41e956160909129b50f84fd938116550727b491192cbdbe6f896cd7b
- size 722696633
diffusion_sd15/diffusion_pytorch_model.fp16.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:9fb50465b4fd7e15f0dc7df8031767e57309cfda2917082485bcf6c11bedb540
- size 722598642
gradio_face2image.py CHANGED
@@ -13,8 +13,8 @@ from laion_face_common import generate_annotation
  from share import *


- model = create_model('./control_v2p_sd21_mediapipe_face.yaml').cpu()
- model.load_state_dict(load_state_dict('./control_v2p_sd21_mediapipe_face.full.ckpt', location='cuda'))
+ model = create_model('./models/cldm_v21.yaml').cpu()
+ model.load_state_dict(load_state_dict('./models/controlnet_face_condition_epoch_4_0percent.ckpt', location='cuda'))
  model = model.cuda()
  ddim_sampler = DDIMSampler(model) # ControlNet _only_ works with DDIM.

control_v2p_sd21_mediapipe_face.yaml → models/cldm_v21.yaml RENAMED
File without changes
control_v2p_sd21_mediapipe_face.full.ckpt → models/controlnet_sd21_laion_face_v2_full.ckpt RENAMED
File without changes
control_v2p_sd21_mediapipe_face.pth → models/controlnet_sd21_laion_face_v2_pruned.pth RENAMED
File without changes
control_v2p_sd21_mediapipe_face.safetensors → models/controlnet_sd21_laion_face_v2_pruned.safetensors RENAMED
File without changes
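For orientation, a condensed sketch of the Diffusers loading path described in the README lines removed above. The repo id, `diffusion_sd15` subfolder, and sample image URL are taken from those removed lines; everything else is an assumption and the snippet is not verified against the current file layout.

```py
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# MediaPipe face annotation rendered as an RGB image (sample referenced in the removed README section).
condition = load_image(
    "https://huggingface.co/CrucibleAI/ControlNetMediaPipeFace/resolve/main/samples_laion_face_dataset/family_annotation.png"
)

# SD 2.1-base weights sit at the repo root; per the removed README text, pass
# subfolder="diffusion_sd15" instead for the (untested) SD 1.5 variant.
controlnet = ControlNetModel.from_pretrained(
    "CrucibleAI/ControlNetMediaPipeFace", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()  # requires accelerate; keeps VRAM usage modest

image = pipe("a happy family at a dentist advertisement", image=condition, num_inference_steps=30).images[0]
image.save("./out.png")
```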