wanghaofan committed 24d6ae5
1 Parent(s): 8021a67
Files changed (6)
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +40 -0
  4. assets/teaser.png +3 -0
  5. config.json +16 -0
  6. images/depth.jpeg +0 -0
.DS_Store ADDED
Binary file (6.15 kB).
 
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
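The added rule routes `assets/teaser.png` through Git LFS, while `images/depth.jpeg` matches none of the patterns visible in this hunk. The rough sketch below checks paths against only those visible patterns. It approximates gitattributes matching with Python's `fnmatch`, which is an assumption: git's own rules differ in edge cases (for example, `*` does not cross `/` in patterns that contain a slash), and the real file has more rules above line 33. `is_lfs_tracked` is a hypothetical helper, not part of git or any library.

```python
import posixpath
from fnmatch import fnmatchcase

# Only the LFS patterns visible in this hunk; the real .gitattributes has more rules.
VISIBLE_LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "assets/teaser.png"]


def is_lfs_tracked(path: str, patterns=VISIBLE_LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching (assumption: fnmatch semantics).

    Patterns without a slash are matched against the file's basename at any
    depth; patterns containing a slash are matched against the full path.
    """
    basename = posixpath.basename(path)
    for pattern in patterns:
        target = path if "/" in pattern else basename
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for p in ["assets/teaser.png", "images/depth.jpeg", "config.json"]:
        print(p, is_lfs_tracked(p))
```

Matching slash-free patterns against the basename mimics how git applies such patterns at any directory depth, which is why `*tfevents*` also catches event files nested under log directories.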
README.md CHANGED
@@ -1,3 +1,43 @@
  ---
  license: apache-2.0
  ---
+
+ # SD3-ControlNet-Depth
+
+ <img src="./assets/teaser.png"/>
+
+ # Demo
+ ```python
+ import torch
+ from diffusers import StableDiffusion3ControlNetPipeline
+ from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
+ from diffusers.utils import load_image
+
+ # load pipeline
+ controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Depth")
+ pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-3-medium-diffusers",
+     controlnet=controlnet
+ )
+ pipe.to("cuda", torch.float16)
+
+ # config
+ control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Depth/resolve/main/images/depth.jpeg")
+ prompt = "a panda cub, captured in a close-up, in forest, is perched on a tree trunk. good composition, Photography, the cub's ears, a fluffy black, are tucked behind its head, adding a touch of whimsy to its appearance. a lush tapestry of green leaves in the background. depth of field, National Geographic"
+ n_prompt = "bad hands, blurry, NSFW, nude, naked, porn, ugly, bad quality, worst quality"
+
+ # fixed seed to reproduce the result in our example
+ generator = torch.Generator(device="cpu").manual_seed(4000)
+ image = pipe(
+     prompt,
+     negative_prompt=n_prompt,
+     control_image=control_image,
+     controlnet_conditioning_scale=0.5,
+     guidance_scale=7.0,
+     generator=generator
+ ).images[0]
+ image.save('image.jpg')
+ ```
+
+ # Limitation
+ Because only 1024×1024 images were used during training, inference performs best at this resolution; other sizes yield suboptimal results.
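Given the 1024×1024 training resolution, one option is to bring the control image to exactly that size before inference. The helper below is a hypothetical sketch (`fit_to_square` is not part of diffusers): it computes, in pure Python, the geometry for scaling the shorter side to 1024 and then center-cropping to a square. Whether cropping or padding suits a given depth map better is a judgment call this sketch does not settle.

```python
def fit_to_square(width: int, height: int, target: int = 1024):
    """Hypothetical helper: scale so the shorter side equals `target`,
    then center-crop to a target x target square.

    Returns ((resized_width, resized_height), (left, top, right, bottom)),
    where the crop box is in the coordinates of the resized image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / min(width, height)
    # max(..., target) guards against float rounding pushing a side below target
    resized_w = max(target, round(width * scale))
    resized_h = max(target, round(height * scale))
    # Center the crop window on the longer side
    left = (resized_w - target) // 2
    top = (resized_h - target) // 2
    return (resized_w, resized_h), (left, top, left + target, top + target)


if __name__ == "__main__":
    print(fit_to_square(1536, 1024))
    print(fit_to_square(512, 768))
```

With a PIL image, usage would look like `size, box = fit_to_square(*control_image.size)` followed by `control_image = control_image.resize(size).crop(box)`, and then passing `height=1024, width=1024` to the pipeline call.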
assets/teaser.png ADDED

Git LFS Details

  • SHA256: 6feef3c4089e444ccc71bb2270247127d14a680baeb13e6352c51869a52c5338
  • Pointer size: 132 Bytes
  • Size of remote file: 1.29 MB
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "_class_name": "SD3ControlNetModel",
+   "_diffusers_version": "0.30.0.dev0",
+   "_name_or_path": "stabilityai/stable-diffusion-3-medium-diffusers",
+   "attention_head_dim": 64,
+   "caption_projection_dim": 1536,
+   "in_channels": 16,
+   "joint_attention_dim": 4096,
+   "num_attention_heads": 24,
+   "num_layers": 12,
+   "out_channels": 16,
+   "patch_size": 2,
+   "pooled_projection_dim": 2048,
+   "pos_embed_max_size": 192,
+   "sample_size": 128
+ }
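Several quantities implied by this config can be checked with simple arithmetic. The sketch below assumes an 8× VAE downsampling factor for SD3 (consistent with `sample_size` 128 at 1024 px) and a 24-layer base transformer for SD3-medium; neither number appears in this commit. The `controlnet_block_for` mapping mirrors, as an assumption about diffusers' SD3 transformer, how the 12 ControlNet block outputs could be spread across the base blocks by integer division. All function names here are hypothetical.

```python
import json
import math

# Subset of config.json from this commit
CONFIG = json.loads("""
{
  "attention_head_dim": 64,
  "caption_projection_dim": 1536,
  "num_attention_heads": 24,
  "num_layers": 12,
  "patch_size": 2,
  "pos_embed_max_size": 192,
  "sample_size": 128
}
""")

VAE_SCALE_FACTOR = 8          # assumption: SD3 VAE downsamples by 8
BASE_TRANSFORMER_LAYERS = 24  # assumption: SD3-medium transformer depth


def hidden_size(cfg):
    # Transformer width = number of heads x per-head dimension
    return cfg["num_attention_heads"] * cfg["attention_head_dim"]


def num_image_tokens(pixels, cfg):
    # Square image: pixels -> latent side (VAE) -> patches per side -> token count
    latent_side = pixels // VAE_SCALE_FACTOR
    patches_per_side = latent_side // cfg["patch_size"]
    return patches_per_side ** 2


def max_pixel_side(cfg):
    # pos_embed_max_size counts patches per side of the positional-embedding grid
    return cfg["pos_embed_max_size"] * cfg["patch_size"] * VAE_SCALE_FACTOR


def controlnet_block_for(transformer_block, cfg, base_layers=BASE_TRANSFORMER_LAYERS):
    # Assumed mapping: each ControlNet output feeds `interval` consecutive base blocks
    interval = math.ceil(base_layers / cfg["num_layers"])
    return transformer_block // interval


if __name__ == "__main__":
    print("hidden size:", hidden_size(CONFIG))
    print("tokens at 1024 px:", num_image_tokens(1024, CONFIG))
    print("largest side supported by pos. embedding:", max_pixel_side(CONFIG))
    print([controlnet_block_for(i, CONFIG) for i in range(BASE_TRANSFORMER_LAYERS)])
```

Under these assumptions the width works out to 24 × 64 = 1536, matching `caption_projection_dim`, and a 1024 px image becomes 4096 tokens. With 12 ControlNet layers against 24 base layers, each ControlNet output would be reused for two consecutive base blocks.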
images/depth.jpeg ADDED