valhalla committed · Commit f4abe09 · 1 Parent(s): e17c3c4

Create README.md

Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
---
license: apache-2.0
base_model: stabilityai/stable-diffusion-xl-base-1.0
tags:
- art
- t2i-adapter
- stable-diffusion
- image-to-image
---

# T2I-Adapter-SDXL - Depth-MiDaS

T2I-Adapter is a network that provides additional conditioning to Stable Diffusion. Each T2I-Adapter checkpoint takes a different type of conditioning as input and is used with a specific base Stable Diffusion checkpoint.

This checkpoint provides conditioning on MiDaS depth maps for the Stable Diffusion XL base checkpoint (`stabilityai/stable-diffusion-xl-base-1.0`).

## Model Details
- **Developed by:** T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** English
- **License:** Apache 2.0
- **Resources for more information:** [GitHub Repository](https://github.com/TencentARC/T2I-Adapter), [Paper](https://arxiv.org/abs/2302.08453).
- **Cite as:**

@misc{mou2023t2iadapter,
title={T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models},
author={Chong Mou and Xintao Wang and Liangbin Xie and Yanze Wu and Jian Zhang and Zhongang Qi and Ying Shan and Xiaohu Qie},
year={2023},
eprint={2302.08453},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

### Checkpoints

| Model Name | Control Image Overview | Control Image Example | Generated Image Example |
|---|---|---|---|
|[Adapter/t2iadapter_canny_sdxlv1](https://huggingface.co/Adapter/t2iadapter_canny_sdxlv1)<br/> *Trained with canny edge detection* | A monochrome image with white edges on a black background.|<a href=""><img width="64" style="margin:0;padding:0;" src=""/></a>|<a href=""><img width="64" src=""/></a>|
|[Adapter/t2iadapter_sketch_sdxlv1](https://huggingface.co/Adapter/t2iadapter_sketch_sdxlv1)<br/> *Trained with [PidiNet](https://github.com/zhuoinoulu/pidinet) edge detection* | A hand-drawn monochrome image with white outlines on a black background.|<a href=""><img width="64" style="margin:0;padding:0;" src=""/></a>|<a href=""><img width="64" src=""/></a>|
|[Adapter/t2iadapter_depth_sdxlv1](https://huggingface.co/Adapter/t2iadapter_depth_sdxlv1)<br/> *Trained with MiDaS depth estimation* | A grayscale image with black representing deep areas and white representing shallow areas.|<a href=""><img width="64" src=""/></a>|<a href=""><img width="64" src=""/></a>|
|[Adapter/t2iadapter_openpose_sdxlv1](https://huggingface.co/Adapter/t2iadapter_openpose_sdxlv1)<br/> *Trained with OpenPose bone image* | An [OpenPose bone](https://github.com/CMU-Perceptual-Computing-Lab/openpose) image.|<a href=""><img width="64" src=""/></a>|<a href=""><img width="64" src=""/></a>|


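To make the depth convention in the table concrete, the following minimal sketch maps per-pixel distances to 8-bit gray values so that the nearest point becomes white (255) and the farthest becomes black (0). It is an illustration only: `depth_to_control` is a hypothetical helper that is not part of any library, and plain Python lists stand in for image arrays. The MiDaS-based detector used in the example further down produces its control image on its own.

```python
def depth_to_control(depth_rows):
    """Map distances to 0-255 gray values: nearest -> 255 (white), farthest -> 0 (black).

    `depth_rows` is a non-empty list of rows of distances (hypothetical input format).
    """
    flat = [d for row in depth_rows for d in row]
    near, far = min(flat), max(flat)
    if far == near:
        # A flat scene has no depth contrast; return mid-gray everywhere.
        return [[128 for _ in row] for row in depth_rows]
    span = far - near
    return [[round(255 * (far - d) / span) for d in row] for row in depth_rows]


control = depth_to_control([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
print(control)  # [[255, 204, 153], [102, 51, 0]]
```

In practice the same normalization would typically be done on a NumPy array and wrapped in a PIL image before being handed to the pipeline.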
## Example

To get started, first install the required dependencies:

```bash
pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl # for now
pip install git+https://github.com/patrickvonplaten/controlnet_aux.git # for conditioning models and detectors
pip install transformers accelerate safetensors
```

1. Images are first downloaded and converted into the appropriate *control image* format.
2. The *control image* and *prompt* are then passed to the [`StableDiffusionXLAdapterPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py#L125).

Let's have a look at a simple example using the [Depth-MiDaS Adapter](https://huggingface.co/Adapter/t2iadapter_depth_sdxlv1).

```py
import torch
from diffusers import (
    AutoencoderKL,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLAdapterPipeline,
    T2IAdapter,
)
from diffusers.utils import load_image
from controlnet_aux.midas import MidasDetector

# load the depth adapter in half precision
adapter = T2IAdapter.from_pretrained(
    "Adapter/t2i-adapter-depth-midas-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load the Euler ancestral scheduler and an fp16-friendly VAE
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # requires xformers to be installed

# load the MiDaS depth estimator that builds the control image
midas_depth = MidasDetector.from_pretrained(
    "valhalla/t2iadapter-aux-models", filename="dpt_large_384.pt", model_type="dpt_large"
).to("cuda")

# download the source image and convert it to a depth map
url = "https://raw.githubusercontent.com/lllyasviel/ControlNet/main/test_imgs/cyber.png"
image = load_image(url)
image = midas_depth(
    image, detect_resolution=512, image_resolution=1024
).resize((896, 1152))

prompt = "a robot, mount fuji in the background, 4k photo, highly detailed"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"

# generate an image conditioned on both the prompt and the depth map
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=1,
    cond_tau=1,
).images
gen_images[0].save("out_depth.png")  # output file name is arbitrary
```
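The example resizes the depth map to a fixed 896x1152. For source images with other aspect ratios, a small helper can pick a comparable size. The sketch below is an illustration under stated assumptions, not part of diffusers: `snap_size` is a hypothetical name, the budget of roughly 1024x1024 pixels is an assumption, and rounding each side to a multiple of 64 is a conservative choice that keeps it divisible by the 8x downsampling of the SDXL VAE.

```python
import math


def snap_size(width, height, target_pixels=1024 * 1024, multiple=64):
    """Scale (width, height) to roughly `target_pixels`, keeping the aspect
    ratio and rounding each side to the nearest multiple of `multiple`."""
    scale = math.sqrt(target_pixels / (width * height))

    def snap(side):
        # Never return less than one full multiple, even for tiny inputs.
        return max(multiple, round(side * scale / multiple) * multiple)

    return snap(width), snap(height)


print(snap_size(896, 1152))  # (896, 1152)
print(snap_size(512, 512))   # (1024, 1024)
```

The returned tuple could be passed to `.resize(...)` on the control image in place of the hard-coded `(896, 1152)`.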