---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
language:
- en
---

## Using HunyuanDiT IP-Adapter


### Instructions

The dependencies and installation are basically the same as for the base model, and we use the `module` weights for training.
Download the model using the following commands:

```bash
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
# We recommend using module weights as the base model for IP-Adapter inference, as our provided pretrained weights are trained on them.
huggingface-cli download Tencent-Hunyuan/IP-Adapter ipa.pt --local-dir ./ckpts/t2i/model
huggingface-cli download Tencent-Hunyuan/IP-Adapter clip_img_encoder.pt --local-dir ./ckpts/t2i/model/clip_img_encoder

# Quick start
# Prompt (Chinese): "A tiger swimming in the ocean, with the ocean in the background. Centered composition, anime style, creating a calm atmosphere."
python3 sample_ipadapter.py --infer-mode fa --ref-image-path ipadapter/input/tiger.png --i-scale 1.0 --prompt 一只老虎在海洋中游泳,背景是海洋。构图方式是居中构图,呈现了动漫风格和文化,营造了平静的氛围。 --infer-steps 100 --is-ipa True --load-key module
```
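
If the downloads succeed with the `--local-dir` paths above, the IP-Adapter files should end up in the following locations (a quick sanity check; the rest of `./ckpts/t2i/model` depends on the base-model checkpoints you have already downloaded):

```bash
# Expected files after the downloads above (other base-model files omitted):
#   ckpts/t2i/model/ipa.pt
#   ckpts/t2i/model/clip_img_encoder/clip_img_encoder.pt
ls ./ckpts/t2i/model/ipa.pt ./ckpts/t2i/model/clip_img_encoder/clip_img_encoder.pt
```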

Examples of ref input and IP-Adapter results are as follows:
<table>
  <tr>
    <td colspan="3" align="center">Ref Input</td>
  </tr>
  <tr>
    <td align="center"><img src="asset/input/tiger.png" alt="Image 0" width="200"/></td>
    <td align="center"><img src="asset/input/beauty.png" alt="Image 1" width="200"/></td>
    <td align="center"><img src="asset/input/xunyicao.png" alt="Image 2" width="200"/></td>
    
  </tr>
  
  <tr>
    <td colspan="3" align="center">IP-Adapter Output</td>
  </tr>

  <tr>
    <td align="center">一只老虎在奔跑。<br>(A tiger running.) </td>
    <td align="center">一个卡通美女,抱着一只小猪。<br>(A cartoon beauty holding a little pig.) </td>
    <td align="center">一片紫色薰衣草地。<br>(A purple lavender field.) </td>
  </tr>

  <tr>
    <td align="center"><img src="asset/output/tiger_run.png" alt="Image 3" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_pig.png" alt="Image 4" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_res.png" alt="Image 5" width="200"/></td>
  </tr>

  <tr>
    <td align="center">一只老虎在看书。<br>(A tiger is reading a book.) </td>
    <td align="center">一个卡通美女,穿着绿色衣服。<br>(A cartoon beauty wearing green clothes.) </td>
    <td align="center">一片紫色薰衣草地,有一只可爱的小狗。<br>(A purple lavender field with a cute puppy.) </td>
  </tr>

  <tr>
    <td align="center"><img src="asset/output/tiger_book.png" alt="Image 3" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_green_cloth.png" alt="Image 4" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_dog.png" alt="Image 5" width="200"/></td>
  </tr>

  <tr>
    <td align="center">一只老虎在咆哮。<br>(A tiger is roaring.) </td>
    <td align="center">一个卡通美女,戴着墨镜。<br>(A cartoon beauty wearing sunglasses.) </td>
    <td align="center">水墨风格,一片紫色薰衣草地。<br>(Ink style. A purple lavender field.) </td>
  </tr>
  <tr>
    <td align="center"><img src="asset/output/tiger_roar.png" alt="Image 3" width="200"/></td>
    <td align="center"><img src="asset/output/beauty_glass.png" alt="Image 4" width="200"/></td>
    <td align="center"><img src="asset/output/xunyicao_style.png" alt="Image 5" width="200"/></td>
  </tr>
</table>


### Training

We provide base model weights for IP-Adapter training; use the `module` weights as the base model.

In the following example, we load the `module` weights into the main model and conduct IP-Adapter training.

To apply multi-resolution training, add the `--multireso` and `--reso-step 64` parameters.

```bash
task_flag="IP_Adapter"                                       # the task flag is used to identify folders.
index_file=path/to/your/index_file
results_dir=./log_EXP                                        # save root for results
batch_size=1                                                 # training batch size
image_size=1024                                              # training image resolution
grad_accu_steps=1                                            # gradient accumulation
warmup_num_steps=0                                           # warm-up steps
lr=0.0001                                                    # learning rate
ckpt_every=10                                                # create a ckpt every few steps.
ckpt_latest_every=10000                                      # create a ckpt named `latest.pt` every few steps.
ckpt_every_n_epoch=2                                         # create a ckpt every few epochs.
epochs=8                                                     # total training epochs

PYTHONPATH=. \
sh $(dirname "$0")/run_g_ipadapter.sh \
    --task-flag ${task_flag} \
    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.018 \
    --predict-type v_prediction \
    --multireso \
    --reso-step 64 \
    --uncond-p 0.22 \
    --uncond-p-t5 0.22 \
    --uncond-p-img 0.05 \
    --index-file ${index_file} \
    --random-flip \
    --lr ${lr} \
    --batch-size ${batch_size} \
    --image-size ${image_size} \
    --global-seed 999 \
    --grad-accu-steps ${grad_accu_steps} \
    --warmup-num-steps ${warmup_num_steps} \
    --use-flash-attn \
    --use-fp16 \
    --extra-fp16 \
    --results-dir ${results_dir} \
    --resume \
    --resume-module-root ckpts/t2i/model/pytorch_model_module.pt \
    --epochs ${epochs} \
    --ckpt-every ${ckpt_every} \
    --ckpt-latest-every ${ckpt_latest_every} \
    --ckpt-every-n-epoch ${ckpt_every_n_epoch} \
    --log-every 10 \
    --deepspeed \
    --use-zero-stage 2 \
    --gradient-checkpointing \
    --no-strict \
    --training-parts ipadapter \
    --is-ipa True \
    --resume-ipa True \
    --resume-ipa-root ckpts/t2i/model/ipa.pt  \
    "$@"

```

Recommended parameter settings

|     Parameter     |  Description  |          Recommended Parameter Value                               | Note|
|:---------------:|:---------:|:---------------------------------------------------:|:--:|
|   `--batch-size` |    Training batch size    |        1        | Depends on GPU memory|
|   `--grad-accu-steps` |    Size of gradient accumulation    |       2        | - |
|   `--lr` |    Learning rate  |        0.0001        | - |
|   `--training-parts` |  Parameters to train for IP-Adapter  |        ipadapter        | - |
|   `--is-ipa` |  Whether to train the IP-Adapter  |        True       | - |
|   `--resume-ipa-root` |  Path to the IP-Adapter weights to resume from  |        ipa model path       | - |
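
Because the wrapper above forwards any extra command-line arguments via `"$@"`, the recommended values can be applied without editing the script. For example, assuming the snippet is saved as `train_ipadapter.sh` (an illustrative name, not one fixed by the repository):

```bash
# Override the gradient-accumulation setting from the table; any flag listed
# above can be appended the same way and is passed through via "$@".
sh train_ipadapter.sh --grad-accu-steps 2
```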


### Inference
Use the following command line for inference.

Use the float parameter `--i-scale` to specify the weight of the IP-Adapter reference image. A larger value makes the output follow the reference image more closely.
```bash
# Prompt (Chinese): "A tiger swimming in the ocean, with the ocean in the background. Centered composition, anime style, creating a calm atmosphere."
python3 sample_ipadapter.py --infer-mode fa --ref-image-path ipadapter/input/beach.png --i-scale 1.0 --prompt 一只老虎在海洋中游泳,背景是海洋。构图方式是居中构图,呈现了动漫风格和文化,营造了平静的氛围。 --infer-steps 100 --is-ipa True --load-key module
```
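
Conversely, a lower `--i-scale` loosens the constraint from the reference image and gives the text prompt more influence. For example (the value 0.6 is only illustrative; tune it for your use case):

```bash
# Same command as above, but with a weaker reference-image weight.
python3 sample_ipadapter.py --infer-mode fa --ref-image-path ipadapter/input/beach.png --i-scale 0.6 --prompt 一只老虎在海洋中游泳,背景是海洋。构图方式是居中构图,呈现了动漫风格和文化,营造了平静的氛围。 --infer-steps 100 --is-ipa True --load-key module
```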