---
datasets:
- yuvalkirstain/pickapic_v2
library_name: diffusers
---
# Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility
<p align="center">
    <img src="https://github.com/jacklishufan/diffusion-kto/blob/main/assets/teaser.png?raw=true" width="60%"> <br>
</p>


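This model is fine-tuned from Stable Diffusion v1-5 on the Pick-a-Pic v2 dataset using Diffusion-KTO. Only the UNet weights are hosted in this repository; the usage example below loads the remaining components from the base model.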


### Usage
```python
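import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DiffusionPipeline

# Base checkpoint: the VAE is loaded from it explicitly, and the pipeline
# pulls the text encoder, tokenizer, and scheduler from it as well.
vae_path = model_name = "runwayml/stable-diffusion-v1-5"
device = "cuda"
weight_dtype = torch.float16

vae = AutoencoderKL.from_pretrained(
    vae_path,
    subfolder="vae",
    torch_dtype=weight_dtype,
)
# The Diffusion-KTO fine-tuned UNet replaces the base UNet.
unet = UNet2DConditionModel.from_pretrained(
    "jacklishufan/diffusion-kto",
    subfolder="unet",
    torch_dtype=weight_dtype,
)
# `from_pretrained` has no `device` argument; move the pipeline with `.to()`.
pipeline = DiffusionPipeline.from_pretrained(
    model_name,
    vae=vae,
    unet=unet,
    torch_dtype=weight_dtype,
).to(device)

result = pipeline(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=50,
    guidance_scale=7.0,
)
img = result.images[0]  # first generated image, a PIL.Image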
```
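
To generate images for several prompts reproducibly and write them to disk, something like the sketch below may help. It is not part of the official repository: `generate_and_save`, the `sample_XXX.png` file naming, and the default seed handling are illustrative assumptions. It relies only on the standard `generator` argument of the Stable Diffusion pipeline call and on the `images` attribute of the pipeline output.

```python
from pathlib import Path


def generate_and_save(pipeline, prompts, out_dir, seed=None,
                      num_inference_steps=50, guidance_scale=7.0):
    """Run `pipeline` once per prompt and save each first image as a PNG.

    Hypothetical helper (not from the Diffusion-KTO codebase).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    generator = None
    if seed is not None:
        # Imported lazily so the helper has no hard torch dependency.
        import torch
        # One generator is shared across prompts: each prompt sees different
        # noise, but the run as a whole is repeatable for a given seed.
        generator = torch.Generator(device="cpu").manual_seed(seed)

    paths = []
    for i, prompt in enumerate(prompts):
        result = pipeline(
            prompt=prompt,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        path = out_dir / f"sample_{i:03d}.png"
        result.images[0].save(path)
        paths.append(path)
    return paths
```

With the `pipeline` object built above, a call such as `generate_and_save(pipeline, ["a watercolor fox", "a city at dusk"], "samples", seed=0)` would write `samples/sample_000.png` and `samples/sample_001.png`.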
### Code

The code is available in the [GitHub repository](https://github.com/jacklishufan/diffusion-kto).

### Citation
```bibtex
@misc{li2024aligning,
      title={Aligning Diffusion Models by Optimizing Human Utility}, 
      author={Shufan Li and Konstantinos Kallidromitis and Akash Gokul and Yusuke Kato and Kazuki Kozuka},
      year={2024},
      eprint={2404.04465},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```