---
datasets:
- yuvalkirstain/pickapic_v2
library_name: diffusers
---
# Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility
<p align="center">
<img src="https://github.com/jacklishufan/diffusion-kto/blob/main/assets/teaser.png?raw=true" width="60%"> <br>
</p>
This model is fine-tuned from Stable Diffusion v1-5 on the Pick-a-Pic v2 dataset using Diffusion-KTO.
### Usage
```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DiffusionPipeline

# The VAE and all other pipeline components come from the base SD v1-5 checkpoint.
vae_path = model_name = "runwayml/stable-diffusion-v1-5"
device = "cuda"
weight_dtype = torch.float16

vae = AutoencoderKL.from_pretrained(
    vae_path,
    subfolder="vae",
)
# Only the UNet is swapped for the Diffusion-KTO fine-tuned weights.
unet = UNet2DConditionModel.from_pretrained(
    "jacklishufan/diffusion-kto",
    subfolder="unet",
)
# `device` is not a `from_pretrained` argument; placement is handled by `.to(device)`.
pipeline = DiffusionPipeline.from_pretrained(
    model_name,
    vae=vae,
    unet=unet,
).to(device).to(weight_dtype)

result = pipeline(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=50,
    guidance_scale=7.0,
)
img = result.images[0]  # first generated image (a PIL image by default)
```
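To run several prompts and keep the outputs, the pipeline call above can be wrapped in a small helper. The following is a minimal sketch, not part of the released code: `prompt_to_filename` and `generate_and_save` are hypothetical names introduced here, the `outputs` directory is an arbitrary choice, and the helper only assumes that the pipeline returns an object whose `images` list holds items with a `save` method (as PIL images do).

```python
import re
from pathlib import Path


def prompt_to_filename(prompt: str, max_len: int = 40) -> str:
    """Turn a prompt into a short, filesystem-safe PNG file name (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    return (slug[:max_len].rstrip("-") or "image") + ".png"


def generate_and_save(pipeline, prompts, out_dir="outputs", **pipeline_kwargs):
    """Call the pipeline once per prompt and save the first image of each call.

    Extra keyword arguments (e.g. num_inference_steps, guidance_scale)
    are forwarded unchanged to the pipeline.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, prompt in enumerate(prompts):
        result = pipeline(prompt=prompt, **pipeline_kwargs)
        # Prefix with the prompt index so similar prompts do not overwrite each other.
        path = out_dir / f"{i:03d}-{prompt_to_filename(prompt)}"
        result.images[0].save(path)
        paths.append(path)
    return paths
```

With the `pipeline` object from the usage snippet, a call such as `generate_and_save(pipeline, ["a beautiful cyborg with golden hair"], num_inference_steps=50, guidance_scale=7.0)` would write one PNG per prompt into `outputs/` and return the list of paths.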
### Code
The code is available [here](https://github.com/jacklishufan/diffusion-kto).
### Citation
```
@misc{li2024aligning,
title={Aligning Diffusion Models by Optimizing Human Utility},
author={Shufan Li and Konstantinos Kallidromitis and Akash Gokul and Yusuke Kato and Kazuki Kozuka},
year={2024},
eprint={2404.04465},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```