Margin-aware Preference Optimization for Aligning Diffusion Models without Reference


We propose MaPO, a reference-free, sample-efficient, memory-friendly alignment technique for text-to-image diffusion models. For details on the technique, see our paper (arXiv:2406.06424).

Developed by

  • Jiwoo Hong* (KAIST AI)
  • Sayak Paul* (Hugging Face)
  • Noah Lee (KAIST AI)
  • Kashif Rasul (Hugging Face)
  • James Thorne (KAIST AI)
  • Jongheon Jeong (Korea University)

Dataset

This model was fine-tuned from Stable Diffusion XL on the pixel art split of Pick-Style.

Training Code

Refer to our code repository here.

Inference

from diffusers import DiffusionPipeline, AutoencoderKL, UNet2DConditionModel
import torch

sdxl_id = "stabilityai/stable-diffusion-xl-base-1.0"
vae_id = "madebyollin/sdxl-vae-fp16-fix"
unet_id = "mapo-t2i/mapo-pick-style-pixel-art"

# Load the fp16-fix VAE and the MaPO fine-tuned UNet in half precision.
vae = AutoencoderKL.from_pretrained(vae_id, torch_dtype=torch.float16)
unet = UNet2DConditionModel.from_pretrained(unet_id, subfolder="unet", torch_dtype=torch.float16)
# Swap both components into the SDXL base pipeline and move it to the GPU.
pipeline = DiffusionPipeline.from_pretrained(sdxl_id, vae=vae, unet=unet, torch_dtype=torch.float16).to("cuda")

prompt = "portrait of gorgeous cyborg with golden hair, high resolution"
# The pipeline returns a list of PIL images; take the first one.
image = pipeline(prompt=prompt, num_inference_steps=30).images[0]
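
The snippet above samples with a fresh random seed on every run. If repeatable outputs are needed, one option is to pass a seeded `torch.Generator` to the pipeline call and save the result to disk. The following is a minimal sketch under that assumption; the helper names `output_path` and `generate`, the `outputs` directory, and the filename scheme are illustrative and not part of the MaPO release.

```python
from pathlib import Path


def output_path(prompt: str, seed: int, out_dir: str = "outputs") -> Path:
    """Build a filename from the prompt and seed (hypothetical naming scheme)."""
    # Replace every non-alphanumeric character with "-" and cap the length.
    slug = "".join(c if c.isalnum() else "-" for c in prompt.lower()).strip("-")[:40]
    return Path(out_dir) / f"{slug}-seed{seed}.png"


def generate(pipeline, prompt: str, seed: int = 0, steps: int = 30) -> Path:
    """Run the pipeline with a seeded generator and save the first image."""
    import torch  # imported lazily so output_path works without torch installed

    # A CPU generator with a fixed seed makes the initial noise repeatable.
    generator = torch.Generator(device="cpu").manual_seed(seed)
    image = pipeline(
        prompt=prompt, num_inference_steps=steps, generator=generator
    ).images[0]

    path = output_path(prompt, seed)
    path.parent.mkdir(parents=True, exist_ok=True)
    image.save(path)
    return path
```

With the `pipeline` and `prompt` objects defined above, `generate(pipeline, prompt, seed=0)` should produce the same image on repeated runs in the same environment and return the path it was written to.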

For qualitative results, please visit our project website.

Citation

@misc{hong2024marginaware,
    title={Margin-aware Preference Optimization for Aligning Diffusion Models without Reference}, 
    author={Jiwoo Hong and Sayak Paul and Noah Lee and Kashif Rasul and James Thorne and Jongheon Jeong},
    year={2024},
    eprint={2406.06424},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}