|
# Aligned Diffusion Model via DPO |
|
|
|
Diffusion model aligned via the DPO algorithm using the following reward models:
|
```
closed-source VLM: claude3-opus, gemini-1.5, gpt-4o, gpt-4v
open-source VLM: internvl-1.5
score model: hps-2.1
```
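
For context, DPO alignment of diffusion models typically optimizes the Diffusion-DPO objective (Wallace et al., 2023). Below is a minimal, illustrative sketch of that loss, not this repo's exact training code; the function name, argument names, and `beta` value are assumptions. It expects per-sample noise-prediction MSEs for the preferred ("win") and rejected ("lose") images under both the trainable UNet and a frozen reference UNet:

```python
import torch
import torch.nn.functional as F

def diffusion_dpo_loss(
    model_win_mse,   # ||eps - eps_theta(x_t^win)||^2 per sample, shape (batch,)
    model_lose_mse,  # ||eps - eps_theta(x_t^lose)||^2 per sample
    ref_win_mse,     # same terms under the frozen reference UNet
    ref_lose_mse,
    beta: float = 5000.0,  # KL-regularization strength (assumed value)
):
    # Implicit reward margin: how much more the trainable model improves
    # denoising of the preferred sample than of the rejected one,
    # measured relative to the reference model.
    win_diff = model_win_mse - ref_win_mse
    lose_diff = model_lose_mse - ref_lose_mse
    logits = -beta * (win_diff - lose_diff)
    # Standard DPO: maximize the log-sigmoid of the reward margin.
    return -F.logsigmoid(logits).mean()
```

Minimizing this loss pushes the trainable UNet to denoise preferred images better, relative to the reference model, than rejected ones.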
|
|
|
## How to Use |
|
|
|
You can load the model and perform inference as follows: |
|
```python
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

pretrained_model_name = "runwayml/stable-diffusion-v1-5"

# Load the DPO-aligned UNet weights from the released checkpoint.
dpo_unet = UNet2DConditionModel.from_pretrained(
    "path/to/checkpoint",
    subfolder="unet",
    torch_dtype=torch.float16,
).to("cuda")

# Build the base Stable Diffusion pipeline and swap in the aligned UNet.
pipeline = StableDiffusionPipeline.from_pretrained(
    pretrained_model_name, torch_dtype=torch.float16
)
pipeline = pipeline.to("cuda")
pipeline.safety_checker = None
pipeline.unet = dpo_unet

# Fix the seed for reproducible generation.
generator = torch.Generator(device="cuda")
generator = generator.manual_seed(1)

prompt = "a pink flower"

gs = 7.5  # classifier-free guidance scale (a typical default)
image = pipeline(prompt=prompt, generator=generator, guidance_scale=gs).images[0]
```
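
The pipeline returns standard PIL images, so the result can be saved directly:

```python
# Save the generated image to disk.
image.save("pink_flower.png")
```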
|
|
|
|
|
## Citation |
|
```
@misc{mjbench2024mjbench,
  title={MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?},
  author={Zhaorun Chen* and Yichao Du* and Zichen Wen and Yiyang Zhou and Chenhang Cui and Zhenzhen Weng and Haoqin Tu and Chaoqi Wang and Zhengwei Tong and Leria HUANG and Canyu Chen and Qinghao Ye and Zhihong Zhu and Yuqing Zhang and Jiawei Zhou and Zhuokai Zhao and Rafael Rafailov and Chelsea Finn and Huaxiu Yao},
  year={2024}
}
```