hbXNov/ucla-mint-finetune-sd-im1k (README at revision d7fa5c2f238b710b8aece6e071b9dd899523c19e)

License: MIT

Paper: Leaving Reality to Imagination: Robust Classification via Generated Datasets (https://arxiv.org/abs/2302.02503)

Finetuning Recipe:

We finetune the Stable Diffusion v1.5 model for 1 epoch on the complete ImageNet-1K training set (~1.3M images). Finetuning ran on a single 24GB A5000 GPU and took ~1 day.
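To plan a similar run, the number of optimizer updates in one epoch follows from the dataset size and the batch settings. The sketch below assumes the 1,281,167-image ILSVRC-2012 training split and keeps the final partial batch. The batch sizes and gradient-accumulation values are hypothetical, since the recipe does not state them.

```python
import math

IMAGENET_TRAIN_IMAGES = 1_281_167  # size of the ImageNet-1K (ILSVRC-2012) training split


def optimizer_steps_per_epoch(num_images: int, batch_size: int, grad_accum: int = 1) -> int:
    """Optimizer updates in one epoch on a single GPU, counting the last partial batch."""
    batches = math.ceil(num_images / batch_size)
    return math.ceil(batches / grad_accum)


if __name__ == "__main__":
    # Hypothetical batch settings; the recipe does not state them.
    for bs, ga in [(1, 1), (4, 1), (4, 4)]:
        steps = optimizer_steps_per_epoch(IMAGENET_TRAIN_IMAGES, bs, ga)
        print(f"batch={bs} accum={ga} -> {steps} optimizer steps")
```

The total step count matters here because a cosine schedule decays over the full run, so changing the batch settings also changes the shape of the learning-rate curve per image seen.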
The finetuning code was adapted directly from the Hugging Face Diffusers text-to-image example: https://github.com/huggingface/diffusers/tree/main/examples/text_to_image. Our adapted code is available at XXXX.
During finetuning, we (a) do not enable --use_ema, (b) do not enable gradient checkpointing (--gradient_checkpointing), (c) use a learning rate of 1e-6, lower than the example script's default, (d) use a cosine learning-rate schedule with 0 warmup steps, and (e) enable --use_8bit_adam (8-bit Adam from bitsandbytes).
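A minimal sketch of how these settings might map onto the command line of the Diffusers example script train_text_to_image.py, together with the learning-rate curve that a cosine schedule with zero warmup traces out. The model id runwayml/stable-diffusion-v1-5, the dataset and output paths, and the batch size are assumptions not stated in this recipe. The cosine_lr helper is a hypothetical illustration of the half-cycle cosine decay behind the Diffusers "cosine" scheduler option, not code from this repository.

```python
import math
import shlex

# Assumed/hypothetical values not given in the recipe; replace with your own.
PRETRAINED = "runwayml/stable-diffusion-v1-5"  # assumed Hub id for Stable Diffusion v1.5
TRAIN_DATA_DIR = "/path/to/imagenet_train"     # hypothetical ImageFolder-style dataset
OUTPUT_DIR = "sd-imagenet-finetuned"           # hypothetical output directory


def build_launch_args(train_batch_size: int = 1) -> list[str]:
    """Command line for the Diffusers text_to_image example with this recipe's settings."""
    return [
        "accelerate", "launch", "train_text_to_image.py",
        f"--pretrained_model_name_or_path={PRETRAINED}",
        f"--train_data_dir={TRAIN_DATA_DIR}",
        f"--output_dir={OUTPUT_DIR}",
        f"--train_batch_size={train_batch_size}",  # assumed; not stated in the recipe
        "--num_train_epochs=1",    # one epoch over ImageNet-1K
        "--learning_rate=1e-6",    # (c)
        "--lr_scheduler=cosine",   # (d)
        "--lr_warmup_steps=0",     # (d) set explicitly in case the script default is non-zero
        "--use_8bit_adam",         # (e) requires bitsandbytes
        # (a) --use_ema and (b) --gradient_checkpointing are deliberately omitted.
    ]


def cosine_lr(step: int, total_steps: int, base_lr: float = 1e-6) -> float:
    """Learning rate at `step` for a half-cycle cosine decay from base_lr to 0, no warmup."""
    progress = min(step, total_steps) / max(1, total_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(shlex.join(build_launch_args()))
    for step in (0, 500, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

Both --use_ema and --gradient_checkpointing are on/off switches, so leaving them out of the command keeps them disabled, which matches (a) and (b).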

After finetuning, we repeatedly sample from the generative model to produce 1.3M training images and 50K validation images.
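One way to organize this sampling step is sketched below. It assumes the images are split evenly across the 1,000 ImageNet classes (1,300 training and 50 validation images per class), a prompt template of the form "a photo of a {class_name}", and a finetuned pipeline saved to a local directory; all three are assumptions, not details from this recipe. Only plan_split runs without a GPU; load_pipeline and generate_class use the Diffusers StableDiffusionPipeline API.

```python
from dataclasses import dataclass

NUM_CLASSES = 1000  # ImageNet-1K classes


@dataclass
class SplitPlan:
    images_per_class: int
    batches_per_class: int


def plan_split(total_images: int, batch_size: int, num_classes: int = NUM_CLASSES) -> SplitPlan:
    """Assume an even per-class split and count sampling batches per class."""
    if total_images % num_classes:
        raise ValueError("total_images must split evenly across classes")
    per_class = total_images // num_classes
    return SplitPlan(per_class, -(-per_class // batch_size))  # ceiling division


def load_pipeline(model_dir: str):
    """Load a finetuned checkpoint saved by the training script (needs a GPU and diffusers)."""
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=torch.float16)
    return pipe.to("cuda")


def generate_class(pipe, class_name: str, n_images: int, batch_size: int = 8) -> list:
    """Sample n_images images for one class, at most batch_size per pipeline call."""
    prompt = f"a photo of a {class_name}"  # hypothetical prompt template
    images = []
    while len(images) < n_images:
        k = min(batch_size, n_images - len(images))
        images.extend(pipe(prompt, num_images_per_prompt=k).images)
    return images


if __name__ == "__main__":
    for name, total in [("train", 1_300_000), ("val", 50_000)]:
        print(name, plan_split(total, batch_size=8))
```

Passing the loaded pipeline into generate_class, rather than loading it per class, avoids reloading the model weights 1,000 times.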

All newly generated images, from both the finetuned and the pretrained Stable Diffusion models, are available here: https://drive.google.com/drive/folders/14DJyU_xx018Ir6Cw-mETKw9a0yLtc2NJ?usp=sharing