AMD Nitro Diffusion

Introduction

AMD Nitro Diffusion is a series of efficient text-to-image generation models that are distilled from popular diffusion models on AMD Instinct™ GPUs. The release consists of:

⚡️ Open-source code! The models are based on our re-implementation of Latent Adversarial Diffusion Distillation, the method used to build the popular Stable Diffusion 3 Turbo model. Since the original authors didn't provide training code, we release our re-implementation to help advance further research in the field.

Details

  • Model architecture: Stable Diffusion 2.1 Nitro has the same architecture as Stable Diffusion 2.1 and is compatible with the diffusers pipeline.
  • Inference steps: This model is distilled to perform inference in just a single step. However, the training code also supports distilling a model for 2, 4 or 8 steps.
  • Hardware: We use a single node consisting of 4 AMD Instinct™ MI250 GPUs for distilling Stable Diffusion 2.1 Nitro.
  • Dataset: We use 1M prompts from DiffusionDB and generate the corresponding images from the base Stable Diffusion 2.1 model.
  • Training cost: The distillation process achieves reasonable results in less than 2 days on a single node.

Quickstart

import torch
from diffusers import DDPMScheduler, DiffusionPipeline

# Load the base Stable Diffusion 2.1 pipeline with a DDPM scheduler.
scheduler = DDPMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1-base", subfolder="scheduler")
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base", scheduler=scheduler)

# Swap in the distilled UNet weights.
ckpt_path = '<path to distilled checkpoint>'
unet_state_dict = torch.load(ckpt_path, map_location="cpu")
pipe.unet.load_state_dict(unet_state_dict)
pipe = pipe.to("cuda")  # ROCm builds of PyTorch also use the "cuda" device name

# Single-step generation: no classifier-free guidance, starting from timestep 999.
image = pipe(prompt='a photo of a cat',
             num_inference_steps=1,
             guidance_scale=0,
             timesteps=[999]).images[0]

For more details on training and evaluation, please visit the GitHub repo.

Results

Compared to the Stable Diffusion 2.1 base model, we achieve a 95.9% reduction in FLOPs at the cost of just a 2.5% lower CLIP score and a 2.2% higher FID.

| Model | FID ↓ | CLIP ↑ | FLOPs | Latency on AMD Instinct MI250 (sec) |
| --- | --- | --- | --- | --- |
| Stable Diffusion 2.1 base, 50 steps (cfg=7.5) | 25.47 | 0.3286 | 83.04 | 4.94 |
| Stable Diffusion 2.1 Nitro, 1 step | 26.04 | 0.3204 | 3.36 | 0.18 |
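
The relative changes quoted above can be rechecked from the table. The snippet below is a minimal sketch that assumes the table values are exact as printed. The dictionary keys and the `relative_change` helper are illustrative names and not part of the release. The results should agree with the quoted percentages to within rounding of the last digit.

```python
# Values copied from the results table; key names are illustrative.
baseline = {"fid": 25.47, "clip": 0.3286, "flops": 83.04, "latency_s": 4.94}
nitro = {"fid": 26.04, "clip": 0.3204, "flops": 3.36, "latency_s": 0.18}


def relative_change(new: float, old: float) -> float:
    """Signed percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Percentage change of each metric for the one-step model vs. the 50-step baseline.
changes = {name: relative_change(nitro[name], baseline[name]) for name in baseline}

# Wall-clock speedup on the MI250, expressed as a ratio rather than a percentage.
speedup = baseline["latency_s"] / nitro["latency_s"]

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
print(f"latency speedup: {speedup:.1f}x")
```

The latency ratio is not quoted in the text; it is derived here from the last column of the table.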

License

Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
