---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/FAHjxgN2tk6uXmQAUeFI5.jpeg)

<hr>

# Overview
SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to more simply generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps at a batch size of 64 on a curated dataset covering multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolution.
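
As a rough illustration of the alternating schedule described above, the sketch below cycles through aspect-ratio buckets and emits a low-resolution batch followed by a high-resolution batch for each. The bucket table, the sizes in it, and the `alternating_schedule` helper are all hypothetical and chosen only for illustration; the actual training code and bucket sizes are not documented here.

```python
from itertools import cycle

# Hypothetical bucket table: aspect ratio -> (low-res size, high-res size).
# These sizes are illustrative assumptions, not the values used in training.
BUCKETS = {
    "1:1": ((512, 512), (1024, 1024)),
    "3:2": ((608, 416), (1216, 832)),
    "2:3": ((416, 608), (832, 1216)),
}


def alternating_schedule(buckets, total_steps):
    """Yield (step, aspect_ratio, (width, height)), alternating low/high per ratio."""
    # Flatten to: ratio A low, ratio A high, ratio B low, ratio B high, ...
    order = [(ratio, size) for ratio, sizes in buckets.items() for size in sizes]
    for step, (ratio, size) in zip(range(total_steps), cycle(order)):
        yield step, ratio, size


schedule = list(alternating_schedule(BUCKETS, total_steps=6))
for step, ratio, size in schedule:
    print(step, ratio, size)

# Total samples processed at the stated settings: 7000 steps x batch size 64.
samples_seen = 7000 * 64
print(samples_seen)
```

The only figures taken from the description above are the step count and batch size, which together amount to 448,000 samples processed over the run.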

*Note:* It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.

- **Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)**

<hr>

# Model Description
- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/Hotshot-XL).
- **Finetuned from model**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

<hr>

# 🧨 Diffusers 

Make sure to upgrade diffusers to >= 0.18.2:
```
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as `invisible_watermark`:
```
pip install invisible_watermark transformers accelerate safetensors
```

Running the pipeline (if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**; in this example we swap it to **EulerAncestralDiscreteScheduler**):
```py
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the SDXL-512 checkpoint and move it to the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "hotshotco/SDXL-512",
    use_safetensors=True,
).to("cuda")

# Swap the default EulerDiscreteScheduler for the ancestral variant.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

prompt = "a woman laughing"
negative_prompt = ""

# Generate at 512x512; target_size and original_size are SDXL's
# size-conditioning inputs.
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("woman_laughing.png")
```
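
Since the model targets resolutions "at and around" 512x512, it can help to pick a non-square size with roughly the same pixel count. The helper below is a minimal sketch of one way to do that; `size_near_512` is a hypothetical name, not part of diffusers, and keeping the area at 512 * 512 is an assumption rather than a documented recommendation. It snaps both sides to multiples of 8, which the pipeline requires for `width` and `height`.

```python
import math


def size_near_512(aspect_ratio, target_area=512 * 512, multiple=8):
    """Return (width, height) with roughly target_area pixels.

    aspect_ratio is width / height. Both sides are rounded to the
    nearest multiple of `multiple` (8 by default).
    """
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(size_near_512(1.0))      # (512, 512)
print(size_near_512(16 / 9))   # (680, 384)
```

The returned values can be passed as `width` and `height` in the pipeline call above.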

<hr>

# Limitations and Bias
## Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks that involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.

## Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.