---
license: openrail++
tags:
- text-to-image
- stable-diffusion
---
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/FAHjxgN2tk6uXmQAUeFI5.jpeg)
<hr>
# Overview
SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to make it simpler to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps with a batch size of 64 on a curated dataset of multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.

*Note:* It bears repeating that SDXL-512 was not trained to be "better" than SDXL, but rather to simplify prompting for higher-fidelity outputs at and around the 512x512 resolution.
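
The card does not spell out how the low- and high-resolution batches were interleaved. As a purely hypothetical illustration (not Natural Synthetics' training code), the sketch below alternates a "low" and a "high" resolution batch within each aspect-ratio bucket and works out the training volume from the stated hyperparameters: 7000 steps at a batch size of 64 is 448,000 training samples in total. The bucket labels, the `BatchPlan` type, and the per-step ordering are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterator

# Hyperparameters stated in the model card
LEARNING_RATE = 1e-6
TOTAL_STEPS = 7000
BATCH_SIZE = 64


@dataclass(frozen=True)
class BatchPlan:
    step: int
    aspect_ratio: str  # hypothetical bucket label, e.g. "1:1"
    tier: str          # "low" or "high" resolution batch


def alternating_schedule(aspect_ratios: list[str], total_steps: int = TOTAL_STEPS) -> Iterator[BatchPlan]:
    """Hypothetical schedule: each bucket gets a low-res batch followed by a high-res batch."""
    if not aspect_ratios:
        raise ValueError("need at least one aspect-ratio bucket")
    for step in range(total_steps):
        # Two consecutive steps per bucket, then move on to the next bucket
        bucket = aspect_ratios[(step // 2) % len(aspect_ratios)]
        tier = "low" if step % 2 == 0 else "high"
        yield BatchPlan(step, bucket, tier)


def samples_seen(total_steps: int = TOTAL_STEPS, batch_size: int = BATCH_SIZE) -> int:
    """Total number of training samples processed over the run."""
    return total_steps * batch_size
```

With this ordering, half of the 7000 steps (3500) are low-resolution batches and half are high-resolution, which is one simple way to keep the base model's higher-resolution behaviour in the training signal.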
- **Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)**
<hr>
# Model Description
- **Developed by**: Natural Synthetics Inc.
- **Model type**: Diffusion-based text-to-image generative model
- **License**: CreativeML Open RAIL++-M License
- **Model Description**: This is a model that can be used to generate and modify higher-fidelity images at and around the 512x512 resolution.
- **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/Hotshot-XL).
- **Finetuned from model**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
<hr>
# 🧨 Diffusers
Make sure to upgrade diffusers to >= 0.18.2:
```
pip install diffusers --upgrade
```
In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark package:
```
pip install invisible_watermark transformers accelerate safetensors
```
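
To confirm that the installed `diffusers` meets the 0.18.2 minimum, a small standard-library check along these lines can be used. This is a sketch: the helper names `release_tuple` and `meets_minimum` are hypothetical, and the parser deliberately ignores pre-release suffixes (so `0.18.2.dev0` is treated as 0.18.2). `packaging.version` is the more rigorous option when it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_DIFFUSERS = "0.18.2"


def release_tuple(v: str) -> tuple[int, ...]:
    """Keep only the leading dotted integers, e.g. "0.25.0.dev0" -> (0, 25, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_DIFFUSERS) -> bool:
    """Compare release numbers only; pre-release/dev suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that e.g. "1.0" compares like "1.0.0"
    n = max(len(a), len(b))
    a += (0,) * (n - len(a))
    b += (0,) * (n - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        print("diffusers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else f"needs >= {MIN_DIFFUSERS}"
        print(f"diffusers {installed}: {status}")
```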
Running the pipeline: if you don't swap the scheduler, the pipeline runs with the default **EulerDiscreteScheduler**. In this example we swap it for **EulerAncestralDiscreteScheduler**:
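```py
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Load the SDXL-512 checkpoint from the Hugging Face Hub and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "hotshotco/SDXL-512",
    use_safetensors=True,
).to("cuda")

# Replace the default EulerDiscreteScheduler with EulerAncestralDiscreteScheduler,
# reusing the existing scheduler configuration
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

prompt = "a woman laughing"
negative_prompt = ""

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=512,
    # SDXL size micro-conditioning (values as in the original example)
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("woman_laughing.png")
```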
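
Because the model was fine-tuned on multiple aspect ratios around 512x512, it can be convenient to pick non-square sizes that keep roughly the same pixel count. The sketch below assumes (this is not stated in the card) that sides should be rounded to multiples of 64; `size_for_aspect` and `generate_at_aspect` are hypothetical helpers, and the latter simply forwards the computed `width`/`height` to a `StableDiffusionXLPipeline` call like the one above.

```python
import math


def size_for_aspect(aspect_ratio: float, target_side: int = 512, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with roughly target_side**2 pixels and width/height ~= aspect_ratio."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    area = target_side * target_side
    width = math.sqrt(area * aspect_ratio)
    height = math.sqrt(area / aspect_ratio)

    def snap(x: float) -> int:
        # Round to the nearest multiple (64 by default, which keeps both sides
        # divisible by 8), never going below one multiple
        return max(multiple, multiple * round(x / multiple))

    return snap(width), snap(height)


def generate_at_aspect(pipe, prompt: str, aspect_ratio: float, **kwargs):
    """Hypothetical convenience wrapper around a StableDiffusionXLPipeline call."""
    width, height = size_for_aspect(aspect_ratio)
    return pipe(prompt, width=width, height=height, **kwargs).images[0]
```

For example, `size_for_aspect(16/9)` gives `(704, 384)` and `size_for_aspect(2/3)` gives `(448, 640)`, so `generate_at_aspect(pipe, "a woman laughing", 16/9, num_inference_steps=50)` would request a 704x384 image.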
<hr>
# Limitations and Bias
## Limitations
- The model does not achieve perfect photorealism.
- The model cannot render legible text.
- The model struggles with more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
- Faces and people in general may not be generated properly.
## Bias
While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.