hotshotco/SDXL-512

@@ -7,13 +7,66 @@ tags:
 ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif)
 # Model Description
 - **Developed by**: Natural Synthetics Inc.
 - **Model type**: Diffusion-based text-to-image generative model
 - **License**: CreativeML Open RAIL++-M License
-- **Model Description**: This is a model that can be used to generate and modify images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
 - **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl).
 # Limitations and Bias
 ## Limitations
@@ -23,4 +76,4 @@ tags:
 - Faces and people in general may not be generated properly.
 ## Bias
-While the capabilities of video generation models are impressive, they can also reinforce or exacerbate social biases.

 ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/637a6daf7ce76c3b83497ea2/ux_sZKB9snVPsKRT1TzfG.gif)
+<hr>
+# Overview
+SDXL-512 is a checkpoint fine-tuned from SDXL 1.0 that is designed to generate higher-fidelity images at and around the 512x512 resolution. The model was fine-tuned with a learning rate of 1e-6 over 7000 steps at a batch size of 64 on a curated dataset covering multiple aspect ratios, alternating low- and high-resolution batches (per aspect ratio) so as not to impair the base model's existing performance at higher resolutions.
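The fine-tuning recipe above can be pictured as a batch scheduler. This is a minimal illustration, not Natural Synthetics' training code: the aspect-ratio buckets, their pixel sizes, the round-robin order over aspect ratios, and the `batch_schedule` helper are all hypothetical. Only the learning rate, step count, batch size, and the low/high alternation per aspect ratio come from the text.

```python
from itertools import cycle
from typing import Dict, Iterator, Tuple

Resolution = Tuple[int, int]  # (width, height)

# Hyperparameters stated in the model card.
LEARNING_RATE = 1e-6
TOTAL_STEPS = 7000
BATCH_SIZE = 64

# HYPOTHETICAL aspect-ratio buckets: (low-res, high-res) per aspect ratio.
# The actual buckets used to train SDXL-512 are not published here.
BUCKETS: Dict[str, Tuple[Resolution, Resolution]] = {
    "square": ((512, 512), (1024, 1024)),
    "landscape": ((576, 448), (1152, 896)),
    "portrait": ((448, 576), (896, 1152)),
}


def batch_schedule(
    total_steps: int, buckets: Dict[str, Tuple[Resolution, Resolution]]
) -> Iterator[Tuple[int, str, Resolution]]:
    """Yield (step, aspect_ratio, resolution), alternating low/high per aspect ratio.

    Assumption: aspect ratios are visited round-robin; the card does not say
    how the aspect ratio for each step was chosen.
    """
    next_is_high = {ar: False for ar in buckets}  # each ratio starts at low res
    ratios = cycle(buckets)
    for step in range(total_steps):
        ar = next(ratios)
        low, high = buckets[ar]
        res = high if next_is_high[ar] else low
        next_is_high[ar] = not next_is_high[ar]  # flip for this ratio's next batch
        yield step, ar, res


if __name__ == "__main__":
    for step, ar, res in batch_schedule(6, BUCKETS):
        print(step, ar, res)
    # 7000 steps x 64 images per batch = 448,000 samples processed in total
    print("samples processed:", TOTAL_STEPS * BATCH_SIZE)
```

With round-robin ordering, each aspect ratio receives an (almost) equal number of low- and high-resolution batches over the 7000 steps.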
+- **Use it with [Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL) (recommended)**
+<hr>
 # Model Description
 - **Developed by**: Natural Synthetics Inc.
 - **Model type**: Diffusion-based text-to-image generative model
 - **License**: CreativeML Open RAIL++-M License
+- **Model Description**: This is a model that can be used to generate and modify images from text prompts, with higher fidelity at and around the 512x512 resolution.
 - **Resources for more information**: Check out our [GitHub Repository](https://github.com/hotshotco/hotshot-xl).
+- **Finetuned from model**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+<hr>
+# 🧨 Diffusers
+Make sure to upgrade diffusers to >= 0.18.2:
+```
+pip install diffusers --upgrade
+```
+In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark package:
+```
+pip install invisible_watermark transformers accelerate safetensors
+```
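If you want to confirm that the installed `diffusers` meets the 0.18.2 minimum before running the pipeline, a standard-library check is sketched here. The helper names are hypothetical, and the parser only reads the leading numeric release segment (so `0.21.0.dev0` is treated as `0.21.0`); `packaging.version.Version` is the more complete option when it is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_DIFFUSERS: Tuple[int, ...] = (0, 18, 2)  # minimum stated in the model card


def parse_release(v: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. '0.21.4' -> (0, 21, 4)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # stop at segments such as 'dev0' or 'post1'
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '0rc1': keep 0, ignore the suffix
            break
    return tuple(parts)


def diffusers_is_recent_enough(minimum: Tuple[int, ...] = MIN_DIFFUSERS) -> bool:
    """Return True if diffusers is installed at or above `minimum`."""
    try:
        installed = version("diffusers")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= minimum


if __name__ == "__main__":
    print("diffusers >= 0.18.2:", diffusers_is_recent_enough())
```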
+Running the pipeline (if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**; in this example we swap it for **EulerAncestralDiscreteScheduler**):
+```py
+import torch
+from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
+model = "hotshotco/SDXL-512"
+# Load the fp16 weights in half precision
+pipe = StableDiffusionXLPipeline.from_pretrained(
+    model,
+    torch_dtype=torch.float16,
+    use_safetensors=True,
+    variant="fp16",
+)
+# Replace the default EulerDiscreteScheduler with EulerAncestralDiscreteScheduler
+pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
+pipe.to("cuda")
+prompt = "a woman laughing"
+negative_prompt = ""
+image = pipe(
+    prompt,
+    negative_prompt=negative_prompt,
+    width=512,  # output resolution
+    height=512,
+    guidance_scale=12,
+    target_size=(1024, 1024),  # SDXL size micro-conditioning (does not change output size)
+    original_size=(4096, 4096),
+    num_inference_steps=50,
+).images[0]
+image.save("woman_laughing.png")
+```
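In the example above, `width` and `height` set the output resolution, while `original_size` and `target_size` are SDXL's size micro-conditioning inputs. As we understand the diffusers SDXL pipeline, it flattens `original_size`, `crops_coords_top_left` (default `(0, 0)`), and `target_size` into a six-value vector that is embedded alongside the text conditioning, and when the two sizes are omitted they default to `(height, width)`. The standalone helpers below are a hypothetical illustration of that flattening, not the diffusers source.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (height, width), matching SDXL's size conditioning


def size_conditioning(
    original_size: Size,
    target_size: Size,
    crops_coords_top_left: Size = (0, 0),
) -> List[int]:
    """Hypothetical helper: concatenate the three pairs into one flat list."""
    return list(original_size + crops_coords_top_left + target_size)


def default_sizes(height: int, width: int) -> Tuple[Size, Size]:
    """If original_size/target_size are not passed, both fall back to (height, width)."""
    return (height, width), (height, width)


if __name__ == "__main__":
    # Values passed in the README example
    print(size_conditioning(original_size=(4096, 4096), target_size=(1024, 1024)))
    # Values used if those arguments were omitted at 512x512
    print(size_conditioning(*default_sizes(512, 512)))
```

So the README call requests a 512x512 image while conditioning the model as if the source image were 4096x4096 and the target 1024x1024.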
+<hr>
 # Limitations and Bias
 ## Limitations
 - Faces and people in general may not be generated properly.
 ## Bias
+While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.