# Diffusion ZERO

TL;DR: Enter a prompt or roll the `🎲` and press `Generate`.

## Prompting

Positive and negative prompts are embedded by [Compel](https://github.com/damian0815/compel) for weighting. See [syntax features](https://github.com/damian0815/compel/blob/main/doc/syntax.md) to learn more.

Use `+` to increase the weight of a token or `-` to decrease it. Weights compound when chained: `blue+` gives 1.1x more attention to `blue`, `blue++` gives 1.1^2, and so on. Each `-` works the same way, scaling by 0.9.

For groups of tokens, wrap them in parentheses and multiply by a float between 0 and 2. For example, `a (birthday cake)1.3 on a table` will increase the weight of both `birthday` and `cake` by 1.3x. This also means the entire scene will be more birthday-like, not just the cake. To counteract this, you can use `-` inside the parentheses on specific tokens, e.g., `a (birthday-- cake)1.3`, to reduce the birthday aspect.
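As a minimal sketch of how this fits together with Diffusers (the checkpoint and settings here are illustrative, not necessarily what the app uses), Compel turns a weighted prompt into conditioning tensors that the pipeline accepts in place of a plain string:

```python
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline

# Any SD 1.5 checkpoint works; Dreamshaper is the app's default.
pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Compel parses the weighting syntax and returns prompt embeddings.
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
conditioning = compel.build_conditioning_tensor("a (birthday-- cake)1.3 on a table")

image = pipe(prompt_embeds=conditioning, num_inference_steps=30).images[0]
```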

This is the same syntax used in [InvokeAI](https://invoke-ai.github.io/InvokeAI/features/PROMPTS/) and it differs from AUTOMATIC1111:

| Compel      | AUTOMATIC1111 |
| ----------- | ------------- |
| `blue++`    | `((blue))`    |
| `blue--`    | `[[blue]]`    |
| `(blue)1.2` | `(blue:1.2)`  |
| `(blue)0.8` | `(blue:0.8)`  |

### Arrays

Arrays allow you to generate multiple different images from a single prompt. For example, `an adult [[blonde,brunette]] [[man,woman]]` will expand into **4** different prompts. This implementation was inspired by [Fooocus](https://github.com/lllyasviel/Fooocus/pull/1503).

> NB: Make sure to set `Images` to the number of images you want to generate. Otherwise, only the first prompt will be used.
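The expansion itself is straightforward. Here is a hypothetical helper (not the app's actual code) that shows the logic:

```python
import re
from itertools import product

def expand_arrays(prompt: str) -> list[str]:
    """Expand [[a,b]] groups into every combination (hypothetical helper)."""
    groups = re.findall(r"\[\[(.*?)\]\]", prompt)
    if not groups:
        return [prompt]
    options = [[token.strip() for token in group.split(",")] for group in groups]
    prompts = []
    for combo in product(*options):
        expanded = prompt
        for choice in combo:
            # Replace the leftmost remaining [[...]] group with this choice.
            expanded = re.sub(r"\[\[.*?\]\]", choice, expanded, count=1)
        prompts.append(expanded)
    return prompts

print(expand_arrays("an adult [[blonde,brunette]] [[man,woman]]"))
# 4 prompts: blonde man, blonde woman, brunette man, brunette woman
```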

## Models

Each model checkpoint has a different aesthetic:

* [Comfy-Org/stable-diffusion-v1-5](https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive): base
* [cyberdelia/CyberRealistic_V5](https://huggingface.co/cyberdelia/CyberRealistic): realistic
* [Lykon/dreamshaper-8](https://huggingface.co/Lykon/dreamshaper-8): general purpose (default)
* [fluently/Fluently-v4](https://huggingface.co/fluently/Fluently-v4): general purpose stylized
* [Linaqruf/anything-v3-1](https://huggingface.co/Linaqruf/anything-v3-1): anime
* [prompthero/openjourney-v4](https://huggingface.co/prompthero/openjourney-v4): Midjourney art style
* [SG161222/Realistic_Vision_V5](https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE): realistic
* [XpucT/Deliberate_v6](https://huggingface.co/XpucT/Deliberate): general purpose stylized

## LoRA

Apply up to 2 LoRA (low-rank adaptation) adapters with adjustable strength:

* [Perfection Style](https://civitai.com/models/411088?modelVersionId=486099): attempts to improve aesthetics, use high strength
* [Detailed Style](https://civitai.com/models/421162?modelVersionId=486110): attempts to improve details, use low strength

> NB: The trigger words are automatically appended to the positive prompt for you.
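With Diffusers, loading and weighting two adapters looks roughly like this (the local file names are hypothetical; download the Civitai files first):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Hypothetical local file names; download the Civitai adapters first.
pipe.load_lora_weights("loras", weight_name="perfection_style.safetensors", adapter_name="perfection")
pipe.load_lora_weights("loras", weight_name="detailed_style.safetensors", adapter_name="detailed")

# High strength for Perfection Style, low strength for Detailed Style.
pipe.set_adapters(["perfection", "detailed"], adapter_weights=[0.9, 0.3])
```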

## Embeddings

Select one or more [textual inversion](https://huggingface.co/docs/diffusers/en/using-diffusers/textual_inversion_inference) embeddings:

* [`fast_negative`](https://civitai.com/models/71961?modelVersionId=94057): all-purpose (default, **recommended**)
* [`cyberrealistic_negative`](https://civitai.com/models/77976?modelVersionId=82745): realistic add-on (for CyberRealistic)
* [`unrealistic_dream`](https://civitai.com/models/72437?modelVersionId=77173): realistic add-on (for RealisticVision)

> NB: The trigger token is automatically appended to the negative prompt for you.
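In Diffusers terms this is `load_textual_inversion`; a sketch, assuming the embedding file was downloaded locally (file name hypothetical):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Register the embedding under its trigger token (local path is hypothetical).
pipe.load_textual_inversion("embeddings/fast_negative.pt", token="fast_negative")

image = pipe(
    prompt="portrait of a young adult woman",
    negative_prompt="fast_negative",  # the trigger token activates the embedding
).images[0]
```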

## Styles

[Styles](https://huggingface.co/spaces/adamelliotfields/diffusion/blob/main/data/styles.json) are prompt templates that wrap your positive and negative prompts. They were originally derived from the [twri/sdxl_prompt_styler](https://github.com/twri/sdxl_prompt_styler) Comfy node, but have since been entirely rewritten.

Start by framing a simple subject like `portrait of a young adult woman` or `landscape of a mountain range` and experiment.
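Conceptually, a style is just a pair of templates with a placeholder for your prompt. The schema below is an assumption for illustration; see the linked `styles.json` for the real format:

```python
# Assumed schema with a "{prompt}" placeholder; the real styles.json may differ.
style = {
    "positive": "cinematic still of {prompt}, shallow depth of field, film grain",
    "negative": "cartoon, sketch, low quality",
}

def apply_style(style: dict, positive: str, negative: str = "") -> tuple[str, str]:
    """Wrap the user's prompts in the style's templates."""
    styled_positive = style["positive"].format(prompt=positive)
    styled_negative = ", ".join(filter(None, [style["negative"], negative]))
    return styled_positive, styled_negative

print(apply_style(style, "portrait of a young adult woman", "blurry"))
```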

### Anime

The `Anime: *` styles work best with Dreamshaper. When using the anime-specific Anything model, you should use the `Anime: Anything` style with the following settings:

* Scheduler: `DEIS 2M` or `DPM++ 2M`
* Guidance: `10`
* Steps: `50`

Your subject should be a few simple tokens like `girl, brunette, blue eyes, armor, nebula, celestial`. Experiment with `Clip Skip` and `Karras`. Finish with the `Perfection Style` LoRA on a moderate setting and upscale.

## Scale

Upscale up to 4x using [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) with weights from [ai-forever](https://huggingface.co/ai-forever/Real-ESRGAN). Necessary for high-resolution images.
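A sketch using the `RealESRGAN` wrapper from the ai-forever repo (the weight path is illustrative; 2x weights are also published):

```python
import torch
from PIL import Image
from RealESRGAN import RealESRGAN  # package from the ai-forever repo

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RealESRGAN(device, scale=4)
model.load_weights("weights/RealESRGAN_x4.pth", download=True)

image = Image.open("portrait.png").convert("RGB")
upscaled = model.predict(image)
upscaled.save("portrait_4x.png")
```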

## Image-to-Image

The `🖼️ Image` tab enables the image-to-image and IP-Adapter pipelines.

### Strength

Denoising strength controls how much the generation differs from the input image: a value of `0` will be identical to the original, while `1` will be a completely new image. You may also want to increase the number of inference steps. Only applies to the image-to-image input.
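A sketch with the Diffusers image-to-image pipeline (checkpoint and values illustrative). With strength `s` and `N` requested steps, only about `s * N` denoising steps actually run, which is why raising the step count helps:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("portrait.png")

# strength=0.6 keeps the composition while allowing substantial change.
image = pipe(
    prompt="oil painting of a young adult woman",
    image=init_image,
    strength=0.6,
    num_inference_steps=40,
).images[0]
```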

### IP-Adapter

In an image-to-image pipeline, the input image is used as the initial latent. With [IP-Adapter](https://github.com/tencent-ailab/IP-Adapter), the input image is processed by a separate image encoder and the encoded features are used as conditioning along with the text prompt.

For capturing faces, enable `IP-Adapter Face` to use the full-face model. Use a high-quality input image that is mostly a face; you can generate fake portraits with Realistic Vision to experiment. Note that you'll never get true identity preservation without an advanced pipeline like [InstantID](https://github.com/instantX-research/InstantID), which combines many techniques.
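In Diffusers, IP-Adapter loads on top of an existing pipeline; a sketch (the adapter repo and weight names come from `h94/IP-Adapter`, the rest is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Use the full-face variant when the `IP-Adapter Face` toggle is on.
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter-full-face_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # how strongly the image conditions the output

image = pipe(
    prompt="portrait of a young adult woman, studio lighting",
    ip_adapter_image=load_image("face.png"),
).images[0]
```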

## ControlNet

The `🎮 Control` tab enables the [ControlNet](https://github.com/lllyasviel/ControlNet) pipelines. Read the [Diffusers docs](https://huggingface.co/docs/diffusers/using-diffusers/controlnet) to learn more.

### Annotators

In ControlNet, the input image is a feature map produced by an _annotator_. These are computer vision models used for tasks like edge detection and pose estimation. ControlNet models are trained to understand these feature maps.

> NB: Control images will be automatically resized to the nearest multiple of 64 (e.g., 513 -> 512).
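End to end, an annotator plus ControlNet looks roughly like this with Diffusers and `controlnet_aux` (Canny edge detection chosen as an example; the model IDs are real, the rest is illustrative):

```python
import torch
from controlnet_aux import CannyDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Annotator: turn the input photo into a Canny edge feature map.
canny = CannyDetector()
control_image = canny(load_image("portrait.png"))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "Lykon/dreamshaper-8", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("portrait of a young adult woman", image=control_image).images[0]
```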

## Advanced

### DeepCache

[DeepCache](https://github.com/horseee/DeepCache) caches lower UNet layers and reuses them every `Interval` steps. Trade quality for speed:

* `1`: no caching (default)
* `2`: more quality
* `3`: balanced
* `4`: more speed
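A sketch using the DeepCache helper around a Diffusers pipeline; presumably the UI's `Interval` maps to `cache_interval` (other values illustrative):

```python
import torch
from DeepCache import DeepCacheSDHelper
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)  # interval 3: balanced
helper.enable()

image = pipe("portrait of a young adult woman").images[0]
helper.disable()  # restore the uncached UNet
```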

### FreeU

[FreeU](https://github.com/ChenyangSi/FreeU) re-weights the contributions of the UNet’s skip connections and backbone feature maps. Can sometimes improve image quality.
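Diffusers exposes this as a single call; the values below are the FreeU repo's recommendations for SD 1.5:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# b1/b2 amplify backbone features, s1/s2 attenuate skip connections.
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.2, b2=1.4)

image = pipe("portrait of a young adult woman").images[0]
pipe.disable_freeu()  # turn it back off
```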

### Clip Skip

When enabled, the final layer of the CLIP text encoder is skipped and the penultimate layer's output is used instead. Can sometimes improve image quality.
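In Diffusers this is the `clip_skip` argument at call time; a sketch (checkpoint illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# clip_skip=1 skips the last CLIP layer, i.e. conditions on the penultimate one.
image = pipe("portrait of a young adult woman", clip_skip=1).images[0]
```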

### Tiny VAE

Enable [madebyollin/taesd](https://github.com/madebyollin/taesd) for near-instant latent decoding with a minor loss in detail. Useful for development.
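A sketch of swapping in the tiny VAE on a Diffusers pipeline (checkpoint illustrative):

```python
import torch
from diffusers import AutoencoderTiny, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16
).to("cuda")

# Replace the full VAE with TAESD for near-instant decoding.
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")

image = pipe("portrait of a young adult woman").images[0]
```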