bghira/pseudo-real

PseudoTerminal X committed on Jun 26, 2023

Commit b8fe228 (1 parent: 8847ac0)

Add a model card

Files changed (1)

README.md +56 -0

@@ -1,3 +1,59 @@
 ---
 license: creativeml-openrail-m
+tags:
+- computer vision
+- stable-diffusion
+- stable-diffusion-2-1
+- photography
+- photoreal
 ---
+# Capabilities
+This model is capable of producing photorealistic images of people.
+It retains much of the base 2.1-v model's knowledge, because its text encoder is only minimally tuned.
+# Limitations
+This model does not produce perfect results every time.
+This model cannot reproduce most real people. Instead, it makes "Derp-a-Like" equivalents to real people, which I prefer.
+This model is not great at abstract imagery or digital art, though it certainly can produce a variety of amazing art styles.
+# Dataset
+* Cushman (8,000 Kodachrome slides from 1939 to 1969)
+* Midjourney v5.1-filtered (about 22,000 upscaled v5.1 images)
+* National Geographic (about 3,000-4,000 images larger than 1024x768 of animals, wildlife, landscapes, history)
+* a small dataset of stock images of people vaping / smoking
+# Training parameters
+* polynomial learning rate scheduler shared between the text encoder (TE) and the U-Net, starting at 4e-8 and decaying to 1e-8
+* batch size 15, gradient accumulation steps 10 => effective batch size 150
+* target is 30,000 steps but will likely stop sooner
+* terminal-SNR-enforced betas
+# Training goals
+* explore the effects of terminal SNR scheduling
+* improve faces, especially "at a distance"
+* improve composition, e.g. completeness of the resulting image
+* improve prompt comprehension, e.g. "do what I want, even if it is weird"
+* retain / introduce a slightly colourful flavour due to the midjourney data
+* enhance understanding of the past, through the Cushman collection
+* retain the ability to produce natural landscapes and animals via National Geographic
+# Observations
+* at 1650 steps, we still haven't cracked the code on faces.
+* at 250 steps, we had amazing photoreal Mars landscapes that have carried forward mostly to 1650 steps
+* lighting and composition are at their best
+# Future work
+This model inspired a search for a solution to the proliferation issue. That search led me to ttj/flex-diffusion-2-1, which in turn led to the creation of ptx0/pseudo-flex-base, another photoreal model with support for multiple aspect ratios.
+This model was trained **purely** on 768x768 square images, which were randomly resized and cropped. It can produce some higher-resolution landscapes, but it cannot reliably produce higher-resolution subjects without deformities.
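
Reviewer note (not part of the commit): the "Training parameters" section in the diff above lists an effective batch size, a polynomial learning rate decay, and terminal-SNR-enforced betas. The sketch below shows one way those three items could be computed. It is an illustration under stated assumptions, not the author's training code: the function names are hypothetical, the polynomial power of 1.0 is a guess (the card does not state it), the scaled-linear beta schedule with `beta_start=0.00085` and `beta_end=0.012` over 1000 steps is assumed as the usual Stable Diffusion default, and the rescaling follows the commonly published zero-terminal-SNR recipe (shift the square root of the cumulative alphas so the last value is zero, then rescale so the first value is unchanged).

```python
import math


def effective_batch_size(batch_size, grad_accum_steps):
    """Samples contributing to one optimizer update."""
    return batch_size * grad_accum_steps


def polynomial_lr(step, total_steps, lr_start=4e-8, lr_end=1e-8, power=1.0):
    """Polynomial decay from lr_start to lr_end.

    power=1.0 is an assumption; the model card does not state it.
    """
    step = min(step, total_steps)
    remaining = 1.0 - step / total_steps
    return (lr_start - lr_end) * remaining ** power + lr_end


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Schedule that is linear in sqrt(beta) (assumed base schedule)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def enforce_zero_terminal_snr(betas):
    """Rescale betas so the final timestep has zero signal-to-noise ratio."""
    # Square root of the cumulative product of alphas (alpha = 1 - beta).
    sqrt_alphas_bar = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        sqrt_alphas_bar.append(math.sqrt(prod))

    first, last = sqrt_alphas_bar[0], sqrt_alphas_bar[-1]
    # Shift so the last value is zero, then scale so the first is unchanged.
    rescaled = [(a - last) * first / (first - last) for a in sqrt_alphas_bar]

    # Convert back from cumulative products to per-step betas.
    alphas_bar = [a * a for a in rescaled]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(15, 10))
    print("lr at step 1650 of 30000:", polynomial_lr(1650, 30000))
    betas = enforce_zero_terminal_snr(scaled_linear_betas())
    print("final beta after rescaling:", betas[-1])
```

After rescaling, the final beta is 1, so the last timestep carries pure noise. That is the property "terminal SNR enforced" refers to here. Libraries such as diffusers expose a comparable option on some schedulers, which is usually preferable to a hand-rolled version.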