PseudoTerminal X committed on
Commit b8fe228
1 Parent(s): 8847ac0

Add a model card

Files changed (1): README.md +56 -0

---
license: creativeml-openrail-m
tags:
- computer vision
- stable-diffusion
- stable-diffusion-2-1
- photography
- photoreal
---

# Capabilities

This model can produce photorealistic images of people.

It retains much of the base 2.1-v model's knowledge, because its text encoder was only minimally tuned.

# Limitations

This model does not produce perfect results every time.

This model cannot reproduce most real people. Instead, it makes "Derp-a-Like" equivalents of real people, which I prefer.

This model is not great at abstract imagery or digital art, though it can certainly produce a variety of amazing art styles.

# Dataset

* Cushman (8,000 Kodachrome slides from 1939 to 1969)
* Midjourney v5.1, filtered (about 22,000 upscaled v5.1 images)
* National Geographic (about 3,000 to 4,000 images larger than 1024x768 of animals, wildlife, landscapes, and history)
* a small dataset of stock images of people vaping / smoking

# Training parameters

* polynomial learning rate schedule shared between the text encoder (TE) and U-Net, starting at 4e-8 and decaying to 1e-8
* batch size 15 with 10 gradient accumulation steps, for an effective batch size of 150
* target of 30,000 steps, though training will likely stop sooner
* noise schedule betas rescaled to enforce zero terminal SNR
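
The schedule and batch arithmetic listed above can be sketched as follows. The card does not state the decay exponent, whether there is a warmup, or whether the decay spans the full 30,000-step target, so this sketch assumes no warmup, a 30,000-step decay, and a hypothetical power of 1.0 (linear decay). It also assumes a single device when computing the effective batch size.

```python
# Sketch of the learning-rate schedule and batch arithmetic listed above.
# Assumptions (not stated in the model card): no warmup, the decay spans the
# full 30,000-step target, and the polynomial power is 1.0 (linear decay).

LR_START = 4e-8
LR_END = 1e-8
TOTAL_STEPS = 30_000
POWER = 1.0  # hypothetical exponent

BATCH_SIZE = 15
GRAD_ACCUM_STEPS = 10


def polynomial_lr(step: int,
                  lr_start: float = LR_START,
                  lr_end: float = LR_END,
                  total_steps: int = TOTAL_STEPS,
                  power: float = POWER) -> float:
    """Decay from lr_start to lr_end over total_steps, then hold at lr_end."""
    if step >= total_steps:
        return lr_end
    remaining = 1.0 - step / total_steps
    return (lr_start - lr_end) * remaining ** power + lr_end


def effective_batch_size(batch_size: int = BATCH_SIZE,
                         grad_accum_steps: int = GRAD_ACCUM_STEPS,
                         num_devices: int = 1) -> int:
    """Samples per optimizer update; num_devices=1 is an assumption."""
    return batch_size * grad_accum_steps * num_devices


if __name__ == "__main__":
    print(effective_batch_size())  # 150
    # A single schedule drives both the text encoder and U-Net parameters.
    for step in (0, 250, 1650, 15_000, 30_000):
        print(step, polynomial_lr(step))
```

Under these assumptions, a run that stops well short of 30,000 steps never reaches 1e-8: at step 1650 the rate is still about 3.8e-8. In PyTorch, the same curve could be attached to one optimizer holding both the text encoder and U-Net parameters through `torch.optim.lr_scheduler.LambdaLR`, with the lambda returning `polynomial_lr(step) / LR_START`. Hugging Face `transformers` also provides `get_polynomial_decay_schedule_with_warmup`, which takes `lr_end` and `power` arguments. The card does not say whether the original training script used either.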

# Training goals

* explore the effects of terminal SNR scheduling
* improve faces, especially "at a distance"
* improve composition, e.g. the completeness of the resulting image
* improve prompt comprehension, e.g. "do what I want, even if it is weird"
* retain / introduce a slightly colourful flavour due to the Midjourney data
* enhance understanding of the past through the Cushman collection
* retain the ability to produce natural landscapes and animals via the National Geographic data
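
The training parameters mention betas that enforce terminal SNR, and the first goal is to explore terminal SNR scheduling. One well-known way to get zero terminal SNR is the beta rescaling described by Lin et al. in "Common Diffusion Noise Schedules and Sample Steps are Flawed". The card does not say which method was used, so the sketch below is only an illustration of that technique. It also assumes Stable Diffusion's usual "scaled_linear" schedule (1000 steps, 0.00085 to 0.012). Lin et al. pair this rescaling with v-prediction, which the 2.1-v base model already uses.

```python
import math

# Illustrative beta rescaling to enforce zero terminal SNR, following the
# procedure in Lin et al., "Common Diffusion Noise Schedules and Sample Steps
# are Flawed". Not confirmed to be the exact code used for this model.


def scaled_linear_betas(num_steps: int = 1000,
                        beta_start: float = 0.00085,
                        beta_end: float = 0.012) -> list[float]:
    """Stable Diffusion's usual 'scaled_linear' schedule (assumed here)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas: list[float]) -> list[float]:
    """Cumulative product of (1 - beta), i.e. alpha_bar at each timestep."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def enforce_zero_terminal_snr(betas: list[float]) -> list[float]:
    sqrt_ab = [math.sqrt(a) for a in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the final value is exactly 0, then scale so the first is unchanged.
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Convert the rescaled cumulative products back into per-step betas.
    alphas = [ab[0]] + [ab[t] / ab[t - 1] for t in range(1, len(ab))]
    return [1.0 - a for a in alphas]


def snr(betas: list[float]) -> list[float]:
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at each timestep."""
    return [a / (1.0 - a) for a in alphas_cumprod(betas)]


if __name__ == "__main__":
    base = scaled_linear_betas()
    rescaled = enforce_zero_terminal_snr(base)
    print("terminal SNR before:", snr(base)[-1])
    print("terminal SNR after:", snr(rescaled)[-1])
```

The rescaled schedule's final beta is 1, so the last timestep is pure noise; the first beta is left essentially unchanged. Recent versions of `diffusers` expose a `rescale_betas_zero_snr` option on schedulers such as `DDIMScheduler` that applies the same idea, though whether this model's training code used it is not stated.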

# Observations

* at 1650 steps, we still haven't cracked the code on faces
* at 250 steps, we had amazing photoreal Mars landscapes, and these have mostly carried forward to 1650 steps
* lighting and composition are at their best

# Future work

This model inspired the search for a solution to the proliferation issue, which led me to ttj/flex-diffusion-2-1 and then to the creation of ptx0/pseudo-flex-base, another photoreal model with support for multiple aspect ratios.

This model was trained **purely** on 768x768 square images, which were randomly resized and cropped. It can produce some higher-resolution landscapes, but it cannot reliably produce higher-resolution subjects without deformities.
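
The card says only that images were randomly resized and cropped to 768x768; it gives no scale range or crop policy. The sketch below is one hypothetical way to plan such a crop. It resizes the short side to 768 times a random factor in `[1.0, max_scale]`, where `max_scale` is an invented knob, and then picks a random 768x768 window. Because every sample ends up as the same square crop, the model never sees other aspect ratios or larger canvases, which fits the limitation described above.

```python
import random
from typing import Optional

TARGET = 768  # square training resolution stated in the model card


def random_resize_and_crop_box(width: int, height: int,
                               target: int = TARGET,
                               max_scale: float = 1.25,
                               rng: Optional[random.Random] = None):
    """Plan a random resize followed by a random target x target crop.

    Returns (resized_width, resized_height, (left, top, right, bottom)).
    The short side is resized to target * s with s drawn uniformly from
    [1.0, max_scale]; max_scale is a hypothetical knob, not from the card.
    """
    rng = rng or random.Random()
    scale = target * rng.uniform(1.0, max_scale) / min(width, height)
    # Guard against rounding pushing a side just below the crop size.
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    left = rng.randint(0, new_w - target)
    top = rng.randint(0, new_h - target)
    return new_w, new_h, (left, top, left + target, top + target)


if __name__ == "__main__":
    rng = random.Random(0)
    for size in [(1024, 768), (3000, 2000), (768, 1366)]:
        print(size, random_resize_and_crop_box(*size, rng=rng))
```

With Pillow, the plan would be applied as `img.resize((new_w, new_h)).crop(box)`, since `Image.crop` takes a `(left, upper, right, lower)` box.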