Commit a17b715 by toilaluan
1 Parent(s): 4af19c9

Trained for 1 epoch and 1500 steps.

Trained with datasets ['text-embeds', 'mj-v6']
Learning rate 8e-06, batch size 32, and 4 gradient accumulation steps.
Used DDPM noise scheduler for training with epsilon prediction type and rescaled_betas_zero_snr=False
Using 'trailing' timestep spacing.
Base model: PixArt-alpha/PixArt-Sigma-XL-2-1024-MS
VAE: madebyollin/sdxl-vae-fp16-fix
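
The batch-size figures above are related by simple arithmetic: the README diff in this commit lists an effective batch size of 128 next to a micro-batch size of 32, 4 gradient accumulation steps, and 1 GPU. The following is a minimal sketch of that relationship, assuming the usual definition (micro-batch size × accumulation steps × GPU count). The function name `effective_batch_size` is hypothetical and is not taken from the training tool.

```python
def effective_batch_size(micro_batch_size: int,
                         grad_accum_steps: int,
                         num_gpus: int = 1) -> int:
    """Number of samples contributing to one optimizer step.

    Assumes the common definition; the training tool's own
    computation is not shown in this commit.
    """
    return micro_batch_size * grad_accum_steps * num_gpus


# Values reported in the README diff of this commit (1 GPU in both cases).
previous = effective_batch_size(32, 3, 1)  # old settings: 3 accumulation steps
current = effective_batch_size(32, 4, 1)   # new settings: 4 accumulation steps

print(f"previous={previous}, current={current}")
```

Under that definition the two results agree with the "Effective batch size" values shown on the removed and added README lines.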

README.md CHANGED
@@ -15,11 +15,166 @@ widget:
      negative_prompt: 'blurry, cropped, ugly'
    output:
      url: ./assets/image_0_0.png
- - text: 'ethnographic photography of teddy bear at a picnic'
+ - text: 'a woman sitting on the grass'
    parameters:
      negative_prompt: 'blurry, cropped, ugly'
    output:
      url: ./assets/image_1_0.png
+ - text: 'a professional photo headshot of a man in studio lighting'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_2_0.png
+ - text: 'a person holding a sign that reads ''SOON'''
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_3_0.png
+ - text: 'Alien marketplace, bizarre creatures, exotic goods, vibrant colors, otherworldly atmosphere'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_4_0.png
+ - text: 'Child holding a balloon, happy expression, colorful balloons, sunny day, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_5_0.png
+ - text: 'a 4-panel comic strip showing an orange cat saying the words ''HELP'' and ''LASAGNA'''
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_6_0.png
+ - text: 'a hand is holding a comic book with a cover that reads ''The Adventures of Superhero'''
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_7_0.png
+ - text: 'Underground cave filled with crystals, glowing lights, reflective surfaces, fantasy environment, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_8_0.png
+ - text: 'Bustling cyberpunk bazaar, vendors, neon signs, advanced tech, crowded, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_9_0.png
+ - text: 'Cyberpunk hacker in a dark room, neon glow, multiple screens, intense focus, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_10_0.png
+ - text: 'a cybernetic anne of green gables with neural implant and bio mech augmentations'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_11_0.png
+ - text: 'Post-apocalyptic cityscape, ruined buildings, overgrown vegetation, dark and gritty, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_12_0.png
+ - text: 'Magical castle in a lush forest, glowing windows, fantasy architecture, high resolution, detailed textures'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_13_0.png
+ - text: 'Ruins of an ancient temple in an enchanted forest, glowing runes, mystical creatures, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_14_0.png
+ - text: 'Mystical forest, glowing plants, fairies, magical creatures, fantasy art, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_15_0.png
+ - text: 'Magical garden with glowing flowers, fairies, serene atmosphere, detailed plants, high resolution'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_16_0.png
+ - text: 'Whimsical garden filled with fairies, magical plants, sparkling lights, serene atmosphere, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_17_0.png
+ - text: 'Majestic dragon soaring through the sky, detailed scales, dynamic pose, fantasy art, high resolution'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_18_0.png
+ - text: 'Fantasy world, floating islands in the sky, waterfalls, lush vegetation, detailed landscape, high resolution'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_19_0.png
+ - text: 'Futuristic city skyline at night, neon lights, cyberpunk style, high contrast, sharp focus'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_20_0.png
+ - text: 'Space battle scene, starships fighting, laser beams, explosions, cosmic background'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_21_0.png
+ - text: 'Abandoned fairground at night, eerie rides, ghostly figures, fog, dark atmosphere, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_22_0.png
+ - text: 'Spooky haunted mansion on a hill, dark and eerie, glowing windows, ghostly atmosphere, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_23_0.png
+ - text: 'a hardcover physics textbook that is called PHYSICS FOR DUMMIES'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_24_0.png
+ - text: 'Epic medieval battle, knights in armor, dynamic action, detailed landscape, high resolution'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_25_0.png
+ - text: 'Bustling medieval market with merchants, knights, and jesters, vibrant colors, detailed'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_26_0.png
+ - text: 'Cozy medieval tavern, warm firelight, adventurers drinking, detailed interior, rustic atmosphere'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_27_0.png
+ - text: 'Futuristic city skyline at night, neon lights, cyberpunk style, high contrast, sharp focus'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_28_0.png
+ - text: 'Forest with neon-lit trees, glowing plants, bioluminescence, surreal atmosphere, high detail'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_29_0.png
+ - text: 'Bright neon sign in a busy city street, ''Open 24 Hours'', bold typography, glowing lights'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_30_0.png
+ - text: 'Retro diner sign, ''Joe''s Diner'', classic 1950s design, neon lights, weathered look'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_31_0.png
+ - text: 'Vintage store sign with elaborate typography, ''Antique Shop'', hand-painted, weathered look'
+   parameters:
+     negative_prompt: 'blurry, cropped, ugly'
+   output:
+     url: ./assets/image_32_0.png
  ---

  # pixart-training
@@ -28,11 +183,11 @@ This is a full rank finetune derived from [PixArt-alpha/PixArt-Sigma-XL-2-1024-M



- The main validation prompt used during training was:
+ No validation prompt was used during training.
+
+
+ None

- ```
- ethnographic photography of teddy bear at a picnic
- ```

  ## Validation settings
  - CFG: `7.5`
@@ -55,12 +210,12 @@ You may reuse the base model text encoder for inference.

  ## Training settings

- - Training epochs: 0
- - Training steps: 1000
+ - Training epochs: 1
+ - Training steps: 1500
  - Learning rate: 8e-06
- - Effective batch size: 96
+ - Effective batch size: 128
  - Micro-batch size: 32
- - Gradient accumulation steps: 3
+ - Gradient accumulation steps: 4
  - Number of GPUs: 1
  - Prediction type: epsilon
  - Rescaled betas zero SNR: False
@@ -73,7 +228,7 @@ You may reuse the base model text encoder for inference.

  ### mj-v6
  - Repeats: 0
- - Total number of images: 199872
+ - Total number of images: 134144
  - Total number of aspect buckets: 1
  - Resolution: 1.0 megapixels
  - Cropped: False
@@ -91,7 +246,7 @@ from diffusers import DiffusionPipeline


  model_id = "pixart-training"
- prompt = "ethnographic photography of teddy bear at a picnic"
+ prompt = "An astronaut is riding a horse through the jungles of Thailand."
  negative_prompt = "malformed, disgusting, overexposed, washed-out"

  pipeline = DiffusionPipeline.from_pretrained(model_id)
optimizer.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1597106775343a8153b308b4b7427f3f5469d1ae70ad04c6b7f6723dac36c0ac
+ oid sha256:b59ab5187438495f70ae65205ab85b3a89c32121446239c4451ad44ba69f8ac9
  size 3665677155
random_states_0.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0f3b1a5f95dcc31f7a44acf2784e0d3aa219fd4a333ac008412483e4063a5218
+ oid sha256:0095adc498355bd439cad9421b34a6d1c1d09f12fa9ef8fd47d6aa43a9d63704
  size 14344
scheduler.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a778f242057a9963f512d7713fc392bdaa1ea385692a6d218868f2e42ea21ee4
+ oid sha256:5e9401b92a0437e861b78763e41ede39d20dc2bdd763568f11b5463d73323d27
  size 1000
training_state-mj-v6.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:974645e8da8c2abdafe9545abe09d332633eaabd17909676477a3d68faa60c51
- size 23062435
+ oid sha256:6f29ad928d40ef0bf3d488cdb9aaba3bcfb1d1cd94e268087cdeea897542a6a7
+ size 12312675
training_state.json CHANGED
@@ -1 +1 @@
- {"global_step": 1000, "epoch_step": 1000, "epoch": 1, "exhausted_backends": [], "repeats": {}}
+ {"global_step": 1500, "epoch_step": 201, "epoch": 2, "exhausted_backends": [], "repeats": {"mj-v6": 0}}
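
To see at a glance which fields of `training_state.json` moved in this commit, the two JSON lines from the hunk above can be compared field by field. This is only an illustrative sketch: the helper name `changed_fields` is hypothetical, and the meaning of each field (for example how `epoch_step` is counted) is not documented in this diff and is not assumed here.

```python
import json

# The removed and added lines of training_state.json, copied from the diff.
old_state = json.loads(
    '{"global_step": 1000, "epoch_step": 1000, "epoch": 1, '
    '"exhausted_backends": [], "repeats": {}}'
)
new_state = json.loads(
    '{"global_step": 1500, "epoch_step": 201, "epoch": 2, '
    '"exhausted_backends": [], "repeats": {"mj-v6": 0}}'
)


def changed_fields(old: dict, new: dict) -> dict:
    """Map each differing key to its (old value, new value) pair."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


delta = changed_fields(old_state, new_state)
steps_added = new_state["global_step"] - old_state["global_step"]

for key, (before, after) in delta.items():
    print(f"{key}: {before!r} -> {after!r}")
print(f"global steps added in this commit: {steps_added}")
```

The step difference matches the move from 1000 to 1500 training steps reported in the README hunk.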
transformer/config.json CHANGED
@@ -1,7 +1,7 @@
  {
    "_class_name": "PixArtTransformer2DModel",
    "_diffusers_version": "0.29.0",
-   "_name_or_path": "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
+   "_name_or_path": "/home/ubuntu/code/MJ/mjv6/models/checkpoint-1000",
    "activation_fn": "gelu-approximate",
    "attention_bias": true,
    "attention_head_dim": 72,
transformer/diffusion_pytorch_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0404e972345ceaa2b6cdcd3c8022a93e9a66475833776d91fabd3e3dab0a5fdf
+ oid sha256:c7bfe5b2c7ab5cc1d27949b595d1844abedaa050fec5fc4a7ac68124c21ca0af
  size 1221780352
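
The binary files in this commit (`optimizer.bin`, `scheduler.bin`, the safetensors weights, and so on) appear in the diff only as Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. The following is a minimal sketch of reading such a pointer, assuming exactly the three-line layout shown above. The function name `parse_lfs_pointer` is hypothetical and is not part of the Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer with `version`, `oid`, and `size` lines."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New pointer for transformer/diffusion_pytorch_model.safetensors, from the diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c7bfe5b2c7ab5cc1d27949b595d1844abedaa050fec5fc4a7ac68124c21ca0af
size 1221780352
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["digest"][:12], info["size"])
```

In most of the pointer hunks only the `oid` line changes while `size` stays the same, which indicates new file contents of identical byte length; `training_state-mj-v6.json` is the exception, where both `oid` and `size` change.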