stabilityai/stable-diffusion · Second and third tries on a text description to see if it draws it well

Oct 14, 2022

Text: "a lovely and lively elegant fantasy woman weighing 110 kilograms and standing 5 feet 9 inches tall, wearing bluejean overalls and barefoot, stands on the sidewalk in front of her house, and bends over to pick up the newspaper. Her long, red, shiny hair falls to the ground, and two boys watch her from across the street. The sky is mostly clear and blue, with a plane flying overhead.", perfect-face, painted with flair in an abstract fashion, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

Text: "a lovely and lively elegant fantasy man, wearing purple sweatpants, a green sweater, and enormous clown-shoes, sits on the hood of a car that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

OxxyJoe

Oct 14, 2022

edited Oct 14, 2022

(Very slight change... removed the word "fantasy" from the man's description and replaced "car" with "dark-blue BMW")
Text: "a lovely and lively elegant man, wearing purple sweatpants, a green sweater, and enormous clown-shoes, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

OxxyJoe

Oct 14, 2022

(Changed his shoes from clown-shoes to black Reeboks. Strangely, this drastically changed pretty much everything. It really seems to think it knows things about someone based on their shoes.)

"a lovely and lively elegant man, wearing purple sweatpants, a green sweater, and black Reeboks, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

OxxyJoe

Oct 14, 2022

This time I changed the shoes again, to Dockers, and called him "bombastic" rather than "lovely" (why not). More importantly, I merged the two sentences into one run-on sentence. The second sentence, about the bug-net and dollar bills, seems to have been almost ignored every time, so I made one (even longer) run-on sentence instead. However, this does not seem to have improved anything at all. I also notice that because I mentioned the colors green, purple, and black in the description, these colors get used for everything.

Text: "bombastic and lively elegant man, wearing purple sweatpants, a green sweater, and black dockers, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building, and holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition
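A minimal sketch of the one-change-at-a-time approach used in these posts, assuming the description can be treated as a template with a few swappable slots. It uses the two-sentence wording from the earlier post. The names `TEMPLATE`, `BASELINE`, `build_prompt`, and `one_change_variants` are made up for illustration and are not part of Stable Diffusion or any library. Joining the description and the style tags with a comma is also an assumption about how the text was entered.

```python
# Hypothetical helper: build prompts that differ from a baseline in exactly one slot.
STYLE_TAGS = (
    "perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, "
    "masterpiece, critically-acclaimed, intricate, volumetric lighting, "
    "beautiful composition"
)

# Slots in braces are the parts that were swapped between posts.
TEMPLATE = (
    "a {mood} and lively elegant man, wearing purple sweatpants, a green sweater, "
    "and {shoes}, sits on the hood of a {car} that is parked in front of the "
    "United States Capital building. He holds an enormous bug-net in the air "
    "and catches dollar-bills with it."
)

BASELINE = {"mood": "lovely", "shoes": "enormous clown-shoes", "car": "dark-blue BMW"}


def build_prompt(slots):
    """Fill the template and append the style tags (assumed comma-joined)."""
    return TEMPLATE.format(**slots) + ", " + STYLE_TAGS


def one_change_variants(baseline, alternatives):
    """Return (label, prompt) pairs: the baseline, then one pair per single-slot change."""
    variants = [("baseline", build_prompt(baseline))]
    for slot, options in alternatives.items():
        for option in options:
            changed = dict(baseline)
            changed[slot] = option  # change only this slot
            variants.append((f"{slot}={option}", build_prompt(changed)))
    return variants


if __name__ == "__main__":
    tests = {"shoes": ["black Reeboks", "black dockers"], "mood": ["bombastic"]}
    for label, prompt in one_change_variants(BASELINE, tests):
        print(label, "->", prompt[:80], "...")
```

If the image tool lets you fix the random seed, comparing variants under the same seed makes it easier to tell whether a difference comes from the changed word or just from chance.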

OxxyJoe

Oct 14, 2022

Changed "bug-net" to "net", since it kept drawing insects. It clearly does not read the description like a sentence, exactly, yet it does seem to get some of it correct.

OxxyJoe

Oct 14, 2022

Overhauled the sentence with brackets to see if it aids the AI's comprehension.
Surprised that it still gets the color of the car wrong so often. Changed it to "blue", but that didn't do much.

Text: "man who is [ bombastic and lively elegant ] and [ wearing [ purple sweatpants] and [ green sweater ] and [ black dockers ] ], sits on [ [ the hood of ] a blue BMW parked [ [ in front of ] the [ United States Capital building ] and [ holds [ [in the air ] [ an enormous ] net ] and catches [ dollars [ that are flying around ] ].", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition
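When nesting brackets by hand like this, it is easy to lose track of which ones are still open. Below is a rough sketch of a checker, assuming you only care whether every `[` has a matching `]`. The function name `check_brackets` is hypothetical. It only catches typing slips; whether the model treats brackets as grouping at all is not something this check can tell you.

```python
def check_brackets(prompt, open_ch="[", close_ch="]"):
    """Report whether open_ch/close_ch pairs in the prompt are balanced."""
    depth = 0
    for position, ch in enumerate(prompt):
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                # A closing bracket appeared with nothing open.
                return f"unexpected '{close_ch}' at position {position}"
    if depth > 0:
        return f"{depth} unclosed '{open_ch}'"
    return "balanced"


if __name__ == "__main__":
    print(check_brackets("man who is [ bombastic ] and [ wearing [ black dockers ] ]"))
    print(check_brackets("sits on [ [ the hood of ] a blue BMW"))
```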

Lightningstudiboo66

Oct 14, 2022

Wow, I love your method!

OxxyJoe

Oct 14, 2022

Wow, I love your method!

Thank you :)

Moriad

Oct 15, 2022

The AI has a very limited "memory". It can mash together many general and stylistic terms, but when it comes to simple facts about the things to be depicted, the more you add, the more confused the generated image becomes.

It can do "a cat", "a black cat", and "a black cat sitting in a box" consistently, as long as you're not too picky about anatomical accuracy. It can even do "a black cat sitting in a white box on a lawn" 1/4 of the time. "A black cat with green eyes sitting in a white box on a lawn" will work about as often. But the moment you add another detail, it takes a lot more tries. So "a black cat with green eyes wearing a collar sitting in a white box on a lawn" can still be done, but it's down to less than 7%. And as for "a black cat with green eyes wearing a red collar sitting in a white box on a lawn", it might be faster to get a cat, collar, and box, and photograph the thing itself. 😆
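To put rough numbers on "it takes a lot more tries": if each generation is treated as an independent attempt with the same chance of success (an assumption, and the rates above are only eyeballed estimates), the number of tries follows a geometric distribution. The sketch below uses hypothetical function names to compute the average number of tries and how many tries give a 95% chance of at least one good image.

```python
import math


def expected_tries(success_rate):
    """Average number of generations until the first acceptable image."""
    return 1.0 / success_rate


def tries_for_confidence(success_rate, confidence=0.95):
    """Smallest n such that P(at least one success in n tries) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - success_rate))


if __name__ == "__main__":
    for label, rate in [("1/4 of the time", 0.25), ("7% of the time", 0.07)]:
        print(label, expected_tries(rate), tries_for_confidence(rate))
```

Under these assumptions, a 1/4 success rate means about 4 tries on average and 11 tries to be 95% sure of a hit; at 7% it is roughly 14 tries on average and 42 for the same confidence.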

Looks like 8 or 9 precise concrete facts might be the limit for a reasonable amount of time, unless you just happen to get lucky. Or you've asked for something that appears in millions of images (e.g. "A wide angle close-up selfie taken from above of an adult female with pouty lips, wearing lingerie, in a cluttered bathroom with bad lighting," which I'm guessing you might get on the first try, even if you didn't want it. 😉)

1. Cat
2. Black cat
3. Black cat with green eyes
4. Black cat with green eyes sitting
5. Black cat with green eyes sitting in a box
6. Black cat with green eyes sitting in a white box
7. Black cat with green eyes sitting in a white box on a lawn
8. Black cat with green eyes wearing a collar sitting in a white box on a lawn
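The sequence of prompts above adds one detail per step. A small sketch of how such a ladder could be generated, so you can test at which step the results start to fall apart. The names `describe_cat`, `ORDER`, and `prompt_ladder` are invented for this example, and the phrase order is hard-coded to match the cat prompts in this post.

```python
def describe_cat(details):
    """Build a cat description from a set of enabled detail keys."""
    parts = ["black cat" if "black" in details else "cat"]
    if "green eyes" in details:
        parts.append("with green eyes")
    if "collar" in details:
        parts.append("wearing a collar")
    if "sitting" in details:
        parts.append("sitting")
    if "box" in details:
        parts.append("in a white box" if "white" in details else "in a box")
    if "lawn" in details:
        parts.append("on a lawn")
    return " ".join(parts).capitalize()


# Order in which details are switched on, one per step.
ORDER = ["black", "green eyes", "sitting", "box", "white", "lawn", "collar"]


def prompt_ladder(order):
    """Return prompts with 0, 1, 2, ... details enabled, in the given order."""
    return [describe_cat(set(order[:n])) for n in range(len(order) + 1)]


if __name__ == "__main__":
    for step, prompt in enumerate(prompt_ladder(ORDER), start=1):
        print(f"{step}. {prompt}")
```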

OxxyJoe

Oct 15, 2022

edited Oct 15, 2022

The AI has a very limited "memory". It can mash together many general and stylistic terms, but ...
... which I'm guessing you might get on the first try, even if you didn't want it. 😉)

OK, so just for fun I tried the description you gave, lol.
text: "A wide angle close-up selfie taken from above of an adult female with pouty lips, wearing lingerie, in a cluttered bathroom with bad lighting,"

Then I modified it to see whether finding and eliminating unnecessary words helps. I also added "lively and elegant", idk, to make her look a bit more... lively and... elegant.

text: wide-angle, close-up selfie, from above, lively and elegant woman, pouty lips, wearing lingerie, in a cluttered bathroom, bad lighting

Lightningstudiboo66

Oct 15, 2022

Lol