Second and third tries on a text description to see if it draws it well

#2912
by OxxyJoe - opened

Text: "a lovely and lively elegant fantasy woman weighing 110 kilograms and standing 5 feet 9 inches tall, wearing bluejean overalls and barefoot, stands on the sidewalk in front of her house, and bends over to pick up the newspaper. Her long, red, shiny hair falls to the ground, and two boys watch her from across the street. The sky is mostly clear and blue, with a plane flying overhead.", perfect-face, painted with flair in an abstract fashion, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

11.jpg
12.jpg
13.jpg
14.jpg

Text: "a lovely and lively elegant fantasy man, wearing purple sweatpants, a green sweater, and enormous clown-shoes, sits on the hood of a car that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

15.jpg

16.jpg

17.jpg

18.jpg

19.jpg

20.jpg

21.jpg

22.jpg

23.jpg

24.jpg

25.jpg

(A very slight change: I removed the word "fantasy" from the man's description and replaced "car" with "dark-blue BMW".)
Text: "a lovely and lively elegant man, wearing purple sweatpants, a green sweater, and enormous clown-shoes, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

26.jpg

27.jpg

28.jpg

29.jpg

(Changed his shoes from clown shoes to black Reeboks. Strangely, this drastically changed pretty much everything. It really seems to think it knows things about someone based on their shoes.)

Text: "a lovely and lively elegant man, wearing purple sweatpants, a green sweater, and black Reeboks, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building. He holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

30.jpg

31.jpg

32.jpg

33.jpg

This time, I changed the shoes again, to Dockers, and called him "bombastic" rather than "lovely" (why not). More importantly, I merged the two sentences into one run-on sentence. I felt that the second sentence, about the bug net and dollar bills, had been almost ignored every time, so I made one (even longer) run-on sentence instead. However, this seems not to have improved anything at all. I notice that because I mentioned the colors green, purple, and black in the description, these colors are used for everything.

Text: "bombastic and lively elegant man, wearing purple sweatpants, a green sweater, and black dockers, sits on the hood of a dark-blue BMW that is parked in front of the United States Capital building, and holds an enormous bug-net in the air and catches dollar-bills with it.", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

34.jpg

35.jpg

36.jpg

37.jpg

Changed "bug-net" to "net", since it kept drawing insects. It clearly does not read the description as a sentence, exactly, yet it does seem to get some of it right.

38.jpg

39.jpg

40.jpg

41.jpg

Overhauled the sentence with brackets, to see if they aid the AI's comprehension.
I'm surprised that it still gets the color of the car wrong so often. I changed it from dark-blue to blue, but that didn't do much.

Text: "man who is [ bombastic and lively elegant ] and [ wearing [ purple sweatpants] and [ green sweater ] and [ black dockers ] ], sits on [ [ the hood of ] a blue BMW parked [ [ in front of ] the [ United States Capital building ] and [ holds [ [in the air ] [ an enormous ] net ] and catches [ dollars [ that are flying around ] ].", perfect-face, photorealistic, highly detailed, sharp focus, perfect-body, masterpiece, critically-acclaimed, intricate, volumetric lighting, beautiful composition

42.jpg

43.jpg

44.jpg

45.jpg

46.jpg

47.jpg

48.jpg

49.jpg
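One thing worth checking with a hand-bracketed prompt like the one above is whether the brackets actually nest the way you intended. Counting by hand, that prompt opens 16 "[" but closes only 13 "]", so three groups are never closed. Below is a minimal Python sketch that tracks bracket depth character by character; the function name `check_brackets` is hypothetical, and it says nothing about how the image generator itself treats brackets, it only checks their balance.

```python
def check_brackets(prompt: str) -> tuple[int, list[int]]:
    """Return (unclosed, strays) for the square brackets in a prompt.

    unclosed: how many '[' never get a matching ']'.
    strays:   positions of ']' characters with no open '[' to close.
    """
    depth = 0
    strays: list[int] = []
    for i, ch in enumerate(prompt):
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                strays.append(i)  # nothing is open, so this ']' closes nothing
            else:
                depth -= 1
    return depth, strays


# The quoted part of the bracketed prompt, copied verbatim (only brackets matter).
PROMPT = (
    "man who is [ bombastic and lively elegant ] and [ wearing [ purple sweatpants] "
    "and [ green sweater ] and [ black dockers ] ], sits on [ [ the hood of ] a blue BMW "
    "parked [ [ in front of ] the [ United States Capital building ] and [ holds "
    "[ [in the air ] [ an enormous ] net ] and catches [ dollars [ that are flying around ] ]."
)

print(check_brackets(PROMPT))  # (3, [])
```

Where the three missing "]" belong depends on which words were meant to group together, so that part still has to be decided by hand.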

Wow, I love your method!

Thank you :)

The AI has a very limited "memory". It can mash together many general and stylistic terms, but when it comes to simple facts about the things to be depicted, the more you add, the more confused the generated image becomes.

It can do "a cat", "a black cat", and "a black cat sitting in a box" consistently, as long as you're not too picky about anatomical accuracy. It can even do "a black cat sitting in a white box on a lawn" 1/4 of the time. "A black cat with green eyes sitting in a white box on a lawn" will work about as often. But the moment you add another detail, it takes a lot more tries. So "a black cat with green eyes wearing a collar sitting in a white box on a lawn" can still be done, but it's down to less than 7%. And as for "a black cat with green eyes wearing a red collar sitting in a white box on a lawn", it might be faster to get a cat, collar, and box, and photograph the thing itself. 😆

Looks like 8 or 9 precise concrete facts might be the limit for a reasonable amount of time, unless you just happen to get lucky. Or you've asked for something that appears in millions of images (e.g. "A wide angle close-up selfie taken from above of an adult female with pouty lips, wearing lingerie, in a cluttered bathroom with bad lighting," which I'm guessing you might get on the first try, even if you didn't want it. 😉)

1. Cat
2. Black cat
3. Black cat with green eyes
4. Black cat with green eyes sitting
5. Black cat with green eyes sitting in a box
6. Black cat with green eyes sitting in a white box
7. Black cat with green eyes sitting in a white box on a lawn
8. Black cat with green eyes wearing a collar sitting in a white box on a lawn
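To put the success rates quoted above in terms of waiting time: if each generation is treated as an independent try with a fixed success rate p (an assumption; nothing here measures whether runs really behave that way), the number of tries until the first good image follows a geometric distribution, whose mean is 1/p. The chance of at least one hit in a batch of n images is 1 - (1 - p)^n. The sketch below plugs in 1/4 and 7%; the batch size of 4 is an assumption chosen to match the four-image sets posted in this thread, and the helper names are hypothetical.

```python
def expected_tries(p: float) -> float:
    """Mean number of independent tries until the first success (geometric distribution)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def chance_in_batch(p: float, n: int) -> float:
    """Probability that at least one of n independent tries succeeds."""
    return 1 - (1 - p) ** n


# Rates taken from the post; 0.07 is an upper bound ("less than 7%").
for label, p in [("white box on a lawn", 0.25), ("plus a collar", 0.07)]:
    print(f"{label}: ~{expected_tries(p):.1f} tries on average, "
          f"{chance_in_batch(p, 4):.0%} chance per batch of 4")
```

This prints about 4 tries and a 68% chance per batch for the 1/4 case, versus about 14 tries and a 25% chance per batch at 7%. Since 7% is an upper bound, the real wait for the collar version would be at least that long.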

The AI has a very limited "memory". It can mash together many general and stylistic terms, but ...
... which I'm guessing you might get on the first try, even if you didn't want it. 😉)

OK, so just for fun, I tried the description you gave, lol.
Text: "A wide angle close-up selfie taken from above of an adult female with pouty lips, wearing lingerie, in a cluttered bathroom with bad lighting,"

4.jpg
5.jpg
6.jpg

Then I modified it, just to see whether finding and eliminating unnecessary words helps. I also added "lively and elegant", idk, to make her look a bit more... lively and... elegant.

Text: wide-angle, close-up selfie, from above, lively and elegant woman, pouty lips, wearing lingerie, in a cluttered bathroom, bad lighting
1.jpg
2.jpg
3.jpg
