Introduction:
Meet General Bagan, a text-to-image model fine-tuned to generate images of Bagan, the ancient temple city in Myanmar, from text prompts. It was trained on a dataset of over 200 photographs of Bagan and can render scenes ranging from lifelike nature views to abstract compositions.
Problem Statement:
When we prompted the Stable Diffusion model to generate an image of Bagan, it produced an image of a pagoda from Thailand instead. We therefore fine-tuned Stable Diffusion on a collection of Bagan photographs so that it generates more accurate images of Bagan.
How to Create a Prompt:
When writing a prompt for Bagan, consider six keyword categories: Subject, Medium, Style, Art-sharing website, Resolution, and Additional details.
Subject -> The subject is what you want to see in the picture. A common mistake is describing the subject too briefly.
Medium -> The medium is the material or technique the artist works with, such as illustration, oil painting, 3D rendering, or photography. Medium has a large impact because a single keyword can change the style dramatically.
Style -> The artistic style of the image, such as pop art, impressionism, or surrealism.
Art-sharing website -> Specialty graphics websites such as DeviantArt and ArtStation host large numbers of images across many genres. Naming one of these sites in the prompt is an effective way to steer the image toward its styles.
Resolution -> How sharp and detailed the image is, for example "ultra detailed".
Additional details -> Extra modifiers that sweeten the image, such as elements that give it a more dystopian or sci-fi feel.
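The six categories above can be combined mechanically into one prompt. The sketch below is a hypothetical helper (`BaganPrompt` and `render` are illustrative names, not part of the model or any library) that joins the non-empty categories into a single comma-separated string, the same shape as the example prompt that follows. Putting the subject first is a common convention, not a requirement of the model.

```python
from dataclasses import dataclass, field


@dataclass
class BaganPrompt:
    """Hypothetical helper: one field per keyword category described above."""
    subject: str
    medium: str = ""
    style: str = ""
    art_site: str = ""
    resolution: str = ""
    details: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Subject first (a convention, not a rule), then each category in order.
        parts = [self.subject, self.medium, self.style]
        if self.art_site:
            # Art-sharing sites are usually phrased as "trending on <site>".
            parts.append(f"trending on {self.art_site}")
        parts.append(self.resolution)
        parts.extend(self.details)
        # Drop empty categories so the prompt has no dangling commas.
        return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    prompt = BaganPrompt(
        subject="bagan, ancient temples at sunrise",
        medium="oil painting",
        style="impressionist",
        art_site="artstation",
        resolution="ultra detailed",
        details=["hot air balloons", "golden light"],
    )
    print(prompt.render())
    # bagan, ancient temples at sunrise, oil painting, impressionist, trending on artstation, ultra detailed, hot air balloons, golden light
```

Leaving a category empty simply omits it, so the same helper covers both short prompts and fully specified ones.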
An example prompt for General Bagan is: "bagan, a creepy and eerie Halloween setting, with Jack o lanterns on the street and shadow figures lurking about, dynamic lighting, photorealistic fantasy concept art, stunning visuals, creative, cinematic, ultra detailed, trending on art station, spooky vibe". This prompt produces a Halloween-themed image of Bagan.
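A minimal sketch of running the example prompt with the Hugging Face `diffusers` library (requires `diffusers`, `transformers`, and `torch`). It assumes the fine-tuned weights are published in diffusers format under the repository id `YeBhoneLin10/bagan` listed on this card; the step count, guidance scale, seed, and output filename are illustrative choices, not settings documented by the authors. The heavy imports are deferred into `generate` so the prompt constant can be reused without loading them.

```python
# Assumption: the fine-tuned weights load with StableDiffusionPipeline from this repo id.
MODEL_ID = "YeBhoneLin10/bagan"

HALLOWEEN_PROMPT = (
    "bagan, a creepy and eerie Halloween setting, with Jack o lanterns on the street "
    "and shadow figures lurking about, dynamic lighting, photorealistic fantasy "
    "concept art, stunning visuals, creative, cinematic, ultra detailed, "
    "trending on art station, spooky vibe"
)


def generate(prompt: str, out_path: str = "bagan.png", seed: int = 0) -> str:
    """Generate one image for `prompt` and save it to `out_path`."""
    # Deferred imports: torch and diffusers are only needed when generating.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision saves GPU memory; use float32 on CPU, where fp16 is often unsupported.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

    # A fixed seed makes repeated runs on the same setup produce the same image.
    generator = torch.Generator(device=device).manual_seed(seed)
    image = pipe(
        prompt,
        num_inference_steps=30,  # illustrative; the diffusers default is 50
        guidance_scale=7.5,      # how strongly the image follows the prompt
        generator=generator,
    ).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate(HALLOWEEN_PROMPT)` downloads the weights on first use and writes `bagan.png`. Changing only the seed is a quick way to get different compositions for the same prompt.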
Data:
We fine-tuned the Stable Diffusion v1.5 model on 223 pictures of Bagan.
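The card does not say how the 223 pictures were captioned or which training script was used. As one possible setup (an assumption, not the authors' documented pipeline), the sketch below writes a `metadata.jsonl` file in the `imagefolder` layout read by the Hugging Face `datasets` library, pairing each image with a caption in a `text` column, which is the default caption column of the diffusers text-to-image fine-tuning example. The helper name `write_metadata` and the fixed caption "a photo of bagan" are hypothetical.

```python
import json
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def write_metadata(image_dir: str, caption: str = "a photo of bagan") -> int:
    """Write metadata.jsonl pairing every image in image_dir with a caption.

    Hypothetical: one fixed caption for all images. Per-image captions
    describing each temple or scene would likely give the model more signal.
    Returns the number of images listed.
    """
    root = Path(image_dir)
    # Sort for a deterministic file order; match extensions case-insensitively.
    images = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    with open(root / "metadata.jsonl", "w", encoding="utf-8") as f:
        for img in images:
            # file_name must be relative to the folder that holds metadata.jsonl.
            f.write(json.dumps({"file_name": img.name, "text": caption}) + "\n")
    return len(images)
```

After running `write_metadata("bagan_photos")`, the folder can be loaded with `load_dataset("imagefolder", data_dir="bagan_photos")` from the `datasets` library, which exposes an `image` column and the `text` caption column.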
Contributors:
Main contributor: Ye Bhone Lin; Supervisor: Sa Phyo Thu Htet; Contributors: Thant Htoo San, Min Phone Thit
Limitation:
The model cannot generate photos of humans.
Other Work:
Beyond General Bagan, we explored image generation for other architectural landmarks of Myanmar, including Ananda, Shwezigon, Bupaya, Thatbyinnyu, and Mraukoo. Each of these sites reflects the cultural and historical heritage of the region, and we generated images of them with our text-to-image approach.
References:
Wikipedia. (2022). Stable Diffusion. Retrieved From: https://en.wikipedia.org/wiki/Stable_Diffusion
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. Retrieved From: https://arxiv.org/abs/2112.10752
Brown, N. (2022). What is Stable Diffusion and How to Use it. Retrieved From: https://www.fotor.com/blog/what-is-stable-diffusion
Mishra, O. (June 9). Stable Diffusion Explained. Medium. Retrieved From: https://medium.com/@onkarmishra/stable-diffusion-explained-1f101284484d
Model Tree:
Model: YeBhoneLin10/bagan
Base model: runwayml/stable-diffusion-v1-5