Generative Art Framework Comparison

Most Popular

OpenAI Dall-E

URL: https://openai.com/dall-e-2/
Usage: SaaS (via native API and libraries), small base credits to start with, pay-to-play afterwards
Training size: 12B/6.5B/3.5B params
Notes:

The commonly used version is v2, which is better and smaller than v1,
and it is getting smaller and faster with each iteration
Best of the available options for images of people
Style transfers or model modifiers are charged extra
Dall-E is also licensed to third parties as an embedded engine:
Microsoft Designer, etc.
Craiyon is a free, smaller alternative (formerly "Dall-E Mini", renamed to avoid confusion with OpenAI's product);
it is an independent project rather than an OpenAI release: https://www.craiyon.com/
GLIDE, also from OpenAI, is frequently overlooked in favor of Dall-E, but its results are not far behind
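
Since Dall-E is consumed as a paid SaaS API, a request is essentially an authenticated HTTP POST carrying a prompt. The following is a minimal sketch, assuming the endpoint URL, the `dall-e-2` model name, and the `prompt`/`n`/`size` fields of OpenAI's public image-generation REST API; these are not taken from this page and may have changed, so check the current API reference. The helper names `build_image_request` and `generate_image_urls` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of OpenAI's image-generation REST API; verify before use.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, size="512x512", n=1, model="dall-e-2"):
    """Build (but do not send) the HTTP request for one image-generation call."""
    payload = {"model": model, "prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_urls(prompt, api_key):
    """Send the request and return the image URLs found in the JSON response."""
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the response lists generated images under "data", each with a "url".
    return [item["url"] for item in body["data"]]
```

Calling `generate_image_urls("a watercolor fox", api_key)` with a real key sends the request and consumes credits, which is why building and sending are kept in separate functions.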

MidJourney

URL: https://midjourney.gitbook.io/docs/
Usage: SaaS (Discord bot or web app) only, free to play with, pay-to-play for commercial usage
Lead: David Holz
Notes:

Developed by a research lab founded after its lead sold his previous startup
Quickest route to decent-looking results, but little tuning is available
Results are often painting-like regardless of the desired style
Often a better 3D effect than the others

CompVis/Stability.AI/RunwayML Stable Diffusion

URL: https://stability.ai/
Training size: 1.4B params
Usage: SaaS or offline usage; the only fully open-source option (CreativeML OpenRAIL-M license) that can be self-run
Notes:

Originally a research project by CompVis, now continuing under the Stability.AI entity but still open source
Trained in partnership with RunwayML
Weights are distributed via HuggingFace (the only model listed here with weights available)
Can be fiddly due to the large number of modifiers and tunables; not great for faces out of the box
Best results come from using inpainting and adding negative prompts
Version 2 removes the styles of many artists and reduces the number of tunables
Better photo-realistic results, but prompts need to be far more complex to guide it
Official commercial product via Stability.AI DreamStudio https://beta.dreamstudio.ai/dream
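
Because the weights are on HuggingFace, a self-run setup with a negative prompt can be sketched in a few lines. This is a minimal sketch, assuming the `diffusers` and `torch` packages, a CUDA GPU, and the `CompVis/stable-diffusion-v1-4` model id; none of these are named on this page. The helper names and default values for steps and guidance scale are illustrative, not recommendations.

```python
def build_generation_args(prompt, negative_prompt="", steps=30, guidance_scale=7.5):
    """Collect the tunables passed to the pipeline call in one place."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # things the image should avoid
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(args, model_id="CompVis/stable-diffusion-v1-4", device="cuda"):
    """Download the weights from the HuggingFace Hub and render one image."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is an assumption that suits GPUs; use float32 on CPU.
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    return pipe(**args).images[0]  # a PIL image
```

A typical call would be `generate(build_generation_args("portrait photo of an old sailor", negative_prompt="blurry, extra fingers")).save("out.png")`. Keeping the arguments in a plain dict makes it easy to log or sweep the tunables mentioned in the notes.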

Promising but not Available

nVidia eDiff-I

URL: https://deepimagination.cc/eDiff-I/
Usage: Not (yet) publicly available
Training size: 9.1B params
Notes:

Looks very promising, especially with built-in style transfers
Somewhat different internal architecture with single-pass multi-encoders

Meta Make-a-Scene

URL: https://ai.facebook.com/blog/greater-creative-control-for-ai-image-generation/
Training size: 4B params
Usage: Not publicly available
Notes:

Likely to remain a Meta-internal tool until it becomes a filter for IG/FB or something similar
Can also generate videos: Make-a-Video

Google Imagen

URL: https://imagen.research.google/
Usage: Not publicly available
Training size: 7.9B params
Notes:

High-end research from Google Brain, not a commercial product
This is commonly used as a benchmark and reference point to see how good any other product is
Can also generate videos: https://imagen.research.google/video/
Google DreamBooth appears to be a separate algorithm that allows
Imagen textual inversion techniques to be applied to other trained models: https://dreambooth.github.io/

Google Parti

URL: https://parti.research.google/
Usage: Not publicly available
Training size: 20B params
Notes:

Different architecture, as it does not use diffusion at all
True SOTA and better than anything else, but massively large (10x)
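
The "10x" figure can be put in context with the parameter counts quoted elsewhere on this page. The sketch below only does that arithmetic; the choice of the smallest Dall-E figure as the comparison point is an assumption, and `size_ratios` is a hypothetical helper.

```python
# Parameter counts in billions, copied from the "Training size" lines on this page.
# For Dall-E the smallest of the three listed figures is used (an assumption).
PARAMS_B = {
    "Dall-E (smallest listed)": 3.5,
    "Stable Diffusion": 1.4,
    "eDiff-I": 9.1,
    "Make-a-Scene": 4.0,
    "Imagen": 7.9,
}
PARTI_B = 20.0


def size_ratios(reference, others):
    """Return how many times larger `reference` is than each other model."""
    return {name: round(reference / size, 1) for name, size in others.items()}


for name, ratio in size_ratios(PARTI_B, PARAMS_B).items():
    print(f"Parti is {ratio}x the size of {name}")
```

By these numbers the ratio ranges from about 2x (eDiff-I, Imagen) to about 14x (Stable Diffusion), so "10x" is best read as an order-of-magnitude comparison with the smaller models.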

Microsoft NUWA Infinity

URL: https://nuwa-infinity.microsoft.com/#/
Notes:

Looks impressive, but no idea where it's heading