Spaces:
Runtime error
A newer version of the Gradio SDK is available:
5.25.0
Generative Art Framework Comparison
Most Popular
OpenAI Dall-E
URL: https://openai.com/dall-e-2/
Usage: SaaS (via native API and libraries), small base credits to start with, pay-to-play afterwards
Training Size: 12B/6.5B/3.5B params
Notes:
- Commonly used is v2 which is better and smaller than v1
and its getting smaller and faster in each iteration - Best of available ones for human images
- Style transfers or model modifiers are charged extra
- Dall-E is also licensed to 3rd parties as embedded engine:
Microsoft Designer, etc. - Craiyon as free smaller version (was "Dall-E Mini", but renamed due to copyright)
as original architects did not like commercial direction: https://www.craiyon.com/ - OpenAI Glide is also from OpenAI, frequently ignored in favor of Dall-E, but not far result-wise
MidJourney
URL: https://midjourney.gitbook.io/docs/
Usage: SaaS (discord bot or web app) only, free to play with, pay-to-play for commercial usage
Lead: David Holz
Notes:
- Developed by research lab after lead sold his previous startup
- Quickest decent looking results, but little tuning available
- Results are often painting-like regardless of desired style
- Often better 3D-effect than others
CompVis/Stability.AI/RunwayML Stable Diffusion
URL: https://stability.ai/
Training size: 1.4B params
Usage: SaaS of offline usage, only fully open-source (Creative ML OpenRAIL-M license) to self-run
Notes:
- Originally research project by CompVis, continuing under Stability.AI entity but still open source
- Training in partnership with RunwayML
- Weights distributed via HuggingFace (only model with weights available)
- Can be fiddly due to large number of modifiers and tunables, not great for faces out-of-the-box
- Best results when using inpainting and adding of negative prompts
- Version v2 removes styles from plenty authors and reduces tunables
Better photo-realistic results, but prompts require far more complexity to guide it - Official commercial product via Stability.AI DreamStudio https://beta.dreamstudio.ai/dream
Promising but not Available
nVidia eDiff-I
URL: https://deepimagination.cc/eDiff-I/
Usage: Not (yet) publicly available
Training size: 9.1B params
Note:
- Looks very promising, especially with built-in style transfers
- Somewhat different internal architecture with single-pass multi-encoders
Meta Make-a-Scene
URL: https://ai.facebook.com/blog/greater-creative-control-for-ai-image-generation/
Training size: 4B params
Usage: Not publicly available
Notes:
- Future is likely meta internal tool until it becomes a filter for IG/FG or something
- Can also generate videos: Make-a-Video
Google Imagen
URL: https://imagen.research.google/
Usage: Not publicly available
Training size: 7.9B params
Notes:
- High-end research from Google Brain, not a commercial product
- This is commonly used as a benchmark and reference point to see how good any other product is
- Can also generate videos: https://imagen.research.google/video/
- Google DreamBooth looks to separate algorithm to allow to
apply Imagen textual inversion techniques to other trained models: https://dreambooth.github.io/
Google Parti
URL: https://parti.research.google/
Usage: Not publicly available
Training size: 20B params
Notes:
- Different architecture as it does not use diffusion at all
- True SOTA, but massively large (10x), better than anything
Microsoft NUWA Infinity
URL: https://nuwa-infinity.microsoft.com/#/
Notes:
- Looks impressive, but no idea where its heading