license: apache-2.0
tags:
- text-to-image
- safetensors
- diffusers
datasets:
- JourneyDB/JourneyDB
library_name: diffusers
pipeline_tag: text-to-image
Lumina-Next-SFT
Lumina-Next-SFT is a 2B-parameter Next-DiT model that uses Gemma-2B as its text encoder and has been enhanced through high-quality supervised fine-tuning (SFT).
The generative model uses Next-DiT as the backbone, Gemma-2B as the text encoder, and a version of the SDXL VAE fine-tuned by stabilityai.
- Generation Model: Next-DiT
- Text Encoder: Gemma-2B
- VAE: stabilityai/sdxl-vae
News
- [2024-07-08] Lumina-Next is now supported in diffusers! Thanks to @yiyixuxu and @sayakpaul!
- [2024-06-08] We have released the Lumina-Next-SFT model.
- [2024-05-28] We updated the Lumina-Next-T2I model to support 2K-resolution image generation.
- [2024-05-16] We have converted the .pth weights to .safetensors weights. Please pull the latest code to use demo.py for inference.
- [2024-05-12] We released the next version of Lumina-T2I, called Lumina-Next-T2I, for faster image generation with lower memory usage.
Model Zoo
More checkpoints of our model will be released soon.
Resolution | Next-DiT Parameter | Text Encoder | Prediction | Download URL |
---|---|---|---|---|
1024 | 2B | Gemma-2B | Rectified Flow | hugging face |
Installation
1. Create a conda environment and install PyTorch
Note: You may want to adjust the CUDA version according to your driver version.
conda create -n Lumina_T2X -y
conda activate Lumina_T2X
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
2. Install dependencies
pip install diffusers huggingface_hub
3. Install flash-attn
pip install flash-attn --no-build-isolation
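After the three installation steps, it can help to confirm that the packages are visible to the active environment before loading the model. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` tuple are illustrative and not part of the Lumina code base. Note that `flash-attn` is imported as `flash_attn`.

```python
import importlib.util

# Import names of the packages installed in steps 1-3 (illustrative list).
REQUIRED = ("torch", "torchvision", "diffusers", "huggingface_hub", "flash_attn")


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

An empty result only means the packages can be located; it does not verify that the installed CUDA build matches your driver.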
Inference
- Prepare the pre-trained model
(Recommended) You can use huggingface-cli to download our model:
huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT-diffusers --local-dir /path/to/ckpt/Lumina-Next-SFT-diffusers
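The same download can be done from Python with `snapshot_download` from `huggingface_hub`. This is a sketch, not the authors' script: the helper names `checkpoint_dir` and `download_checkpoint` are hypothetical, and the target folder is named after the repository so that it matches the `/path/to/ckpt/Lumina-Next-SFT-diffusers` path loaded by the demo code.

```python
from pathlib import Path

REPO_ID = "Alpha-VLLM/Lumina-Next-SFT-diffusers"


def checkpoint_dir(root, repo_id=REPO_ID):
    """Local folder for a repo: <root>/<repo name without the organization>."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(root, repo_id=REPO_ID):
    """Download all files of repo_id into checkpoint_dir(root, repo_id)."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_checkpoint("/path/to/ckpt")` returns the folder to pass to `from_pretrained`.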
- Run with demo code:
from diffusers import LuminaText2ImgPipeline
import torch
pipeline = LuminaText2ImgPipeline.from_pretrained("/path/to/ckpt/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")
# or you can download the model using code directly
# pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")
image = pipeline(
    prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. "
    "Background shows an industrial revolution cityscape with smoky skies and tall, metal structures",
).images[0]
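The demo code stops once the image is in memory. As a small sketch of one way to persist the result, the hypothetical helper below (not part of diffusers) calls any pipeline whose output exposes `.images` and saves the first image to disk, creating the output folder if needed. Extra keyword arguments are forwarded unchanged to the pipeline call.

```python
from pathlib import Path


def generate_and_save(pipeline, prompt, out_path, **call_kwargs):
    """Run a text-to-image pipeline and save the first returned image."""
    result = pipeline(prompt=prompt, **call_kwargs)
    image = result.images[0]

    out_path = Path(out_path)
    # Create the destination folder so image.save does not fail on a missing path.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    image.save(out_path)
    return out_path
```

For example, `generate_and_save(pipeline, "a steampunk cityscape", "outputs/demo.png", num_inference_steps=30, guidance_scale=4.0)` would write `outputs/demo.png`. The file name and the two sampling values are illustrative assumptions, not recommended settings from the authors.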