---
library_name: diffusers
license: apache-2.0
datasets:
- common-canvas/commoncatalog-cc-by
- alfredplpl/commoncatalog-cc-by-recap
language:
- en
---
# CommonArt-PoC
CommonArt is a text-to-image generation model trained only on authorized images.
The architecture is based on DiT (Diffusion Transformer), the architecture also used by Stable Diffusion 3 and Sora.
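A DiT does not denoise pixels directly. It splits the VAE latent into small patches and processes them as a sequence of tokens. The sketch below counts those tokens for a given image size. It is only an illustration and rests on assumptions not stated in this card: an 8x-downsampling VAE with 4 latent channels (as in the SDXL VAE referenced by `vae_pretrained` in the training config), and a 2x2 patch size, as the `_2` in `PixArt_XL_2` suggests. The function name `dit_token_shape` is hypothetical.

```python
# Hypothetical helper: count the patch tokens a DiT sees for a given image size.
# Assumptions (not stated in this card): the VAE downsamples by 8x and produces
# 4 latent channels (SDXL VAE), and PixArt_XL_2 uses a 2x2 patch size.


def dit_token_shape(image_size: int, vae_factor: int = 8,
                    latent_channels: int = 4, patch_size: int = 2) -> tuple[int, int]:
    """Return (number of tokens, values per token before the linear patch embedding)."""
    if image_size % vae_factor != 0:
        raise ValueError("image_size must be divisible by the VAE downsampling factor")
    latent_size = image_size // vae_factor      # e.g. 256 px -> 32x32 latent
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by the patch size")
    grid = latent_size // patch_size            # patches per side
    num_tokens = grid * grid                    # patches flattened into one sequence
    token_dim = latent_channels * patch_size * patch_size
    return num_tokens, token_dim


if __name__ == "__main__":
    # image_size = 256 matches the training config in this card.
    tokens, dim = dit_token_shape(256)
    print(f"{tokens} tokens of {dim} values each")
```

Under these assumptions, `image_size = 256` gives a 32x32 latent and 256 tokens per image. Quadrupling the pixel count quadruples the sequence length, and self-attention cost grows quadratically with sequence length.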
## How to Get Started with the Model
You can use this model with the diffusers library.
```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

# Run on CPU in full precision. On a CUDA GPU, set device = "cuda".
device = "cpu"
weight_dtype = torch.float32

# Load the CommonArt transformer weights.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
# Load the remaining components (SDXL VAE, T5 text encoder, scheduler) from the
# PixArt-Sigma base repository, swapping in the CommonArt transformer.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

prompt = "A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."
# max_sequence_length=512 matches the text length used in training (model_max_length = 512).
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")
```
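`guidance_scale=4.5` sets the strength of classifier-free guidance. At each denoising step the pipeline predicts the noise once with the prompt and once with an empty (or negative) prompt, then extrapolates from the unconditional prediction toward the conditional one. The sketch below shows only that combination step on plain Python lists; the pipeline does the same arithmetic on tensors internally, and the function name is hypothetical.

```python
def classifier_free_guidance(uncond: list[float], cond: list[float],
                             guidance_scale: float) -> list[float]:
    """Combine unconditional and text-conditional noise predictions.

    guided = uncond + guidance_scale * (cond - uncond)
    A scale of 1.0 returns the conditional prediction; larger values
    push each step harder toward the prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], 4.5))
```

The unconditional branch has to be learned during training, which is presumably why the training config sets `class_dropout_prob = 0.1`: in PixArt-style training, the caption is replaced by a null embedding for roughly 10% of samples.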
## Model Details
### Model Description
- **Developed by:** alfredplpl
- **Funded by:** alfredplpl
- **Shared by:** alfredplpl
- **Model type:** Diffusion transformer
- **Language(s) (NLP):** English
- **License:** Apache-2.0
### Model Sources
- **Repository:** [PixArt-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)
## Uses
- Any purpose
### Direct Use
- Developing commercial text-to-image generation systems.
- Non-commercial research on text-to-image generation.
### Out-of-Scope Use
- To generate misinformation.
## Bias, Risks, and Limitations
- Limited representation.
## Training Details
### Training Data
I used the following datasets to train the transformer:
- CommonCatalog CC BY
- CommonCatalog CC BY Extension
### Training Procedure
TBA
#### Training Hyperparameters
- **Training regime:**
```python
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"
image_list_json = ['data_info.json']
data = dict(
type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
load_vae_feat=False, load_t5_feat=False,
)
image_size = 256
# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae
multi_scale = False # whether to train on a multi-scale dataset
pe_interpolation = 0.5
# training setting
num_workers = 10
train_batch_size = 64 # 64 as default
num_epochs = 200 # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)
#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'
# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1
```
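The config sets only `num_warmup_steps=1000`; the schedule type itself is inherited from `../PixArt_xl2_internal.py`, which is not shown here. Assuming it is a constant schedule with linear warmup (the behavior of diffusers' `get_constant_schedule_with_warmup`), the learning rate at each optimizer step would follow the sketch below. The function name is hypothetical; the defaults are the `lr` and `num_warmup_steps` values from the config above.

```python
def constant_lr_with_warmup(step: int, base_lr: float = 2e-5,
                            num_warmup_steps: int = 1000) -> float:
    """Learning rate at `step` for a constant schedule with linear warmup.

    Assumption: mirrors diffusers' get_constant_schedule_with_warmup, which
    ramps linearly from 0 to base_lr over num_warmup_steps, then stays flat.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


if __name__ == "__main__":
    for s in (0, 250, 500, 1000, 10_000):
        print(s, constant_lr_with_warmup(s))
```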
## Environmental Impact
- **Hardware Type:** A6000x2
- **Hours used:** 1000
- **Compute Region:** Japan
- **Carbon Emitted:** Not quantified (expected to be small)
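A rough order-of-magnitude estimate follows the usual formula: energy = number of GPUs × power draw × hours, and emissions = energy × grid carbon intensity. The sketch below is not a measurement. It assumes that each RTX A6000 draws its rated 300 W for the whole run, that the 1000 hours are wall-clock hours with both GPUs busy, and that the rest of the PC is ignored. The grid intensity value is a hypothetical placeholder that should be replaced with a current figure for Japan.

```python
def estimate_training_emissions(num_gpus: int, watts_per_gpu: float,
                                hours: float, kg_co2_per_kwh: float) -> tuple[float, float]:
    """Return (energy in kWh, emissions in kg CO2-eq).

    GPU-only estimate: ignores CPU, RAM, cooling and power-supply losses.
    """
    energy_kwh = num_gpus * watts_per_gpu * hours / 1000.0
    return energy_kwh, energy_kwh * kg_co2_per_kwh


if __name__ == "__main__":
    # Assumed values: 2 GPUs at a 300 W rated board power for 1000 h.
    GRID_INTENSITY = 0.45  # kg CO2-eq per kWh; hypothetical placeholder, not an official figure
    kwh, kg = estimate_training_emissions(2, 300.0, 1000.0, GRID_INTENSITY)
    print(f"~{kwh:.0f} kWh, ~{kg:.0f} kg CO2-eq")
```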
## Technical Specifications
### Model Architecture and Objective
Diffusion Transformer
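For reference, a diffusion transformer is trained with a denoising objective. The standard DDPM noise-prediction form is sketched below, assuming this model follows the PixArt training setup: the VAE encoder output is multiplied by `scale_factor` from the training config, noise is added at a random timestep, and the transformer predicts that noise from the noisy latent, the timestep, and the T5 text embedding of the caption. PixArt-style models additionally predict the variance; that term is omitted here.

$$
\begin{aligned}
z_0 &= s \cdot \mathcal{E}(x), \qquad s = 0.13025 \\
z_t &= \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L}_{\text{simple}} &= \mathbb{E}_{z_0,\, \epsilon,\, t,\, c}\left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c) \right\rVert_2^2 \right]
\end{aligned}
$$

Here $\mathcal{E}$ is the VAE encoder, $\bar{\alpha}_t$ is the cumulative noise schedule, and $c$ is the text embedding.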
### Compute Infrastructure
Desktop PC
#### Hardware
A6000x2
#### Software
[PixArt-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)
## Model Card Contact
alfredplpl