CommonArt-PoC

CommonArt is a text-to-image generation model trained only on authorized images. Its architecture is based on DiT (Diffusion Transformer), the family of architectures also used by Stable Diffusion 3 and Sora.

How to Get Started with the Model

You can use this model with the diffusers library.

import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

device = "cpu"
weight_dtype = torch.float32

transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC", 
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe.to(device)

prompt = " A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."
# max_sequence_length=512 matches model_max_length in the training config
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")

Model Details

Model Description

  • Developed by: alfredplpl
  • Funded by: alfredplpl
  • Shared by: alfredplpl
  • Model type: Diffusion transformer
  • Language(s) (NLP): English
  • License: Apache-2.0

Model Sources

Uses

  • Any purpose

Direct Use

  • To develop commercial text-to-image generation.
  • To research non-commercial text-to-image generation.

Out-of-Scope Use

  • To generate misinformation.

Bias, Risks, and Limitations

  • Limited representation

Training Details

Training Data

I used these datasets to train the transformer.

  • CommonCatalog CC BY
  • CommonCatalog CC BY Extension

Training Hyperparameters

  • Training regime:
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"
image_list_json = ['data_info.json']

data = dict(
    type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
    load_vae_feat=False, load_t5_feat=False,
)
image_size = 256

# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16'  # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth"  # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae"  # sdxl vae
multi_scale = False  # if use multiscale dataset model training
pe_interpolation = 0.5

# training setting
num_workers = 10
train_batch_size = 64  # 64 as default
num_epochs = 200  # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)

#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'

# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1
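
As a rough sanity check on the configuration above, the sketch below works out the latent grid and transformer sequence length implied by image_size = 256. It assumes that the SDXL VAE downsamples each side by a factor of 8 and that the "2" in PixArt_XL_2 denotes a 2x2 patch size. The 512-pixel base resolution used to explain pe_interpolation = 0.5 is also an assumption, not something this card states.

```python
# Assumed constants (not stated in the config above).
VAE_DOWNSAMPLE = 8      # assumption: SDXL VAE reduces each side by 8x
PATCH_SIZE = 2          # assumption: the "2" in PixArt_XL_2 is the patch size
BASE_RESOLUTION = 512   # assumption: resolution at which pe_interpolation == 1.0


def token_layout(image_size: int) -> dict:
    """Return latent side, tokens per side, total image tokens, and PE scale."""
    if image_size % (VAE_DOWNSAMPLE * PATCH_SIZE) != 0:
        raise ValueError("image_size must be divisible by 16 under these assumptions")
    latent_side = image_size // VAE_DOWNSAMPLE
    tokens_per_side = latent_side // PATCH_SIZE
    return {
        "latent_side": latent_side,
        "tokens_per_side": tokens_per_side,
        "num_tokens": tokens_per_side ** 2,
        "pe_interpolation": image_size / BASE_RESOLUTION,
    }


layout = token_layout(256)
print(layout)
```

Under these assumptions, a 256-pixel image becomes a 32x32 latent and 256 image tokens, and the derived scale agrees with the configured pe_interpolation of 0.5.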

How to resume training

  1. Download the model.
  2. Set the downloaded checkpoint as the "resume_from" checkpoint in the training config.
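
Step 2 can be sketched as the config fragment below, which mirrors the commented-out resume_from line in the training config. The checkpoint path is a hypothetical placeholder. The dictionary keys are copied from that commented line, and whether the optimizer and scheduler state can be restored depends on what the downloaded checkpoint contains.

```python
# Hypothetical local path to the checkpoint downloaded in step 1.
checkpoint_path = "/path/to/downloaded/checkpoint.pth"

# Same keys as the commented-out resume_from entry in the training config.
resume_from = dict(
    checkpoint=checkpoint_path,
    load_ema=False,
    resume_optimizer=True,
    resume_lr_scheduler=True,
)
```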

Environmental Impact

  • Hardware Type: A6000x2
  • Hours used: 700
  • Compute Region: Japan
  • Carbon Emitted: Not so much
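
For a rough upper bound on GPU energy use, the figures above can be combined as in the sketch below. It assumes that "Hours used: 700" means wall-clock hours with both GPUs running, and that each A6000 draws at most 300 W (its rated board power). Real draw is usually lower, and the rest of the PC is ignored. Grid carbon intensity is left as an input because the card does not give one.

```python
GPU_COUNT = 2            # from "A6000x2"
HOURS = 700              # from "Hours used"; assumed to be wall-clock hours
MAX_GPU_POWER_W = 300    # assumption: rated maximum board power per GPU


def max_energy_kwh(gpus: int, hours: float, watts_per_gpu: float) -> float:
    """Upper-bound GPU energy in kWh, assuming full power for every hour."""
    return gpus * hours * watts_per_gpu / 1000.0


def emissions_kg(energy_kwh: float, kg_co2_per_kwh: float) -> float:
    """Convert energy to CO2-equivalent mass for a caller-supplied grid intensity."""
    return energy_kwh * kg_co2_per_kwh


energy = max_energy_kwh(GPU_COUNT, HOURS, MAX_GPU_POWER_W)
print(f"Upper-bound GPU energy: {energy:.0f} kWh")  # prints 420 kWh
```

Passing this energy value and a local grid intensity to emissions_kg gives a corresponding ceiling on emissions.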

Technical Specifications

Model Architecture and Objective

Diffusion Transformer

Compute Infrastructure

Desktop PC

Hardware

A6000x2

Software

PixArt-Sigma repository

Model Card Contact

alfredplpl
