---
library_name: diffusers
license: apache-2.0
datasets:
- common-canvas/commoncatalog-cc-by
- alfredplpl/commoncatalog-cc-by-recap
language:
- en
---

# CommonArt-PoC

![tokyo](tokyo.png)

CommonArt is a text-to-image generation model trained only on authorized images. The architecture is based on DiT (Diffusion Transformer), the same architecture used by Stable Diffusion 3 and Sora.

## How to Get Started with the Model

You can use this model with the diffusers library.

```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

# CPU inference in full precision; see the GPU variant below.
device = "cpu"
weight_dtype = torch.float32

# Load the CommonArt transformer and plug it into the PixArt-Sigma pipeline.
transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to(device)

prompt = "A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")
```
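The example above runs on the CPU in fp32, which is slow. If you have a CUDA GPU, a variant like the following should be much faster. This is a minimal sketch, assuming a GPU with enough VRAM for the fp16 weights; the prompt and output filename are placeholders.

```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

# Assumption: a CUDA GPU with enough VRAM; fp16 roughly halves memory use.
weight_dtype = torch.float16

transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC",
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)
pipe.to("cuda")
# If VRAM is tight, replace pipe.to("cuda") with the line below
# (requires the `accelerate` package):
# pipe.enable_model_cpu_offload()

image = pipe(
    "A watercolor painting of a lighthouse on a rocky coast at dawn.",
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("lighthouse.png")
```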
## Model Details

### Model Description

- **Developed by:** alfredplpl
- **Funded by:** alfredplpl
- **Shared by:** alfredplpl
- **Model type:** Diffusion transformer
- **Language(s) (NLP):** English
- **License:** Apache-2.0

### Model Sources

- **Repository:** [Pixart-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)

## Uses

- Any purpose

### Direct Use

- To develop commercial text-to-image generation.
- To research non-commercial text-to-image generation.

### Out-of-Scope Use

- To generate misinformation.

## Bias, Risks, and Limitations

- Limited representation

## Training Details

### Training Data

I used these datasets to train the transformer:

- [CommonCatalog CC BY](https://huggingface.co/datasets/common-canvas/commoncatalog-cc-by)
- [CommonCatalog CC BY Extension](https://huggingface.co/datasets/alfredplpl/commoncatalog-cc-by-recap)

#### Training Hyperparameters

- **Training regime:**

```python
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"

image_list_json = ['data_info.json']

data = dict(
    type='InternalDataSigma',
    root='InternData',
    image_list_json=image_list_json,
    transform='default_train',
    load_vae_feat=False,
    load_t5_feat=False,
)
image_size = 256

# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16'  # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth"  # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae"  # sdxl vae
multi_scale = False  # if use multiscale dataset model training
pe_interpolation = 0.5

# training setting
num_workers = 10
train_batch_size = 64  # 64 as default
num_epochs = 200  # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)

#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'

# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1
```

## How to resume training

1. Download the [model](checkpoint/epoch_50_step_116738.pth).
2. Set the downloaded checkpoint as the `resume_from` checkpoint in the training config (see the sketch at the end of this card).

## Environmental Impact

- **Hardware Type:** A6000x2
- **Hours used:** 700
- **Compute Region:** Japan
- **Carbon Emitted:** Not so much

## Technical Specifications

### Model Architecture and Objective

Diffusion Transformer

### Compute Infrastructure

Desktop PC

#### Hardware

A6000x2

#### Software

[Pixart-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)

## Model Card Contact

alfredplpl
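As referenced in "How to resume training" above, here is a minimal sketch of step 2: in the Python training config listed under "Training Hyperparameters", point `resume_from` at the downloaded checkpoint. The flags mirror the commented-out `resume_from` example already in that config; the path assumes the checkpoint was saved to `checkpoint/epoch_50_step_116738.pth`.

```python
# In the training config, replace the commented-out resume_from line with:
resume_from = dict(
    checkpoint="checkpoint/epoch_50_step_116738.pth",  # the downloaded checkpoint
    load_ema=False,            # resume from the raw (non-EMA) weights
    resume_optimizer=True,     # restore optimizer state
    resume_lr_scheduler=True,  # restore LR scheduler state
)
```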