---
library_name: diffusers
license: apache-2.0
datasets:
- common-canvas/commoncatalog-cc-by
- alfredplpl/commoncatalog-cc-by-recap
language:
- en
---

# CommonArt-PoC

CommonArt is a text-to-image generation model trained only on authorized images.
The architecture is based on DiT (Diffusion Transformer), which is also used by Stable Diffusion 3 and Sora.

## How to Get Started with the Model

You can use this model with the diffusers library.

```python
import torch
from diffusers import Transformer2DModel, PixArtSigmaPipeline

device = "cpu"
weight_dtype = torch.float32

transformer = Transformer2DModel.from_pretrained(
    "alfredplpl/CommonArt-PoC", 
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
    transformer=transformer,
    torch_dtype=weight_dtype,
    use_safetensors=True,
)

pipe.to(device)

prompt = "A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."
# max_sequence_length raises the prompt token limit to match the 512 tokens used in training.
image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
image.save("beach.png")
```
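
The `guidance_scale=4.5` argument above controls classifier-free guidance. As a rough illustration of what that parameter does (not the diffusers implementation itself), the sketch below blends an unconditional and a text-conditioned noise prediction in the way classifier-free guidance is usually formulated. The function name `apply_guidance` and the toy values are hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend two noise predictions with classifier-free guidance.

    A scale of 1.0 returns the text-conditioned prediction unchanged;
    larger values push the result further away from the unconditional one.
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# Toy values standing in for model outputs (purely illustrative).
noise_uncond = [0.0, 1.0, -1.0]
noise_text = [1.0, 1.0, 0.0]

guided = apply_guidance(noise_uncond, noise_text, guidance_scale=4.5)
print(guided)
```

Lower scales tend to follow the prompt less strictly, while higher scales follow it more closely, sometimes at the cost of image quality.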


## Model Details

### Model Description

- **Developed by:** alfredplpl
- **Funded by:** alfredplpl
- **Shared by:** alfredplpl
- **Model type:** Diffusion transformer
- **Language(s) (NLP):** English
- **License:** Apache-2.0

### Model Sources 

- **Repository:** [PixArt-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
- **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)

## Uses

- Any purpose

### Direct Use

- To develop commercial text-to-image generation.
- To research non-commercial text-to-image generation.

### Out-of-Scope Use

- To generate misinformation.

## Bias, Risks, and Limitations

- Limited representation

## Training Details

### Training Data

I used these datasets to train the transformer.

- CommonCatalog CC BY
- CommonCatalog CC BY Extension

### Training Procedure

TBA

#### Training Hyperparameters

- **Training regime:**
```python
_base_ = ['../PixArt_xl2_internal.py']
data_root = "/mnt/my_raid/pixart"
image_list_json = ['data_info.json']

data = dict(
    type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
    load_vae_feat=False, load_t5_feat=False,
)
image_size = 256

# model setting
model = 'PixArt_XL_2'
mixed_precision = 'fp16'  # ['fp16', 'fp32', 'bf16']
fp32_attention = True
#load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth"  # https://huggingface.co/PixArt-alpha/PixArt-Sigma
#resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae"  # sdxl vae
multi_scale = False  # if use multiscale dataset model training
pe_interpolation = 0.5

# training setting
num_workers = 10
train_batch_size = 64  # 64 as default
num_epochs = 200  # 3
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.2
optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
lr_schedule_args = dict(num_warmup_steps=1000)

#visualize=True
#train_sampling_steps = 3
#eval_sampling_steps = 3
log_interval = 20
save_model_epochs = 1
#save_model_steps = 2500
work_dir = 'output/debug'

# pixart-sigma
scale_factor = 0.13025
real_prompt_ratio = 0.5
model_max_length = 512
class_dropout_prob = 0.1

```
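
To make two of these settings concrete, here is a minimal sketch of the learning rate implied by `lr=2e-5` and `num_warmup_steps=1000`, assuming a linear warmup followed by a constant rate. The schedule type is set in the base config, which is not shown here, so that shape is an assumption. `lr_at_step` is a hypothetical helper, not a function from the PixArt-Sigma repository.

```python
BASE_LR = 2e-5           # optimizer lr from the config above
NUM_WARMUP_STEPS = 1000  # lr_schedule_args.num_warmup_steps


def lr_at_step(step, base_lr=BASE_LR, warmup=NUM_WARMUP_STEPS):
    """Return the learning rate at a given optimizer step.

    Assumes linear warmup from 0 to base_lr, then a constant rate.
    """
    if step < warmup:
        return base_lr * step / warmup
    return base_lr


for step in (0, 500, 1000, 5000):
    print(step, lr_at_step(step))
```

Under this assumption, the rate reaches its full value after the first 1000 optimizer steps and stays there for the rest of training.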

## Environmental Impact

- **Hardware Type:** A6000x2
- **Hours used:** 1000
- **Compute Region:** Japan
- **Carbon Emitted:** Not so much

## Technical Specifications

### Model Architecture and Objective

Diffusion Transformer
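
For a rough sense of scale, the sketch below estimates the latent grid and token count the transformer sees at the training resolution of `image_size = 256`. It assumes the SDXL VAE's 8x spatial downsampling and a patch size of 2 (the "2" in `PixArt_XL_2`). Both values come from the upstream model families rather than from this card, and `latent_tokens` is a hypothetical helper.

```python
def latent_tokens(image_size, vae_downsample=8, patch_size=2):
    """Return (latent side length, number of image tokens) for a square image.

    vae_downsample and patch_size are assumptions about the SDXL VAE
    and the PixArt-XL/2 patch embedding, not values stated in this card.
    """
    latent_side = image_size // vae_downsample
    patches_per_side = latent_side // patch_size
    return latent_side, patches_per_side ** 2


latent_side, num_tokens = latent_tokens(256)
print(latent_side, num_tokens)  # 32 256
```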

### Compute Infrastructure

Desktop PC

#### Hardware

A6000x2

#### Software

[PixArt-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)


## Model Card Contact

alfredplpl