Diffusers
Safetensors
English
alfredplpl committed on
Commit
01bbfb2
1 Parent(s): 27d422f

Update README.md

  1. README.md +110 -121
README.md CHANGED
@@ -1,62 +1,83 @@
1
  ---
2
  library_name: diffusers
 
 
 
 
 
 
3
  ---
4
 
5
- # Model Card for Model ID
6
 
7
- <!-- Provide a quick summary of what the model is/does. -->
 
8
 
 
9
 
 
10
 
11
- ## Model Details
 
 
12
 
13
- ### Model Description
 
14
 
15
- <!-- Provide a longer summary of what this model is. -->
 
 
 
 
16
 
17
- This is the model card of a 🧨 diffusers model that has been pushed on the Hub. This model card has been automatically generated.
 
 
 
 
 
18
 
19
- - **Developed by:** [More Information Needed]
20
- - **Funded by [optional]:** [More Information Needed]
21
- - **Shared by [optional]:** [More Information Needed]
22
- - **Model type:** [More Information Needed]
23
- - **Language(s) (NLP):** [More Information Needed]
24
- - **License:** [More Information Needed]
25
- - **Finetuned from model [optional]:** [More Information Needed]
26
 
27
- ### Model Sources [optional]
 
 
 
28
 
29
- <!-- Provide the basic links for the model. -->
30
 
31
- - **Repository:** [More Information Needed]
32
- - **Paper [optional]:** [More Information Needed]
33
- - **Demo [optional]:** [More Information Needed]
34
 
35
- ## Uses
36
 
37
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
 
 
 
 
38
 
39
- ### Direct Use
40
 
41
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
42
 
43
- [More Information Needed]
44
 
45
- ### Downstream Use [optional]
46
 
47
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
48
 
49
- [More Information Needed]
 
50
 
51
  ### Out-of-Scope Use
52
 
53
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
54
 
55
  [More Information Needed]
56
 
57
  ## Bias, Risks, and Limitations
58
 
59
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
60
 
61
  [More Information Needed]
62
 
@@ -68,131 +89,99 @@ Users (both direct and downstream) should be made aware of the risks, biases and
68
 
69
  ## How to Get Started with the Model
70
 
71
- Use the code below to get started with the model.
72
-
73
- [More Information Needed]
74
 
75
  ## Training Details
76
 
77
  ### Training Data
78
 
79
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
80
 
81
- [More Information Needed]
 
82
 
83
- ### Training Procedure
84
 
85
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
86
 
87
- #### Preprocessing [optional]
88
 
89
- [More Information Needed]
90
 
91
 
92
  #### Training Hyperparameters
93
 
94
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
95
-
96
- #### Speeds, Sizes, Times [optional]
97
-
98
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
99
-
100
- [More Information Needed]
101
-
102
- ## Evaluation
103
-
104
- <!-- This section describes the evaluation protocols and provides the results. -->
105
-
106
- ### Testing Data, Factors & Metrics
107
-
108
- #### Testing Data
109
-
110
- <!-- This should link to a Dataset Card if possible. -->
111
-
112
- [More Information Needed]
113
-
114
- #### Factors
115
-
116
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
117
-
118
- [More Information Needed]
119
-
120
- #### Metrics
121
-
122
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
123
-
124
- [More Information Needed]
125
-
126
- ### Results
127
-
128
- [More Information Needed]
129
-
130
- #### Summary
131
-
132
-
133
-
134
- ## Model Examination [optional]
135
-
136
- <!-- Relevant interpretability work for the model goes here -->
137
-
138
- [More Information Needed]
 
 
139
 
140
  ## Environmental Impact
141
 
142
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
143
-
144
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
145
-
146
- - **Hardware Type:** [More Information Needed]
147
- - **Hours used:** [More Information Needed]
148
- - **Cloud Provider:** [More Information Needed]
149
- - **Compute Region:** [More Information Needed]
150
- - **Carbon Emitted:** [More Information Needed]
151
 
152
  ## Technical Specifications [optional]
153
 
154
  ### Model Architecture and Objective
155
 
156
- [More Information Needed]
157
 
158
  ### Compute Infrastructure
159
 
160
- [More Information Needed]
161
 
162
  #### Hardware
163
 
164
- [More Information Needed]
165
 
166
  #### Software
167
 
168
- [More Information Needed]
169
-
170
- ## Citation [optional]
171
-
172
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
173
-
174
- **BibTeX:**
175
-
176
- [More Information Needed]
177
-
178
- **APA:**
179
-
180
- [More Information Needed]
181
-
182
- ## Glossary [optional]
183
 
184
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
185
-
186
- [More Information Needed]
187
-
188
- ## More Information [optional]
189
-
190
- [More Information Needed]
191
-
192
- ## Model Card Authors [optional]
193
-
194
- [More Information Needed]
195
 
196
  ## Model Card Contact
197
 
198
- [More Information Needed]
 
1
  ---
2
  library_name: diffusers
3
+ license: apache-2.0
4
+ datasets:
5
+ - common-canvas/commoncatalog-cc-by
6
+ - alfredplpl/commoncatalog-cc-by-recap
7
+ language:
8
+ - en
9
  ---
10
 
11
+ # CommonArt-PoC
12
 
13
+ CommonArt is a text-to-image generation model trained only on authorized images.
14
+ The architecture is based on DiT, which is used by Stable Diffusion 3 and Sora.
15
 
16
+ # Usage
17
 
18
+ You can use this model with the diffusers library.
19
 
20
+ ```python
21
+ import torch
22
+ from diffusers import Transformer2DModel, PixArtSigmaPipeline
23
 
24
+ device = "cpu"
25
+ weight_dtype = torch.float32
26
 
27
+ transformer = Transformer2DModel.from_pretrained(
28
+ "alfredplpl/CommonArt-PoC",
29
+ torch_dtype=weight_dtype,
30
+ use_safetensors=True,
31
+ )
32
 
33
+ pipe = PixArtSigmaPipeline.from_pretrained(
34
+ "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
35
+ transformer=transformer,
36
+ torch_dtype=weight_dtype,
37
+ use_safetensors=True,
38
+ )
39
 
40
+ pipe.to(device)
 
 
 
 
 
 
41
 
42
+ prompt = " A picturesque photograph of a serene coastline, capturing the tranquility of a sunrise over the ocean. The image shows a wide expanse of gently rolling sandy beach, with clear, turquoise water stretching into the horizon. Seashells and pebbles are scattered along the shore, and the sun's rays create a golden hue on the water's surface. The distant outline of a lighthouse can be seen, adding to the quaint charm of the scene. The sky is painted with soft pastel colors of dawn, gradually transitioning from pink to blue, creating a sense of peacefulness and beauty."
43
+ image = pipe(prompt, guidance_scale=4.5, max_sequence_length=512).images[0]
44
+ image.save("beach.png")
45
+ ```
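The snippet above runs on the CPU in float32. A minimal sketch of choosing the device and dtype at runtime follows. `select_runtime` is a hypothetical helper, not part of diffusers, and using float16 on CUDA is an assumption rather than a tested recommendation for this model.

```python
def select_runtime(cuda_available: bool) -> dict:
    """Hypothetical helper: choose device and dtype names for the pipeline."""
    if cuda_available:
        # Assumption: half precision is acceptable for inference on a GPU.
        return {"device": "cuda", "dtype": "float16"}
    # Same settings as the CPU example above.
    return {"device": "cpu", "dtype": "float32"}


if __name__ == "__main__":
    # Without a GPU this reproduces the settings used in the README snippet.
    print(select_runtime(cuda_available=False))
```

With PyTorch installed, `cfg = select_runtime(torch.cuda.is_available())` can feed the README code via `device = cfg["device"]` and `weight_dtype = getattr(torch, cfg["dtype"])`.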
46
 
 
47
 
48
+ ## Model Details
 
 
49
 
50
+ ### Model Description
51
 
52
+ - **Developed by:** alfredplpl
53
+ - **Funded by [optional]:** alfredplpl
54
+ - **Shared by [optional]:** alfredplpl
55
+ - **Model type:** Diffusion transformer
56
+ - **Language(s) (NLP):** English
57
+ - **License:** Apache-2.0
58
 
59
+ ### Model Sources
60
 
61
+ - **Repository:** [PixArt-Sigma](https://github.com/PixArt-alpha/PixArt-sigma)
62
+ - **Paper:** [PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation](https://arxiv.org/abs/2403.04692)
63
 
64
+ ## Uses
65
 
66
+ - Any purpose
67
 
68
+ ### Direct Use
69
 
70
+ - To develop commercial text-to-image generation.
71
+ - To research non-commercial text-to-image generation.
72
 
73
  ### Out-of-Scope Use
74
 
75
+ - To generate misinformation.
76
 
77
  [More Information Needed]
78
 
79
  ## Bias, Risks, and Limitations
80
 
 
81
 
82
  [More Information Needed]
83
 
 
89
 
90
  ## How to Get Started with the Model
91
 
92
+ Use the code in the Usage section above to get started with the model.
 
 
93
 
94
  ## Training Details
95
 
96
  ### Training Data
97
 
98
+ I used these datasets to train the transformer.
99
 
100
+ - CommonCatalog CC BY
101
+ - CommonCatalog CC BY Recap (`alfredplpl/commoncatalog-cc-by-recap`)
102
 
 
103
 
104
+ ### Training Procedure
105
 
 
106
 
 
107
 
108
 
109
  #### Training Hyperparameters
110
 
111
+ - **Training regime:**
112
+ ```python
113
+ _base_ = ['../PixArt_xl2_internal.py']
114
+ data_root = "/mnt/my_raid/pixart"
115
+ image_list_json = ['data_info.json']
116
+
117
+ data = dict(
118
+ type='InternalDataSigma', root='InternData', image_list_json=image_list_json, transform='default_train',
119
+ load_vae_feat=False, load_t5_feat=False,
120
+ )
121
+ image_size = 256
122
+
123
+ # model setting
124
+ model = 'PixArt_XL_2'
125
+ mixed_precision = 'fp16' # ['fp16', 'fp32', 'bf16']
126
+ fp32_attention = True
127
+ #load_from = "/mnt/my_raid/pixart/working/checkpoints/epoch_1_step_17500.pth" # https://huggingface.co/PixArt-alpha/PixArt-Sigma
128
+ #resume_from = dict(checkpoint="/mnt/my_raid/pixart/working/checkpoints/epoch_37_step_62039.pth", load_ema=False, resume_optimizer=True, resume_lr_scheduler=True)
129
+ vae_pretrained = "output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae" # sdxl vae
130
+ multi_scale = False # if use multiscale dataset model training
131
+ pe_interpolation = 0.5
132
+
133
+ # training setting
134
+ num_workers = 10
135
+ train_batch_size = 64 # 64 as default
136
+ num_epochs = 200 # 3
137
+ gradient_accumulation_steps = 1
138
+ grad_checkpointing = True
139
+ gradient_clip = 0.2
140
+ optimizer = dict(type='CAMEWrapper', lr=2e-5, weight_decay=0.0, betas=(0.9, 0.999, 0.9999), eps=(1e-30, 1e-16))
141
+ lr_schedule_args = dict(num_warmup_steps=1000)
142
+
143
+ #visualize=True
144
+ #train_sampling_steps = 3
145
+ #eval_sampling_steps = 3
146
+ log_interval = 20
147
+ save_model_epochs = 1
148
+ #save_model_steps = 2500
149
+ work_dir = 'output/debug'
150
+
151
+ # pixart-sigma
152
+ scale_factor = 0.13025
153
+ real_prompt_ratio = 0.5
154
+ model_max_length = 512
155
+ class_dropout_prob = 0.1
156
+
157
+ ```
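To make the numbers in the config concrete, here is a rough sketch of the sequence length and effective batch size they imply. It assumes that the SDXL VAE downsamples by a factor of 8, that the `2` in `PixArt_XL_2` denotes a patch size of 2, and that `train_batch_size` is per GPU on the two A6000s. The helper names are hypothetical and none of this is taken from the training code.

```python
def latent_token_count(image_size: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Number of transformer tokens for a square image (assumed factors)."""
    latent_side = image_size // vae_downsample   # side length of the VAE latent
    tokens_per_side = latent_side // patch_size  # patches along one side
    return tokens_per_side ** 2


def effective_batch_size(per_device_batch: int, num_devices: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step (assumes per-device batch size)."""
    return per_device_batch * num_devices * grad_accum_steps


if __name__ == "__main__":
    # Values from the config: image_size = 256, train_batch_size = 64,
    # gradient_accumulation_steps = 1; two GPUs per the hardware section.
    print(latent_token_count(256))
    print(effective_batch_size(64, 2, 1))
```

Under these assumptions a 256 px image becomes a 32x32 latent and 256 tokens, and each optimizer step sees 128 images.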
158
 
159
  ## Environmental Impact
160
 
161
+ - **Hardware Type:** A6000x2
162
+ - **Hours used:** 1000
163
+ - **Compute Region:** Japan
164
+ - **Carbon Emitted:** Not so much
 
 
 
 
 
165
 
166
  ## Technical Specifications [optional]
167
 
168
  ### Model Architecture and Objective
169
 
170
+ Diffusion Transformer
171
 
172
  ### Compute Infrastructure
173
 
174
+ Desktop PC
175
 
176
  #### Hardware
177
 
178
+ A6000x2
179
 
180
  #### Software
181
 
182
+ [PixArt-Sigma repository](https://github.com/PixArt-alpha/PixArt-sigma)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
183
 
 
 
 
 
 
 
 
 
 
 
 
184
 
185
  ## Model Card Contact
186
 
187
+ alfredplpl