potamides committed on
Commit 6d7e7ce · 1 Parent(s): 136e6a2

Upload StableDiffusion3InstructPix2PixPipeline

README.md ADDED
@@ -0,0 +1,198 @@
1
+ ---
2
+ library_name: diffusers
3
+ ---
4
+
5
+ # Model Card for Model ID
6
+
7
+ <!-- Provide a quick summary of what the model is/does. -->
8
+
9
+
10
+
11
+ ## Model Details
12
+
13
+ ### Model Description
14
+
15
+ <!-- Provide a longer summary of what this model is. -->
16
+
17
+ This is the model card of a 🧨 diffusers model that has been pushed to the Hub. This model card has been automatically generated.
18
+
19
+ - **Developed by:** [More Information Needed]
20
+ - **Funded by [optional]:** [More Information Needed]
21
+ - **Shared by [optional]:** [More Information Needed]
22
+ - **Model type:** [More Information Needed]
23
+ - **Language(s) (NLP):** [More Information Needed]
24
+ - **License:** [More Information Needed]
25
+ - **Finetuned from model [optional]:** [More Information Needed]
26
+
27
+ ### Model Sources [optional]
28
+
29
+ <!-- Provide the basic links for the model. -->
30
+
31
+ - **Repository:** [More Information Needed]
32
+ - **Paper [optional]:** [More Information Needed]
33
+ - **Demo [optional]:** [More Information Needed]
34
+
35
+ ## Uses
36
+
37
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
38
+
39
+ ### Direct Use
40
+
41
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
42
+
43
+ [More Information Needed]
44
+
45
+ ### Downstream Use [optional]
46
+
47
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Out-of-Scope Use
52
+
53
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
54
+
55
+ [More Information Needed]
56
+
57
+ ## Bias, Risks, and Limitations
58
+
59
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ### Recommendations
64
+
65
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
66
+
67
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
68
+
69
+ ## How to Get Started with the Model
70
+
71
+ Use the code below to get started with the model.
72
+
73
+ [More Information Needed]
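
This repository ships a custom `StableDiffusion3InstructPix2PixPipeline` (see `pipeline.py` and `model_index.json`) that edits an input image according to a text instruction, with separate text and image guidance scales. The snippet below is a minimal, untested sketch of how such a repository-hosted custom pipeline is typically loaded with 🧨 diffusers; `<repo_id>` is a placeholder for this repository's id on the Hub and the image URL is illustrative only.

```python
# Minimal sketch (assumptions: <repo_id> is this repository's Hub id and the
# uploaded pipeline.py / model_index.json are used as-is).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained(
    "<repo_id>",                  # weights described by model_index.json
    custom_pipeline="<repo_id>",  # load the pipeline class from pipeline.py
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = load_image("https://example.com/input.png")  # placeholder input image
edited = pipe(
    prompt="make the sky look like a sunset",  # editing instruction
    image=image,
    num_inference_steps=28,
    guidance_scale=7.0,        # adherence to the text instruction
    image_guidance_scale=1.5,  # adherence to the input image
).images[0]
edited.save("edited.png")
```

`guidance_scale` and `image_guidance_scale` trade off the text instruction against faithfulness to the input image, mirroring the InstructPix2Pix-style guidance implemented in `pipeline.py`.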
74
+
75
+ ## Training Details
76
+
77
+ ### Training Data
78
+
79
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
80
+
81
+ [More Information Needed]
82
+
83
+ ### Training Procedure
84
+
85
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
86
+
87
+ #### Preprocessing [optional]
88
+
89
+ [More Information Needed]
90
+
91
+
92
+ #### Training Hyperparameters
93
+
94
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
95
+
96
+ #### Speeds, Sizes, Times [optional]
97
+
98
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
99
+
100
+ [More Information Needed]
101
+
102
+ ## Evaluation
103
+
104
+ <!-- This section describes the evaluation protocols and provides the results. -->
105
+
106
+ ### Testing Data, Factors & Metrics
107
+
108
+ #### Testing Data
109
+
110
+ <!-- This should link to a Dataset Card if possible. -->
111
+
112
+ [More Information Needed]
113
+
114
+ #### Factors
115
+
116
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Metrics
121
+
122
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
123
+
124
+ [More Information Needed]
125
+
126
+ ### Results
127
+
128
+ [More Information Needed]
129
+
130
+ #### Summary
131
+
132
+
133
+
134
+ ## Model Examination [optional]
135
+
136
+ <!-- Relevant interpretability work for the model goes here -->
137
+
138
+ [More Information Needed]
139
+
140
+ ## Environmental Impact
141
+
142
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
143
+
144
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
145
+
146
+ - **Hardware Type:** [More Information Needed]
147
+ - **Hours used:** [More Information Needed]
148
+ - **Cloud Provider:** [More Information Needed]
149
+ - **Compute Region:** [More Information Needed]
150
+ - **Carbon Emitted:** [More Information Needed]
151
+
152
+ ## Technical Specifications [optional]
153
+
154
+ ### Model Architecture and Objective
155
+
156
+ [More Information Needed]
157
+
158
+ ### Compute Infrastructure
159
+
160
+ [More Information Needed]
161
+
162
+ #### Hardware
163
+
164
+ [More Information Needed]
165
+
166
+ #### Software
167
+
168
+ [More Information Needed]
169
+
170
+ ## Citation [optional]
171
+
172
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
173
+
174
+ **BibTeX:**
175
+
176
+ [More Information Needed]
177
+
178
+ **APA:**
179
+
180
+ [More Information Needed]
181
+
182
+ ## Glossary [optional]
183
+
184
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
185
+
186
+ [More Information Needed]
187
+
188
+ ## More Information [optional]
189
+
190
+ [More Information Needed]
191
+
192
+ ## Model Card Authors [optional]
193
+
194
+ [More Information Needed]
195
+
196
+ ## Model Card Contact
197
+
198
+ [More Information Needed]
model_index.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "_class_name": "StableDiffusion3InstructPix2PixPipeline",
3
+ "_diffusers_version": "0.30.1",
4
+ "_name_or_path": "nllg/ultraedit",
5
+ "scheduler": [
6
+ "scheduling_flow_match_euler_discrete",
7
+ "FlowMatchEulerDiscreteScheduler"
8
+ ],
9
+ "text_encoder": [
10
+ "transformers",
11
+ "CLIPTextModelWithProjection"
12
+ ],
13
+ "text_encoder_2": [
14
+ "transformers",
15
+ "CLIPTextModelWithProjection"
16
+ ],
17
+ "text_encoder_3": [
18
+ "transformers",
19
+ "T5EncoderModel"
20
+ ],
21
+ "tokenizer": [
22
+ "transformers",
23
+ "CLIPTokenizer"
24
+ ],
25
+ "tokenizer_2": [
26
+ "transformers",
27
+ "CLIPTokenizer"
28
+ ],
29
+ "tokenizer_3": [
30
+ "transformers",
31
+ "T5TokenizerFast"
32
+ ],
33
+ "transformer": [
34
+ "diffusers",
35
+ "SD3Transformer2DModel"
36
+ ],
37
+ "vae": [
38
+ "diffusers",
39
+ "AutoencoderKL"
40
+ ]
41
+ }
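
`model_index.json` tells diffusers which library and class to instantiate for each pipeline component; `_class_name` points at the custom `StableDiffusion3InstructPix2PixPipeline` defined in `pipeline.py`. As a small illustration, the mapping can be inspected from a local copy of the file:

```python
# Sketch: list which class backs each component declared in model_index.json.
# Assumes the file has been downloaded to the current directory.
import json

with open("model_index.json") as f:
    index = json.load(f)

print("pipeline class:", index["_class_name"])
for name, value in index.items():
    # component entries are [library, class_name] pairs; metadata keys are strings
    if isinstance(value, list) and len(value) == 2:
        library, class_name = value
        print(f"{name}: {class_name} (from {library})")
```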
pipeline.py ADDED
@@ -0,0 +1,983 @@
1
+ # Copyright 2024 Stability AI and The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ from typing import Any, Callable, Dict, List, Optional, Union
16
+
17
+ import PIL.Image
18
+ from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
19
+ from diffusers.loaders import FromSingleFileMixin, SD3LoraLoaderMixin
20
+ from diffusers.models.autoencoders import AutoencoderKL
21
+ from diffusers.models.transformers import SD3Transformer2DModel
22
+ from diffusers.pipelines.pipeline_utils import DiffusionPipeline
23
+ from diffusers.pipelines.stable_diffusion_3.pipeline_output import (
24
+ StableDiffusion3PipelineOutput,
25
+ )
26
+ from diffusers.pipelines.stable_diffusion_3.pipeline_stable_diffusion_3_img2img import (
27
+ retrieve_latents,
28
+ retrieve_timesteps,
29
+ )
30
+ from diffusers.schedulers import FlowMatchEulerDiscreteScheduler
31
+ from diffusers.utils import is_torch_xla_available, logging, replace_example_docstring
32
+ from diffusers.utils import deprecate
33
+ from diffusers.utils.torch_utils import randn_tensor
34
+ import torch
35
+ from transformers import (
36
+ CLIPTextModelWithProjection,
37
+ CLIPTokenizer,
38
+ T5EncoderModel,
39
+ T5TokenizerFast,
40
+ )
41
+
42
+
43
+ if is_torch_xla_available():
44
+ import torch_xla.core.xla_model as xm
45
+ XLA_AVAILABLE = True
46
+ else:
47
+ XLA_AVAILABLE = False
48
+
49
+
50
+ logger = logging.get_logger(__name__) # pylint: disable=invalid-name
51
+
52
+ EXAMPLE_DOC_STRING = """
53
+ Examples:
54
+ ```py
55
+ >>> import torch
56
+ >>> from diffusers import StableDiffusion3Pipeline
57
+
58
+ >>> pipe = StableDiffusion3Pipeline.from_pretrained(
59
+ ... "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
60
+ ... )
61
+ >>> pipe.to("cuda")
62
+ >>> prompt = "A cat holding a sign that says hello world"
63
+ >>> image = pipe(prompt).images[0]
64
+ >>> image.save("sd3.png")
65
+ ```
66
+ """
67
+
68
+
69
+ class StableDiffusion3InstructPix2PixPipeline(DiffusionPipeline, SD3LoraLoaderMixin, FromSingleFileMixin):
70
+ r"""
71
+ Args:
72
+ transformer ([`SD3Transformer2DModel`]):
73
+ Conditional Transformer (MMDiT) architecture to denoise the encoded image latents.
74
+ scheduler ([`FlowMatchEulerDiscreteScheduler`]):
75
+ A scheduler to be used in combination with `transformer` to denoise the encoded image latents.
76
+ vae ([`AutoencoderKL`]):
77
+ Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
78
+ text_encoder ([`CLIPTextModelWithProjection`]):
79
+ [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection),
80
+ specifically the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant,
81
+ with an added projection layer that is initialized with a diagonal matrix whose dimension equals the
82
+ `hidden_size`.
83
+ text_encoder_2 ([`CLIPTextModelWithProjection`]):
84
+ [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection),
85
+ specifically the
86
+ [laion/CLIP-ViT-bigG-14-laion2B-39B-b160k](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k)
87
+ variant.
88
+ text_encoder_3 ([`T5EncoderModel`]):
89
+ Frozen text-encoder. Stable Diffusion 3 uses
90
+ [T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the
91
+ [t5-v1_1-xxl](https://huggingface.co/google/t5-v1_1-xxl) variant.
92
+ tokenizer (`CLIPTokenizer`):
93
+ Tokenizer of class
94
+ [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
95
+ tokenizer_2 (`CLIPTokenizer`):
96
+ Second Tokenizer of class
97
+ [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
98
+ tokenizer_3 (`T5TokenizerFast`):
99
+ Tokenizer of class
100
+ [T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
101
+ """
102
+
103
+ model_cpu_offload_seq = "text_encoder->text_encoder_2->text_encoder_3->transformer->vae"
104
+ _optional_components = []
105
+ _callback_tensor_inputs = ["latents", "prompt_embeds", "negative_prompt_embeds", "negative_pooled_prompt_embeds"]
106
+
107
+ def __init__(
108
+ self,
109
+ transformer: SD3Transformer2DModel,
110
+ scheduler: FlowMatchEulerDiscreteScheduler,
111
+ vae: AutoencoderKL,
112
+ text_encoder: CLIPTextModelWithProjection,
113
+ tokenizer: CLIPTokenizer,
114
+ text_encoder_2: CLIPTextModelWithProjection,
115
+ tokenizer_2: CLIPTokenizer,
116
+ text_encoder_3: T5EncoderModel,
117
+ tokenizer_3: T5TokenizerFast,
118
+ ):
119
+ super().__init__()
120
+
121
+ self.register_modules(
122
+ vae=vae,
123
+ text_encoder=text_encoder,
124
+ text_encoder_2=text_encoder_2,
125
+ text_encoder_3=text_encoder_3,
126
+ tokenizer=tokenizer,
127
+ tokenizer_2=tokenizer_2,
128
+ tokenizer_3=tokenizer_3,
129
+ transformer=transformer,
130
+ scheduler=scheduler,
131
+ )
132
+ self.vae_scale_factor = (
133
+ 2 ** (len(self.vae.config.block_out_channels) - 1) if hasattr(self, "vae") and self.vae is not None else 8
134
+ )
135
+ self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
136
+ self.tokenizer_max_length = (
137
+ self.tokenizer.model_max_length if hasattr(self, "tokenizer") and self.tokenizer is not None else 77
138
+ )
139
+ self.default_sample_size = (
140
+ self.transformer.config.sample_size
141
+ if hasattr(self, "transformer") and self.transformer is not None
142
+ else 128
143
+ )
144
+
145
+ def _get_t5_prompt_embeds(
146
+ self,
147
+ prompt: Union[str, List[str]] = None,
148
+ num_images_per_prompt: int = 1,
149
+ device: Optional[torch.device] = None,
150
+ dtype: Optional[torch.dtype] = None,
151
+ ):
152
+ device = device or self._execution_device
153
+ dtype = dtype or self.text_encoder.dtype
154
+
155
+ prompt = [prompt] if isinstance(prompt, str) else prompt
156
+ batch_size = len(prompt)
157
+
158
+ if self.text_encoder_3 is None:
159
+ return torch.zeros(
160
+ (batch_size, self.tokenizer_max_length, self.transformer.config.joint_attention_dim),
161
+ device=device,
162
+ dtype=dtype,
163
+ )
164
+
165
+ text_inputs = self.tokenizer_3(
166
+ prompt,
167
+ padding="max_length",
168
+ max_length=self.tokenizer_max_length,
169
+ truncation=True,
170
+ add_special_tokens=True,
171
+ return_tensors="pt",
172
+ )
173
+ text_input_ids = text_inputs.input_ids
174
+ untruncated_ids = self.tokenizer_3(prompt, padding="longest", return_tensors="pt").input_ids
175
+
176
+ if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(text_input_ids, untruncated_ids):
177
+ removed_text = self.tokenizer_3.batch_decode(untruncated_ids[:, self.tokenizer_max_length - 1 : -1])
178
+ logger.warning(
179
+ "The following part of your input was truncated because CLIP can only handle sequences up to"
180
+ f" {self.tokenizer_max_length} tokens: {removed_text}"
181
+ )
182
+
183
+ prompt_embeds = self.text_encoder_3(text_input_ids.to(device))[0]
184
+
185
+ dtype = self.text_encoder_3.dtype
186
+ prompt_embeds = prompt_embeds.to(dtype=dtype, device=device)
187
+
188
+ _, seq_len, _ = prompt_embeds.shape
189
+
190
+ # duplicate text embeddings and attention mask for each generation per prompt, using mps friendly method
191
+ prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
192
+ prompt_embeds = prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
193
+
194
+ return prompt_embeds
195
+
196
+ def _get_clip_prompt_embeds(
197
+ self,
198
+ prompt: Union[str, List[str]],
199
+ num_images_per_prompt: int = 1,
200
+ device: Optional[torch.device] = None,
201
+ clip_skip: Optional[int] = None,
202
+ clip_model_index: int = 0,
203
+ ):
204
+ device = device or self._execution_device
205
+
206
+ clip_tokenizers = [self.tokenizer, self.tokenizer_2]
207
+ clip_text_encoders = [self.text_encoder, self.text_encoder_2]
208
+
209
+ tokenizer = clip_tokenizers[clip_model_index]
210
+ text_encoder = clip_text_encoders[clip_model_index]
211
+
212
+ prompt = [prompt] if isinstance(prompt, str) else prompt
213
+ batch_size = len(prompt)
214
+
215
+ text_inputs = tokenizer(
216
+ prompt,
217
+ padding="max_length",
218
+ max_length=self.tokenizer_max_length,
219
+ truncation=True,
220
+ return_tensors="pt",
221
+ )
222
+
223
+ text_input_ids = text_inputs.input_ids
224
+ untruncated_ids = tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
225
+ if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(text_input_ids, untruncated_ids):
226
+ removed_text = tokenizer.batch_decode(untruncated_ids[:, self.tokenizer_max_length - 1 : -1])
227
+ logger.warning(
228
+ "The following part of your input was truncated because CLIP can only handle sequences up to"
229
+ f" {self.tokenizer_max_length} tokens: {removed_text}"
230
+ )
231
+ prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
232
+ pooled_prompt_embeds = prompt_embeds[0]
233
+
234
+ if clip_skip is None:
235
+ prompt_embeds = prompt_embeds.hidden_states[-2]
236
+ else:
237
+ prompt_embeds = prompt_embeds.hidden_states[-(clip_skip + 2)]
238
+
239
+ prompt_embeds = prompt_embeds.to(dtype=self.text_encoder.dtype, device=device)
240
+
241
+ _, seq_len, _ = prompt_embeds.shape
242
+ # duplicate text embeddings for each generation per prompt, using mps friendly method
243
+ prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
244
+ prompt_embeds = prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
245
+
246
+ pooled_prompt_embeds = pooled_prompt_embeds.repeat(1, num_images_per_prompt, 1)
247
+ pooled_prompt_embeds = pooled_prompt_embeds.view(batch_size * num_images_per_prompt, -1)
248
+
249
+ return prompt_embeds, pooled_prompt_embeds
250
+
251
+ def encode_prompt(
252
+ self,
253
+ prompt: Union[str, List[str]],
254
+ prompt_2: Union[str, List[str]],
255
+ prompt_3: Union[str, List[str]],
256
+ device: Optional[torch.device] = None,
257
+ num_images_per_prompt: int = 1,
258
+ do_classifier_free_guidance: bool = True,
259
+ negative_prompt: Optional[Union[str, List[str]]] = None,
260
+ negative_prompt_2: Optional[Union[str, List[str]]] = None,
261
+ negative_prompt_3: Optional[Union[str, List[str]]] = None,
262
+ prompt_embeds: Optional[torch.FloatTensor] = None,
263
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
264
+ pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
265
+ negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
266
+ clip_skip: Optional[int] = None,
267
+ ):
268
+ r"""
269
+
270
+ Args:
271
+ prompt (`str` or `List[str]`, *optional*):
272
+ prompt to be encoded
273
+ prompt_2 (`str` or `List[str]`, *optional*):
274
+ The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is
275
+ used in all text-encoders
276
+ prompt_3 (`str` or `List[str]`, *optional*):
277
+ The prompt or prompts to be sent to the `tokenizer_3` and `text_encoder_3`. If not defined, `prompt` is
278
+ used in all text-encoders
279
+ device: (`torch.device`):
280
+ torch device
281
+ num_images_per_prompt (`int`):
282
+ number of images that should be generated per prompt
283
+ do_classifier_free_guidance (`bool`):
284
+ whether to use classifier free guidance or not
285
+ negative_prompt (`str` or `List[str]`, *optional*):
286
+ The prompt or prompts not to guide the image generation. If not defined, one has to pass
287
+ `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
288
+ less than `1`).
289
+ negative_prompt_2 (`str` or `List[str]`, *optional*):
290
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
291
+ `text_encoder_2`. If not defined, `negative_prompt` is used in all the text-encoders.
292
+ negative_prompt_3 (`str` or `List[str]`, *optional*):
293
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_3` and
294
+ `text_encoder_3`. If not defined, `negative_prompt` is used in all the text-encoders.
295
+ prompt_embeds (`torch.FloatTensor`, *optional*):
296
+ Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
297
+ provided, text embeddings will be generated from `prompt` input argument.
298
+ negative_prompt_embeds (`torch.FloatTensor`, *optional*):
299
+ Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
300
+ weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
301
+ argument.
302
+ pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
303
+ Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
304
+ If not provided, pooled text embeddings will be generated from `prompt` input argument.
305
+ negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
306
+ Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
307
+ weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
308
+ input argument.
309
+ clip_skip (`int`, *optional*):
310
+ Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
311
+ the output of the pre-final layer will be used for computing the prompt embeddings.
312
+ """
313
+ device = device or self._execution_device
314
+
315
+ prompt = [prompt] if isinstance(prompt, str) else prompt
316
+ if prompt is not None:
317
+ batch_size = len(prompt)
318
+ else:
319
+ batch_size = prompt_embeds.shape[0]
320
+
321
+ if prompt_embeds is None:
322
+ prompt_2 = prompt_2 or prompt
323
+ prompt_2 = [prompt_2] if isinstance(prompt_2, str) else prompt_2
324
+
325
+ prompt_3 = prompt_3 or prompt
326
+ prompt_3 = [prompt_3] if isinstance(prompt_3, str) else prompt_3
327
+
328
+ prompt_embed, pooled_prompt_embed = self._get_clip_prompt_embeds(
329
+ prompt=prompt,
330
+ device=device,
331
+ num_images_per_prompt=num_images_per_prompt,
332
+ clip_skip=clip_skip,
333
+ clip_model_index=0,
334
+ )
335
+ prompt_2_embed, pooled_prompt_2_embed = self._get_clip_prompt_embeds(
336
+ prompt=prompt_2,
337
+ device=device,
338
+ num_images_per_prompt=num_images_per_prompt,
339
+ clip_skip=clip_skip,
340
+ clip_model_index=1,
341
+ )
342
+ clip_prompt_embeds = torch.cat([prompt_embed, prompt_2_embed], dim=-1)
343
+
344
+ t5_prompt_embed = self._get_t5_prompt_embeds(
345
+ prompt=prompt_3,
346
+ num_images_per_prompt=num_images_per_prompt,
347
+ device=device,
348
+ )
349
+
350
+ clip_prompt_embeds = torch.nn.functional.pad(
351
+ clip_prompt_embeds, (0, t5_prompt_embed.shape[-1] - clip_prompt_embeds.shape[-1])
352
+ )
353
+
354
+ prompt_embeds = torch.cat([clip_prompt_embeds, t5_prompt_embed], dim=-2)
355
+ pooled_prompt_embeds = torch.cat([pooled_prompt_embed, pooled_prompt_2_embed], dim=-1)
356
+
357
+ if do_classifier_free_guidance and negative_prompt_embeds is None:
358
+ negative_prompt = negative_prompt or ""
359
+ negative_prompt_2 = negative_prompt_2 or negative_prompt
360
+ negative_prompt_3 = negative_prompt_3 or negative_prompt
361
+
362
+ # normalize str to list
363
+ negative_prompt = batch_size * [negative_prompt] if isinstance(negative_prompt, str) else negative_prompt
364
+ negative_prompt_2 = (
365
+ batch_size * [negative_prompt_2] if isinstance(negative_prompt_2, str) else negative_prompt_2
366
+ )
367
+ negative_prompt_3 = (
368
+ batch_size * [negative_prompt_3] if isinstance(negative_prompt_3, str) else negative_prompt_3
369
+ )
370
+
371
+ if prompt is not None and type(prompt) is not type(negative_prompt):
372
+ raise TypeError(
373
+ f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="
374
+ f" {type(prompt)}."
375
+ )
376
+ elif batch_size != len(negative_prompt):
377
+ raise ValueError(
378
+ f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
379
+ f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
380
+ " the batch size of `prompt`."
381
+ )
382
+
383
+ negative_prompt_embed, negative_pooled_prompt_embed = self._get_clip_prompt_embeds(
384
+ negative_prompt,
385
+ device=device,
386
+ num_images_per_prompt=num_images_per_prompt,
387
+ clip_skip=None,
388
+ clip_model_index=0,
389
+ )
390
+ negative_prompt_2_embed, negative_pooled_prompt_2_embed = self._get_clip_prompt_embeds(
391
+ negative_prompt_2,
392
+ device=device,
393
+ num_images_per_prompt=num_images_per_prompt,
394
+ clip_skip=None,
395
+ clip_model_index=1,
396
+ )
397
+ negative_clip_prompt_embeds = torch.cat([negative_prompt_embed, negative_prompt_2_embed], dim=-1)
398
+
399
+ t5_negative_prompt_embed = self._get_t5_prompt_embeds(
400
+ prompt=negative_prompt_3, num_images_per_prompt=num_images_per_prompt, device=device
401
+ )
402
+
403
+ negative_clip_prompt_embeds = torch.nn.functional.pad(
404
+ negative_clip_prompt_embeds,
405
+ (0, t5_negative_prompt_embed.shape[-1] - negative_clip_prompt_embeds.shape[-1]),
406
+ )
407
+
408
+ negative_prompt_embeds = torch.cat([negative_clip_prompt_embeds, t5_negative_prompt_embed], dim=-2)
409
+ negative_pooled_prompt_embeds = torch.cat(
410
+ [negative_pooled_prompt_embed, negative_pooled_prompt_2_embed], dim=-1
411
+ )
412
+
413
+ return prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds
414
+
415
+ def check_inputs(
416
+ self,
417
+ prompt,
418
+ prompt_2,
419
+ prompt_3,
420
+ # height,
421
+ # width,
422
+ negative_prompt=None,
423
+ negative_prompt_2=None,
424
+ negative_prompt_3=None,
425
+ prompt_embeds=None,
426
+ negative_prompt_embeds=None,
427
+ pooled_prompt_embeds=None,
428
+ negative_pooled_prompt_embeds=None,
429
+ callback_on_step_end_tensor_inputs=None,
430
+ ):
431
+ # if height % 8 != 0 or width % 8 != 0:
432
+ # raise ValueError(f"`height` and `width` have to be divisible by 8 but are {height} and {width}.")
433
+
434
+ if callback_on_step_end_tensor_inputs is not None and not all(
435
+ k in self._callback_tensor_inputs for k in callback_on_step_end_tensor_inputs
436
+ ):
437
+ raise ValueError(
438
+ f"`callback_on_step_end_tensor_inputs` has to be in {self._callback_tensor_inputs}, but found {[k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]}"
439
+ )
440
+
441
+ if prompt is not None and prompt_embeds is not None:
442
+ raise ValueError(
443
+ f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
444
+ " only forward one of the two."
445
+ )
446
+ elif prompt_2 is not None and prompt_embeds is not None:
447
+ raise ValueError(
448
+ f"Cannot forward both `prompt_2`: {prompt_2} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
449
+ " only forward one of the two."
450
+ )
451
+ elif prompt_3 is not None and prompt_embeds is not None:
452
+ raise ValueError(
453
+ f"Cannot forward both `prompt_3`: {prompt_2} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
454
+ " only forward one of the two."
455
+ )
456
+ elif prompt is None and prompt_embeds is None:
457
+ raise ValueError(
458
+ "Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
459
+ )
460
+ elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
461
+ raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
462
+ elif prompt_2 is not None and (not isinstance(prompt_2, str) and not isinstance(prompt_2, list)):
463
+ raise ValueError(f"`prompt_2` has to be of type `str` or `list` but is {type(prompt_2)}")
464
+ elif prompt_3 is not None and (not isinstance(prompt_3, str) and not isinstance(prompt_3, list)):
465
+ raise ValueError(f"`prompt_3` has to be of type `str` or `list` but is {type(prompt_3)}")
466
+
467
+ if negative_prompt is not None and negative_prompt_embeds is not None:
468
+ raise ValueError(
469
+ f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
470
+ f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
471
+ )
472
+ elif negative_prompt_2 is not None and negative_prompt_embeds is not None:
473
+ raise ValueError(
474
+ f"Cannot forward both `negative_prompt_2`: {negative_prompt_2} and `negative_prompt_embeds`:"
475
+ f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
476
+ )
477
+ elif negative_prompt_3 is not None and negative_prompt_embeds is not None:
478
+ raise ValueError(
479
+ f"Cannot forward both `negative_prompt_3`: {negative_prompt_3} and `negative_prompt_embeds`:"
480
+ f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
481
+ )
482
+
483
+ if prompt_embeds is not None and negative_prompt_embeds is not None:
484
+ if prompt_embeds.shape != negative_prompt_embeds.shape:
485
+ raise ValueError(
486
+ "`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
487
+ f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
488
+ f" {negative_prompt_embeds.shape}."
489
+ )
490
+
491
+ if prompt_embeds is not None and pooled_prompt_embeds is None:
492
+ raise ValueError(
493
+ "If `prompt_embeds` are provided, `pooled_prompt_embeds` also have to be passed. Make sure to generate `pooled_prompt_embeds` from the same text encoder that was used to generate `prompt_embeds`."
494
+ )
495
+
496
+ if negative_prompt_embeds is not None and negative_pooled_prompt_embeds is None:
497
+ raise ValueError(
498
+ "If `negative_prompt_embeds` are provided, `negative_pooled_prompt_embeds` also have to be passed. Make sure to generate `negative_pooled_prompt_embeds` from the same text encoder that was used to generate `negative_prompt_embeds`."
499
+ )
500
+
501
+ def get_timesteps(self, num_inference_steps, strength, device):
502
+ # get the original timestep using init_timestep
503
+ init_timestep = min(num_inference_steps * strength, num_inference_steps)
504
+
505
+ t_start = int(max(num_inference_steps - init_timestep, 0))
506
+ timesteps = self.scheduler.timesteps[t_start * self.scheduler.order :]
507
+ if hasattr(self.scheduler, "set_begin_index"):
508
+ self.scheduler.set_begin_index(t_start * self.scheduler.order)
509
+
510
+ return timesteps, num_inference_steps - t_start
511
+
512
+ def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
513
+ if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
514
+ raise ValueError(
515
+ f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
516
+ )
517
+
518
+ image = image.to(device=device, dtype=dtype)
519
+
520
+ batch_size = batch_size * num_images_per_prompt
521
+ if image.shape[1] == self.vae.config.latent_channels:
522
+ init_latents = image
523
+
524
+ else:
525
+ if isinstance(generator, list) and len(generator) != batch_size:
526
+ raise ValueError(
527
+ f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
528
+ f" size of {batch_size}. Make sure the batch size matches the length of the generators."
529
+ )
530
+
531
+ elif isinstance(generator, list):
532
+ init_latents = [
533
+ retrieve_latents(self.vae.encode(image[i : i + 1]), generator=generator[i])
534
+ for i in range(batch_size)
535
+ ]
536
+ init_latents = torch.cat(init_latents, dim=0)
537
+ else:
538
+ init_latents = retrieve_latents(self.vae.encode(image), generator=generator)
539
+
540
+ init_latents = (init_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
541
+
542
+ if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
543
+ # expand init_latents for batch_size
544
+ additional_image_per_prompt = batch_size // init_latents.shape[0]
545
+ init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
546
+ elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
547
+ raise ValueError(
548
+ f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
549
+ )
550
+ else:
551
+ init_latents = torch.cat([init_latents], dim=0)
552
+
553
+ shape = init_latents.shape
554
+ noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
555
+
556
+ # get latents
557
+ init_latents = self.scheduler.scale_noise(init_latents, timestep, noise)
558
+ latents = init_latents.to(device=device, dtype=dtype)
559
+
560
+ return latents
561
+
562
+ def prepare_image_latents(
563
+ self, image, batch_size, num_images_per_prompt, dtype, device, do_classifier_free_guidance, generator=None
564
+ ):
565
+ if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
566
+ raise ValueError(
567
+ f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
568
+ )
569
+
570
+ image = image.to(device=device, dtype=dtype)
571
+
572
+ batch_size = batch_size * num_images_per_prompt
573
+
574
+ if image.shape[1] == self.vae.config.latent_channels:
575
+ image_latents = image
576
+ else:
577
+ image_latents = retrieve_latents(self.vae.encode(image), sample_mode="argmax")
578
+ # ? normalize image latents
579
+ # image_latents = (image_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
580
+
581
+ if batch_size > image_latents.shape[0] and batch_size % image_latents.shape[0] == 0:
582
+ # expand image_latents for batch_size
583
+ deprecation_message = (
584
+ f"You have passed {batch_size} text prompts (`prompt`), but only {image_latents.shape[0]} initial"
585
+ " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
586
+ " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
587
+ " your script to pass as many initial images as text prompts to suppress this warning."
588
+ )
589
+ deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
590
+ additional_image_per_prompt = batch_size // image_latents.shape[0]
591
+ image_latents = torch.cat([image_latents] * additional_image_per_prompt, dim=0)
592
+ elif batch_size > image_latents.shape[0] and batch_size % image_latents.shape[0] != 0:
593
+ raise ValueError(
594
+ f"Cannot duplicate `image` of batch size {image_latents.shape[0]} to {batch_size} text prompts."
595
+ )
596
+ else:
597
+ image_latents = torch.cat([image_latents], dim=0)
598
+
599
+ if do_classifier_free_guidance:
600
+ uncond_image_latents = torch.zeros_like(image_latents)
601
+ image_latents = torch.cat([image_latents, image_latents, uncond_image_latents], dim=0)
602
+
603
+ return image_latents
604
+
605
+
606
+
607
+
608
+
609
+
610
+
611
+
612
+ @property
613
+ def guidance_scale(self):
614
+ return self._guidance_scale
615
+ @property
616
+ def image_guidance_scale(self):
617
+ return self._image_guidance_scale
618
+
619
+ @property
620
+ def clip_skip(self):
621
+ return self._clip_skip
622
+
623
+ # here `guidance_scale` is defined analogously to the guidance weight `w` of equation (2)
624
+ # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
625
+ # corresponds to doing no classifier free guidance.
626
+ @property
627
+ def do_classifier_free_guidance(self):
628
+ return self.guidance_scale > 1.0 and self.image_guidance_scale >= 1.0
629
+
630
+ @property
631
+ def joint_attention_kwargs(self):
632
+ return self._joint_attention_kwargs
633
+
634
+ @property
635
+ def num_timesteps(self):
636
+ return self._num_timesteps
637
+
638
+ @property
639
+ def interrupt(self):
640
+ return self._interrupt
641
+
642
+ @torch.no_grad()
643
+ @replace_example_docstring(EXAMPLE_DOC_STRING)
644
+ def __call__(
645
+ self,
646
+ prompt: Union[str, List[str]] = None,
647
+ prompt_2: Optional[Union[str, List[str]]] = None,
648
+ prompt_3: Optional[Union[str, List[str]]] = None,
649
+ strength: float = 1.0,
650
+ image: PipelineImageInput = None,
651
+ height: Optional[int] = None,
652
+ width: Optional[int] = None,
653
+ num_inference_steps: int = 28,
654
+ timesteps: List[int] = None,
655
+ guidance_scale: float = 7.0,
656
+ image_guidance_scale: float = 1.5,
657
+ negative_prompt: Optional[Union[str, List[str]]] = None,
658
+ negative_prompt_2: Optional[Union[str, List[str]]] = None,
659
+ negative_prompt_3: Optional[Union[str, List[str]]] = None,
660
+ num_images_per_prompt: Optional[int] = 1,
661
+ generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
662
+ latents: Optional[torch.FloatTensor] = None,
663
+ prompt_embeds: Optional[torch.FloatTensor] = None,
664
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
665
+ pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
666
+ negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
667
+ output_type: Optional[str] = "pil",
668
+ return_dict: bool = True,
669
+ joint_attention_kwargs: Optional[Dict[str, Any]] = None,
670
+ clip_skip: Optional[int] = None,
671
+ callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
672
+ callback_on_step_end_tensor_inputs: List[str] = ["latents"],
673
+ mask_img: Optional[PipelineImageInput] = None,
674
+ **kwargs
675
+ ):
676
+ r"""
677
+ Function invoked when calling the pipeline for generation.
678
+
679
+ Args:
680
+ prompt (`str` or `List[str]`, *optional*):
681
+ The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`.
682
+ instead.
683
+ prompt_2 (`str` or `List[str]`, *optional*):
684
+ The prompt or prompts to be sent to `tokenizer_2` and `text_encoder_2`. If not defined, `prompt`
685
+ will be used instead.
686
+ prompt_3 (`str` or `List[str]`, *optional*):
687
+ The prompt or prompts to be sent to `tokenizer_3` and `text_encoder_3`. If not defined, `prompt`
688
+ will be used instead.
689
+ height (`int`, *optional*, defaults to the height of the input `image`):
690
+ The height in pixels of the generated image.
691
+ width (`int`, *optional*, defaults to the width of the input `image`):
692
+ The width in pixels of the generated image.
693
+ num_inference_steps (`int`, *optional*, defaults to 28):
694
+ The number of denoising steps. More denoising steps usually lead to a higher quality image at the
695
+ expense of slower inference.
696
+ timesteps (`List[int]`, *optional*):
697
+ Custom timesteps to use for the denoising process with schedulers which support a `timesteps` argument
698
+ in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
699
+ passed will be used. Must be in descending order.
700
+ guidance_scale (`float`, *optional*, defaults to 7.0):
701
+ Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
702
+ `guidance_scale` is defined as `w` of equation 2. of [Imagen
703
+ Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
704
+ 1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`,
705
+ usually at the expense of lower image quality.
706
+ negative_prompt (`str` or `List[str]`, *optional*):
707
+ The prompt or prompts not to guide the image generation. If not defined, one has to pass
708
+ `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
709
+ less than `1`).
710
+ negative_prompt_2 (`str` or `List[str]`, *optional*):
711
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
712
+ `text_encoder_2`. If not defined, `negative_prompt` is used instead
713
+ negative_prompt_3 (`str` or `List[str]`, *optional*):
714
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_3` and
715
+ `text_encoder_3`. If not defined, `negative_prompt` is used instead
716
+ num_images_per_prompt (`int`, *optional*, defaults to 1):
717
+ The number of images to generate per prompt.
718
+ generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
719
+ One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
720
+ to make generation deterministic.
721
+ latents (`torch.FloatTensor`, *optional*):
722
+ Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
723
+ generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
724
+ tensor will be generated by sampling using the supplied random `generator`.
725
+ prompt_embeds (`torch.FloatTensor`, *optional*):
726
+ Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
727
+ provided, text embeddings will be generated from `prompt` input argument.
728
+ negative_prompt_embeds (`torch.FloatTensor`, *optional*):
729
+ Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
730
+ weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
731
+ argument.
732
+ pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
733
+ Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
734
+ If not provided, pooled text embeddings will be generated from `prompt` input argument.
735
+ negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
736
+ Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
737
+ weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
738
+ input argument.
739
+ output_type (`str`, *optional*, defaults to `"pil"`):
740
+ The output format of the generated image. Choose between
741
+ [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
742
+ return_dict (`bool`, *optional*, defaults to `True`):
743
+ Whether or not to return a [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] instead
744
+ of a plain tuple.
745
+ joint_attention_kwargs (`dict`, *optional*):
746
+ A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
747
+ `self.processor` in
748
+ [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
749
+ callback_on_step_end (`Callable`, *optional*):
750
+ A function that is called at the end of each denoising step during inference. The function is called
751
+ with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int,
752
+ callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by
753
+ `callback_on_step_end_tensor_inputs`.
754
+ callback_on_step_end_tensor_inputs (`List`, *optional*):
755
+ The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
756
+ will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
757
+ `._callback_tensor_inputs` attribute of your pipeline class.
758
+
759
+ Examples:
760
+
761
+ Returns:
762
+ [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] or `tuple`:
763
+ [`~pipelines.stable_diffusion_3.StableDiffusion3PipelineOutput`] if `return_dict` is True, otherwise a
764
+ `tuple`. When returning a tuple, the first element is a list with the generated images.
765
+ """
766
+
767
+ # height = height or self.default_sample_size * self.vae_scale_factor
768
+ # width = width or self.default_sample_size * self.vae_scale_factor
769
+
770
+ # 1. Check inputs. Raise error if not correct
771
+ self.check_inputs(
772
+ prompt,
773
+ prompt_2,
774
+ prompt_3,
775
+ negative_prompt=negative_prompt,
776
+ negative_prompt_2=negative_prompt_2,
777
+ negative_prompt_3=negative_prompt_3,
778
+ prompt_embeds=prompt_embeds,
779
+ negative_prompt_embeds=negative_prompt_embeds,
780
+ pooled_prompt_embeds=pooled_prompt_embeds,
781
+ negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
782
+ callback_on_step_end_tensor_inputs=callback_on_step_end_tensor_inputs,
783
+ )
784
+
785
+ self._guidance_scale = guidance_scale
786
+ self._image_guidance_scale = image_guidance_scale
787
+ self._clip_skip = clip_skip
788
+ self._joint_attention_kwargs = joint_attention_kwargs
789
+ self._interrupt = False
790
+
791
+ # 2. Define call parameters
792
+ if prompt is not None and isinstance(prompt, str):
793
+ batch_size = 1
794
+ elif prompt is not None and isinstance(prompt, list):
795
+ batch_size = len(prompt)
796
+ else:
797
+ batch_size = prompt_embeds.shape[0]
798
+
799
+ device = self._execution_device
800
+
801
+ (
802
+ prompt_embeds,
803
+ negative_prompt_embeds,
804
+ pooled_prompt_embeds,
805
+ negative_pooled_prompt_embeds,
806
+ ) = self.encode_prompt(
807
+ prompt=prompt,
808
+ prompt_2=prompt_2,
809
+ prompt_3=prompt_3,
810
+ negative_prompt=negative_prompt,
811
+ negative_prompt_2=negative_prompt_2,
812
+ negative_prompt_3=negative_prompt_3,
813
+ do_classifier_free_guidance=self.do_classifier_free_guidance,
814
+ prompt_embeds=prompt_embeds,
815
+ negative_prompt_embeds=negative_prompt_embeds,
816
+ pooled_prompt_embeds=pooled_prompt_embeds,
817
+ negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
818
+ device=device,
819
+ clip_skip=self.clip_skip,
820
+ num_images_per_prompt=num_images_per_prompt,
821
+ )
822
+
823
+ if self.do_classifier_free_guidance:
824
+ # assemble the 3x text-embedding batch used for guidance: [conditioned, unconditional, unconditional]
825
+ prompt_embeds = torch.cat([prompt_embeds, negative_prompt_embeds, negative_prompt_embeds], dim=0)
826
+
827
+ # similarly for the pooled embeddings
828
+ pooled_prompt_embeds = torch.cat([pooled_prompt_embeds, negative_pooled_prompt_embeds, negative_pooled_prompt_embeds], dim=0)
829
+
830
+ # if self.do_classifier_free_guidance:
831
+ # prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
832
+ # pooled_prompt_embeds = torch.cat([negative_pooled_prompt_embeds, pooled_prompt_embeds], dim=0)
833
+
834
+ # 3. Preprocess image
835
+ image = self.image_processor.preprocess(image)
836
+
837
+ # 4. Prepare timesteps
838
+ timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
839
+ timesteps, num_inference_steps = self.get_timesteps(num_inference_steps, strength, device)
840
+ latent_timestep = timesteps[:1].repeat(batch_size * num_images_per_prompt)
841
+
842
+ # timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
843
+ num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
844
+ self._num_timesteps = len(timesteps)
845
+
846
+ # 5. Prepare Image latent
847
+
848
+ image_latents = self.prepare_image_latents(
849
+ image,
850
+ batch_size,
851
+ num_images_per_prompt,
852
+ prompt_embeds.dtype,
853
+ device,
854
+ self.do_classifier_free_guidance,
855
+ )
856
+
857
+ height, width = image_latents.shape[-2:]
858
+ height = height * self.vae_scale_factor
859
+ width = width * self.vae_scale_factor
860
+ # 6. Prepare latent variables
861
+ num_channels_latents = self.vae.config.latent_channels
862
+ if latents is None:
863
+ latents = self.prepare_latents(
864
+ image,
865
+ latent_timestep,
866
+ batch_size,
867
+ num_images_per_prompt,
868
+ prompt_embeds.dtype,
869
+ device,
870
+ generator,
871
+ )
872
+ else:
873
+ return latents.to(device=device, dtype=prompt_embeds.dtype)
874
+
875
+ # 7. Check that the shapes of latents and image match the DiT in_channels
876
+ num_channels_image = image_latents.shape[1]
877
+ if mask_img is not None:
878
+ mask_img = self.image_processor.preprocess(mask_img)
879
+ mask_image_latents = self.prepare_image_latents(
880
+ mask_img,
881
+ batch_size,
882
+ num_images_per_prompt,
883
+ prompt_embeds.dtype,
884
+ device,
885
+ self.do_classifier_free_guidance,
886
+ )
887
+ num_channels_image += mask_image_latents.shape[1]
888
+
889
+ if num_channels_latents + num_channels_image != self.transformer.config.in_channels:
890
+ raise ValueError(
891
+ f"Incorrect configuration settings! The config of `pipeline.transformer`: {self.transformer.config} expects"
892
+ f" {self.transformer.config.in_channels} but received `num_channels_latents`: {num_channels_latents} +"
893
+ f" `num_channels_image`: {num_channels_image} "
894
+ f" = {num_channels_latents+num_channels_image}. Please verify the config of"
895
+ " `pipeline.transformer` or your `image` input."
896
+ )
897
+
898
+ # 8. Denoising loop
899
+ with self.progress_bar(total=num_inference_steps) as progress_bar:
900
+ for i, t in enumerate(timesteps):
901
+ if self.interrupt:
902
+ continue
903
+
904
+ # expand the latents if we are doing classifier free guidance
905
+ latent_model_input = torch.cat([latents] * 3) if self.do_classifier_free_guidance else latents
906
+ # broadcast to batch dimension in a way that's compatible with ONNX/Core ML
907
+ timestep = t.expand(latent_model_input.shape[0])
908
+
909
+ scaled_latent_model_input = torch.cat([latent_model_input, image_latents], dim=1)
910
+ if mask_img is not None:
911
+ scaled_latent_model_input = torch.cat([scaled_latent_model_input, mask_image_latents], dim=1)
912
+ # if "mask_index" in kwargs and kwargs['mask_index'] is not None:
913
+ # mask_index = kwargs['mask_index']
914
+ # else:
915
+ # mask_index = None
916
+ noise_pred = self.transformer(
917
+ hidden_states=scaled_latent_model_input,
918
+ timestep=timestep,
919
+ encoder_hidden_states=prompt_embeds,
920
+ pooled_projections=pooled_prompt_embeds,
921
+ joint_attention_kwargs=self.joint_attention_kwargs,
922
+ return_dict=False,
923
+ # mask_index= mask_index,
924
+ )[0]
925
+
926
+ # perform guidance
927
+ if self.do_classifier_free_guidance:
928
+ noise_pred_text, noise_pred_image, noise_pred_uncond = noise_pred.chunk(3)
929
+ noise_pred = (
930
+ noise_pred_uncond
931
+ + self.guidance_scale * (noise_pred_text - noise_pred_image)
932
+ + self.image_guidance_scale * (noise_pred_image - noise_pred_uncond)
933
+ )
934
+ # noise_pred_uncond, noise_pred_text = noise_pred.chunk(2) # neg, prompt
935
+ # noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
936
+
937
+ # compute the previous noisy sample x_t -> x_t-1
938
+ latents_dtype = latents.dtype
939
+ latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
940
+
941
+ if latents.dtype != latents_dtype:
942
+ if torch.backends.mps.is_available():
943
+ # some platforms (eg. apple mps) misbehave due to a pytorch bug: https://github.com/pytorch/pytorch/pull/99272
944
+ latents = latents.to(latents_dtype)
945
+
946
+ if callback_on_step_end is not None:
947
+ callback_kwargs = {}
948
+ for k in callback_on_step_end_tensor_inputs:
949
+ callback_kwargs[k] = locals()[k]
950
+ callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
951
+
952
+ latents = callback_outputs.pop("latents", latents)
953
+ prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
954
+ negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
955
+ negative_pooled_prompt_embeds = callback_outputs.pop(
956
+ "negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
957
+ )
958
+ image_latents = callback_outputs.pop("image_latents", image_latents)
959
+ if mask_img is not None:
960
+ mask_image_latents = callback_outputs.pop("mask_image_latents", mask_image_latents)
961
+ # call the callback, if provided
962
+ if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
963
+ progress_bar.update()
964
+
965
+ if XLA_AVAILABLE:
966
+ xm.mark_step()
967
+
968
+ if output_type == "latent":
969
+ image = latents
970
+
971
+ else:
972
+ # latents = (latents / self.vae.config.scaling_factor) + self.vae.config.shift_factor
973
+ latents = latents / self.vae.config.scaling_factor
974
+ image = self.vae.decode(latents, return_dict=False)[0]
975
+ image = self.image_processor.postprocess(image, output_type=output_type)
976
+
977
+ # Offload all models
978
+ self.maybe_free_model_hooks()
979
+
980
+ if not return_dict:
981
+ return (image,)
982
+
983
+ return StableDiffusion3PipelineOutput(images=image)
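
The denoising loop above runs the transformer on a 3x batch (text-and-image conditioned, image-only conditioned, fully unconditional) and then combines the three predictions with separate `guidance_scale` and `image_guidance_scale` weights. The standalone sketch below reproduces just that combination on dummy tensors to make the arithmetic explicit; shapes and values are illustrative only.

```python
# Illustration of the InstructPix2Pix-style guidance used in the loop above,
# applied to random tensors instead of real transformer outputs.
import torch

guidance_scale = 7.0        # weight on the text condition
image_guidance_scale = 1.5  # weight on the image condition

# One prediction per guidance branch: [text+image, image-only, unconditional].
noise_pred = torch.randn(3, 16, 128, 128)
noise_pred_text, noise_pred_image, noise_pred_uncond = noise_pred.chunk(3)

guided = (
    noise_pred_uncond
    + guidance_scale * (noise_pred_text - noise_pred_image)
    + image_guidance_scale * (noise_pred_image - noise_pred_uncond)
)
print(guided.shape)  # torch.Size([1, 16, 128, 128])
```

Note that setting `image_guidance_scale` below 1.0 disables classifier-free guidance entirely in this pipeline, since `do_classifier_free_guidance` requires both `guidance_scale > 1.0` and `image_guidance_scale >= 1.0`.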
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,6 @@
1
+ {
2
+ "_class_name": "FlowMatchEulerDiscreteScheduler",
3
+ "_diffusers_version": "0.30.1",
4
+ "num_train_timesteps": 1000,
5
+ "shift": 3.0
6
+ }
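
The config above instantiates `FlowMatchEulerDiscreteScheduler` with `shift=3.0`. In the scheduler implementation below, that shift remaps the uniform flow-matching sigmas via `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`, which biases the schedule toward higher noise levels. A short sketch of the resulting schedule, mirroring the scheduler's constructor:

```python
# Reproduce the sigma schedule implied by scheduler_config.json (shift=3.0),
# following the constructor of FlowMatchEulerDiscreteScheduler below.
import numpy as np

num_train_timesteps, shift = 1000, 3.0
timesteps = np.linspace(1, num_train_timesteps, num_train_timesteps, dtype=np.float32)[::-1]

sigmas = timesteps / num_train_timesteps              # uniform in (0, 1]
sigmas = shift * sigmas / (1 + (shift - 1) * sigmas)  # shifted schedule

print(sigmas[:3])   # close to 1.0 (high noise) at the start of sampling
print(sigmas[-3:])  # close to 0.0 (low noise) at the end
```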
scheduler/scheduling_flow_match_euler_discrete.py ADDED
@@ -0,0 +1,287 @@
1
+ # Copyright 2024 Stability AI, Katherine Crowson and The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ from dataclasses import dataclass
16
+ from typing import Optional, Tuple, Union
17
+
18
+ import numpy as np
19
+ import torch
20
+
21
+ from diffusers.configuration_utils import ConfigMixin, register_to_config
22
+ from diffusers.utils import BaseOutput, logging
23
+ from diffusers.utils.torch_utils import randn_tensor
24
+ from diffusers.schedulers.scheduling_utils import SchedulerMixin
25
+
26
+
27
+ logger = logging.get_logger(__name__) # pylint: disable=invalid-name
28
+
29
+
30
+ @dataclass
31
+ class FlowMatchEulerDiscreteSchedulerOutput(BaseOutput):
32
+ """
33
+ Output class for the scheduler's `step` function output.
34
+
35
+ Args:
36
+ prev_sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)` for images):
37
+ Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the
38
+ denoising loop.
39
+ """
40
+
41
+ prev_sample: torch.FloatTensor
42
+
43
+
44
+ class FlowMatchEulerDiscreteScheduler(SchedulerMixin, ConfigMixin):
45
+ """
46
+ Euler scheduler.
47
+
48
+ This model inherits from [`SchedulerMixin`] and [`ConfigMixin`]. Check the superclass documentation for the generic
49
+ methods the library implements for all schedulers such as loading and saving.
50
+
51
+ Args:
52
+ num_train_timesteps (`int`, defaults to 1000):
53
+ The number of diffusion steps to train the model.
54
+ timestep_spacing (`str`, defaults to `"linspace"`):
55
+ The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and
56
+ Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information.
57
+ shift (`float`, defaults to 1.0):
58
+ The shift value for the timestep schedule.
59
+ """
60
+
61
+ _compatibles = []
62
+ order = 1
63
+
64
+ @register_to_config
65
+ def __init__(
66
+ self,
67
+ num_train_timesteps: int = 1000,
68
+ shift: float = 1.0,
69
+ ):
70
+ timesteps = np.linspace(1, num_train_timesteps, num_train_timesteps, dtype=np.float32)[::-1].copy()
71
+ timesteps = torch.from_numpy(timesteps).to(dtype=torch.float32)
72
+
73
+ sigmas = timesteps / num_train_timesteps
74
+ sigmas = shift * sigmas / (1 + (shift - 1) * sigmas)
75
+
76
+ self.timesteps = sigmas * num_train_timesteps
77
+
78
+ self._step_index = None
79
+ self._begin_index = None
80
+
81
+ self.sigmas = sigmas.to("cpu") # to avoid too much CPU/GPU communication
82
+ self.sigma_min = self.sigmas[-1].item()
83
+ self.sigma_max = self.sigmas[0].item()
84
+
85
+ @property
86
+ def step_index(self):
87
+ """
88
+ The index counter for current timestep. It will increase 1 after each scheduler step.
89
+ """
90
+ return self._step_index
91
+
92
+ @property
93
+ def begin_index(self):
94
+ """
95
+ The index for the first timestep. It should be set from pipeline with `set_begin_index` method.
96
+ """
97
+ return self._begin_index
98
+
99
+ # Copied from diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler.set_begin_index
100
+ def set_begin_index(self, begin_index: int = 0):
101
+ """
102
+ Sets the begin index for the scheduler. This function should be run from pipeline before the inference.
103
+
104
+ Args:
105
+ begin_index (`int`):
106
+ The begin index for the scheduler.
107
+ """
108
+ self._begin_index = begin_index
109
+
110
+ def scale_noise(
111
+ self,
112
+ sample: torch.FloatTensor,
113
+ timestep: Union[float, torch.FloatTensor],
114
+ noise: Optional[torch.FloatTensor] = None,
115
+ ) -> torch.FloatTensor:
116
+ """
117
+ Forward process in flow-matching
118
+
119
+ Args:
120
+ sample (`torch.FloatTensor`):
121
+ The input sample.
122
+ timestep (`int`, *optional*):
123
+ The current timestep in the diffusion chain.
124
+
125
+ Returns:
126
+ `torch.FloatTensor`:
127
+ A scaled input sample.
128
+ """
129
+ if self.step_index is None:
130
+ self._init_step_index(timestep)
131
+
132
+ sigma = self.sigmas[self.step_index]
133
+ sample = sigma * noise + (1.0 - sigma) * sample
134
+
135
+ return sample
136
+
137
+ def _sigma_to_t(self, sigma):
138
+ return sigma * self.config.num_train_timesteps
139
+
140
+ def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
141
+ """
142
+ Sets the discrete timesteps used for the diffusion chain (to be run before inference).
143
+
144
+ Args:
145
+ num_inference_steps (`int`):
146
+ The number of diffusion steps used when generating samples with a pre-trained model.
147
+ device (`str` or `torch.device`, *optional*):
148
+ The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
149
+ """
150
+ self.num_inference_steps = num_inference_steps
151
+
152
+ timesteps = np.linspace(
153
+ self._sigma_to_t(self.sigma_max), self._sigma_to_t(self.sigma_min), num_inference_steps
154
+ )
155
+
156
+ sigmas = timesteps / self.config.num_train_timesteps
157
+ sigmas = self.config.shift * sigmas / (1 + (self.config.shift - 1) * sigmas)
158
+ sigmas = torch.from_numpy(sigmas).to(dtype=torch.float32, device=device)
159
+
160
+ timesteps = sigmas * self.config.num_train_timesteps
161
+ self.timesteps = timesteps.to(device=device)
162
+ self.sigmas = torch.cat([sigmas, torch.zeros(1, device=sigmas.device)])
163
+
164
+ self._step_index = None
165
+ self._begin_index = None
166
+
167
+ def index_for_timestep(self, timestep, schedule_timesteps=None):
168
+ if schedule_timesteps is None:
169
+ schedule_timesteps = self.timesteps
170
+
171
+ indices = (schedule_timesteps == timestep).nonzero()
172
+
173
+ # The sigma index that is taken for the **very** first `step`
174
+ # is always the second index (or the last index if there is only 1)
175
+ # This way we can ensure we don't accidentally skip a sigma in
176
+ # case we start in the middle of the denoising schedule (e.g. for image-to-image)
177
+ pos = 1 if len(indices) > 1 else 0
178
+
179
+ return indices[pos].item()
180
+
181
+ def _init_step_index(self, timestep):
182
+ if self.begin_index is None:
183
+ if isinstance(timestep, torch.Tensor):
184
+ timestep = timestep.to(self.timesteps.device)
185
+ self._step_index = self.index_for_timestep(timestep)
186
+ else:
187
+ self._step_index = self._begin_index
188
+
189
+ def step(
190
+ self,
191
+ model_output: torch.FloatTensor,
192
+ timestep: Union[float, torch.FloatTensor],
193
+ sample: torch.FloatTensor,
194
+ s_churn: float = 0.0,
195
+ s_tmin: float = 0.0,
196
+ s_tmax: float = float("inf"),
197
+ s_noise: float = 1.0,
198
+ generator: Optional[torch.Generator] = None,
199
+ return_dict: bool = True,
200
+ ) -> Union[FlowMatchEulerDiscreteSchedulerOutput, Tuple]:
201
+ """
202
+ Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion
203
+ process from the learned model outputs (most often the predicted noise).
204
+
205
+ Args:
206
+ model_output (`torch.FloatTensor`):
207
+ The direct output from learned diffusion model.
208
+ timestep (`float`):
209
+ The current discrete timestep in the diffusion chain.
210
+ sample (`torch.FloatTensor`):
211
+ A current instance of a sample created by the diffusion process.
212
+ s_churn (`float`):
213
+ s_tmin (`float`):
214
+ s_tmax (`float`):
215
+ s_noise (`float`, defaults to 1.0):
216
+ Scaling factor for noise added to the sample.
217
+ generator (`torch.Generator`, *optional*):
218
+ A random number generator.
219
+ return_dict (`bool`):
220
+ Whether or not to return a [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] or
221
+ tuple.
222
+
223
+ Returns:
224
+ [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] or `tuple`:
225
+ If return_dict is `True`, [`~schedulers.scheduling_euler_discrete.EulerDiscreteSchedulerOutput`] is
226
+ returned, otherwise a tuple is returned where the first element is the sample tensor.
227
+ """
228
+
229
+ if (
230
+ isinstance(timestep, int)
231
+ or isinstance(timestep, torch.IntTensor)
232
+ or isinstance(timestep, torch.LongTensor)
233
+ ):
234
+ raise ValueError(
235
+ (
236
+ "Passing integer indices (e.g. from `enumerate(timesteps)`) as timesteps to"
237
+ " `EulerDiscreteScheduler.step()` is not supported. Make sure to pass"
238
+ " one of the `scheduler.timesteps` as a timestep."
239
+ ),
240
+ )
241
+
242
+ if self.step_index is None:
243
+ self._init_step_index(timestep)
244
+
245
+ # Upcast to avoid precision issues when computing prev_sample
246
+ sample = sample.to(torch.float32)
247
+
248
+ sigma = self.sigmas[self.step_index]
249
+
250
+ gamma = min(s_churn / (len(self.sigmas) - 1), 2**0.5 - 1) if s_tmin <= sigma <= s_tmax else 0.0
251
+
252
+ noise = randn_tensor(
253
+ model_output.shape, dtype=model_output.dtype, device=model_output.device, generator=generator
254
+ )
255
+
256
+ eps = noise * s_noise
257
+ sigma_hat = sigma * (gamma + 1)
258
+
259
+ if gamma > 0:
260
+ sample = sample + eps * (sigma_hat**2 - sigma**2) ** 0.5
261
+
262
+ # 1. compute predicted original sample (x_0) from sigma-scaled predicted noise
263
+ # NOTE: "original_sample" should not be an expected prediction_type but is left in for
264
+ # backwards compatibility
265
+
266
+ # if self.config.prediction_type == "vector_field":
267
+
268
+ denoised = sample - model_output * sigma
269
+ # 2. Convert to an ODE derivative
270
+ derivative = (sample - denoised) / sigma_hat
271
+
272
+ dt = self.sigmas[self.step_index + 1] - sigma_hat
273
+
274
+ prev_sample = sample + derivative * dt
275
+ # Cast sample back to model compatible dtype
276
+ prev_sample = prev_sample.to(model_output.dtype)
277
+
278
+ # upon completion increase step index by one
279
+ self._step_index += 1
280
+
281
+ if not return_dict:
282
+ return (prev_sample,)
283
+
284
+ return FlowMatchEulerDiscreteSchedulerOutput(prev_sample=prev_sample)
285
+
286
+ def __len__(self):
287
+ return self.config.num_train_timesteps
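With `s_churn=0`, `step` reduces to a plain Euler update on the flow-matching ODE: since `denoised = sample - model_output * sigma`, the derivative `(sample - denoised) / sigma_hat` equals `model_output`, so `prev_sample = sample + (sigma_next - sigma) * model_output`. A tiny self-contained sketch of driving the scheduler, using a dummy zero "model output" purely to exercise the update loop:

```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler

# Same settings as scheduler/scheduler_config.json in this commit.
scheduler = FlowMatchEulerDiscreteScheduler(num_train_timesteps=1000, shift=3.0)
scheduler.set_timesteps(num_inference_steps=28)

# Start from pure noise in a toy latent space (the shape is illustrative only).
sample = torch.randn(1, 16, 64, 64)

for t in scheduler.timesteps:
    # A real pipeline would call the diffusion transformer here; a zero
    # vector-field prediction is used just to exercise the Euler step.
    model_output = torch.zeros_like(sample)
    sample = scheduler.step(model_output, t, sample).prev_sample

print(sample.shape)  # torch.Size([1, 16, 64, 64])
```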
text_encoder/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "../UltraEdit/resolution_512_model_epoch_2_sd3_5e5/text_encoder",
+ "architectures": [
+ "CLIPTextModelWithProjection"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "dropout": 0.0,
+ "eos_token_id": 2,
+ "hidden_act": "quick_gelu",
+ "hidden_size": 768,
+ "initializer_factor": 1.0,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 77,
+ "model_type": "clip_text_model",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "projection_dim": 768,
+ "torch_dtype": "float16",
+ "transformers_version": "4.44.2",
+ "vocab_size": 49408
+ }
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71e183d11db0c6b6282a4d9e0abb74125edc8692393e89ed8ee5571005f35cb1
+ size 247323896
text_encoder_2/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "../UltraEdit/resolution_512_model_epoch_2_sd3_5e5/text_encoder_2",
+ "architectures": [
+ "CLIPTextModelWithProjection"
+ ],
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "dropout": 0.0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_size": 1280,
+ "initializer_factor": 1.0,
+ "initializer_range": 0.02,
+ "intermediate_size": 5120,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 77,
+ "model_type": "clip_text_model",
+ "num_attention_heads": 20,
+ "num_hidden_layers": 32,
+ "pad_token_id": 1,
+ "projection_dim": 1280,
+ "torch_dtype": "float16",
+ "transformers_version": "4.44.2",
+ "vocab_size": 49408
+ }
text_encoder_2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec310df2af79c318e24d20511b601a591ca8cd4f1fce1d8dff822a356bcdb1f4
+ size 1389382176
text_encoder_3/config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "_name_or_path": "../UltraEdit/resolution_512_model_epoch_2_sd3_5e5/text_encoder_3",
+ "architectures": [
+ "T5EncoderModel"
+ ],
+ "classifier_dropout": 0.0,
+ "d_ff": 10240,
+ "d_kv": 64,
+ "d_model": 4096,
+ "decoder_start_token_id": 0,
+ "dense_act_fn": "gelu_new",
+ "dropout_rate": 0.1,
+ "eos_token_id": 1,
+ "feed_forward_proj": "gated-gelu",
+ "initializer_factor": 1.0,
+ "is_encoder_decoder": true,
+ "is_gated_act": true,
+ "layer_norm_epsilon": 1e-06,
+ "model_type": "t5",
+ "num_decoder_layers": 24,
+ "num_heads": 64,
+ "num_layers": 24,
+ "output_past": true,
+ "pad_token_id": 0,
+ "relative_attention_max_distance": 128,
+ "relative_attention_num_buckets": 32,
+ "tie_word_embeddings": false,
+ "torch_dtype": "float16",
+ "transformers_version": "4.44.2",
+ "use_cache": true,
+ "vocab_size": 32128
+ }
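The three encoder configs above form the usual SD3 text stack: a CLIP ViT-L-style text model (hidden size 768), a larger OpenCLIP-style text model (hidden size 1280), and a T5 encoder (d_model 4096). A hedged sketch of loading them individually with `transformers`, following the subfolder layout of this commit (the repo id placeholder is hypothetical; in normal use the pipeline loads them for you):

```python
import torch
from transformers import CLIPTextModelWithProjection, T5EncoderModel

REPO_ID = "<this-repository>"  # hypothetical placeholder for the Hub repo id

# Subfolder names follow the file layout added in this commit.
text_encoder = CLIPTextModelWithProjection.from_pretrained(
    REPO_ID, subfolder="text_encoder", torch_dtype=torch.float16
)
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(
    REPO_ID, subfolder="text_encoder_2", torch_dtype=torch.float16
)
text_encoder_3 = T5EncoderModel.from_pretrained(
    REPO_ID, subfolder="text_encoder_3", torch_dtype=torch.float16
)

print(text_encoder.config.hidden_size, text_encoder_2.config.hidden_size, text_encoder_3.config.d_model)
# 768 1280 4096
```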
text_encoder_3/model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2806b1cf07fc6eac6c5059811aea4e069d69df34b782a0a85cd6a2b57de48404
+ size 4994546896
text_encoder_3/model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51aa7ace7b240403ef440b7387445aee7dd585cd0240d7773567cad5a0f1ed61
+ size 4966239920
text_encoder_3/model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:366835891170b04927afee017584da714dfdb74aac0022d742943a46c612cae7
+ size 1577127552
text_encoder_3/model.safetensors.index.json ADDED
@@ -0,0 +1,226 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 11537887232
4
+ },
5
+ "weight_map": {
6
+ "encoder.block.0.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
7
+ "encoder.block.0.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
8
+ "encoder.block.0.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
9
+ "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "model-00001-of-00003.safetensors",
10
+ "encoder.block.0.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
11
+ "encoder.block.0.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
12
+ "encoder.block.0.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
13
+ "encoder.block.0.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
14
+ "encoder.block.0.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
15
+ "encoder.block.0.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
16
+ "encoder.block.1.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
17
+ "encoder.block.1.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
18
+ "encoder.block.1.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
19
+ "encoder.block.1.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
20
+ "encoder.block.1.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
21
+ "encoder.block.1.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
22
+ "encoder.block.1.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
23
+ "encoder.block.1.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
24
+ "encoder.block.1.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
25
+ "encoder.block.10.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
26
+ "encoder.block.10.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
27
+ "encoder.block.10.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
28
+ "encoder.block.10.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
29
+ "encoder.block.10.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
30
+ "encoder.block.10.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
31
+ "encoder.block.10.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
32
+ "encoder.block.10.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
33
+ "encoder.block.10.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
34
+ "encoder.block.11.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
35
+ "encoder.block.11.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
36
+ "encoder.block.11.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
37
+ "encoder.block.11.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
38
+ "encoder.block.11.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
39
+ "encoder.block.11.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
40
+ "encoder.block.11.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
41
+ "encoder.block.11.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
42
+ "encoder.block.11.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
43
+ "encoder.block.12.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
44
+ "encoder.block.12.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
45
+ "encoder.block.12.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
46
+ "encoder.block.12.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
47
+ "encoder.block.12.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
48
+ "encoder.block.12.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
49
+ "encoder.block.12.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
50
+ "encoder.block.12.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
51
+ "encoder.block.12.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
52
+ "encoder.block.13.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
53
+ "encoder.block.13.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
54
+ "encoder.block.13.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
55
+ "encoder.block.13.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
56
+ "encoder.block.13.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
57
+ "encoder.block.13.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
58
+ "encoder.block.13.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
59
+ "encoder.block.13.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
60
+ "encoder.block.13.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
61
+ "encoder.block.14.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
62
+ "encoder.block.14.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
63
+ "encoder.block.14.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
64
+ "encoder.block.14.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
65
+ "encoder.block.14.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
66
+ "encoder.block.14.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
67
+ "encoder.block.14.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
68
+ "encoder.block.14.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
69
+ "encoder.block.14.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
70
+ "encoder.block.15.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
71
+ "encoder.block.15.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
72
+ "encoder.block.15.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
73
+ "encoder.block.15.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
74
+ "encoder.block.15.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
75
+ "encoder.block.15.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
76
+ "encoder.block.15.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
77
+ "encoder.block.15.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
78
+ "encoder.block.15.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
79
+ "encoder.block.16.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
80
+ "encoder.block.16.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
81
+ "encoder.block.16.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
82
+ "encoder.block.16.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
83
+ "encoder.block.16.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
84
+ "encoder.block.16.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
85
+ "encoder.block.16.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
86
+ "encoder.block.16.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
87
+ "encoder.block.16.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
88
+ "encoder.block.17.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
89
+ "encoder.block.17.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
90
+ "encoder.block.17.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
91
+ "encoder.block.17.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
92
+ "encoder.block.17.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
93
+ "encoder.block.17.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
94
+ "encoder.block.17.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
95
+ "encoder.block.17.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
96
+ "encoder.block.17.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
97
+ "encoder.block.18.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
98
+ "encoder.block.18.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
99
+ "encoder.block.18.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
100
+ "encoder.block.18.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
101
+ "encoder.block.18.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
102
+ "encoder.block.18.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
103
+ "encoder.block.18.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
104
+ "encoder.block.18.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
105
+ "encoder.block.18.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
106
+ "encoder.block.19.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
107
+ "encoder.block.19.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
108
+ "encoder.block.19.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
109
+ "encoder.block.19.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
110
+ "encoder.block.19.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
111
+ "encoder.block.19.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
112
+ "encoder.block.19.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
113
+ "encoder.block.19.layer.1.DenseReluDense.wo.weight": "model-00002-of-00003.safetensors",
114
+ "encoder.block.19.layer.1.layer_norm.weight": "model-00002-of-00003.safetensors",
115
+ "encoder.block.2.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
116
+ "encoder.block.2.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
117
+ "encoder.block.2.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
118
+ "encoder.block.2.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
119
+ "encoder.block.2.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
120
+ "encoder.block.2.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
121
+ "encoder.block.2.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
122
+ "encoder.block.2.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
123
+ "encoder.block.2.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
124
+ "encoder.block.20.layer.0.SelfAttention.k.weight": "model-00002-of-00003.safetensors",
125
+ "encoder.block.20.layer.0.SelfAttention.o.weight": "model-00002-of-00003.safetensors",
126
+ "encoder.block.20.layer.0.SelfAttention.q.weight": "model-00002-of-00003.safetensors",
127
+ "encoder.block.20.layer.0.SelfAttention.v.weight": "model-00002-of-00003.safetensors",
128
+ "encoder.block.20.layer.0.layer_norm.weight": "model-00002-of-00003.safetensors",
129
+ "encoder.block.20.layer.1.DenseReluDense.wi_0.weight": "model-00002-of-00003.safetensors",
130
+ "encoder.block.20.layer.1.DenseReluDense.wi_1.weight": "model-00002-of-00003.safetensors",
131
+ "encoder.block.20.layer.1.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
132
+ "encoder.block.20.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
133
+ "encoder.block.21.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
134
+ "encoder.block.21.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
135
+ "encoder.block.21.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
136
+ "encoder.block.21.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
137
+ "encoder.block.21.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
138
+ "encoder.block.21.layer.1.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
139
+ "encoder.block.21.layer.1.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
140
+ "encoder.block.21.layer.1.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
141
+ "encoder.block.21.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
142
+ "encoder.block.22.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
143
+ "encoder.block.22.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
144
+ "encoder.block.22.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
145
+ "encoder.block.22.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
146
+ "encoder.block.22.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
147
+ "encoder.block.22.layer.1.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
148
+ "encoder.block.22.layer.1.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
149
+ "encoder.block.22.layer.1.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
150
+ "encoder.block.22.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
151
+ "encoder.block.23.layer.0.SelfAttention.k.weight": "model-00003-of-00003.safetensors",
152
+ "encoder.block.23.layer.0.SelfAttention.o.weight": "model-00003-of-00003.safetensors",
153
+ "encoder.block.23.layer.0.SelfAttention.q.weight": "model-00003-of-00003.safetensors",
154
+ "encoder.block.23.layer.0.SelfAttention.v.weight": "model-00003-of-00003.safetensors",
155
+ "encoder.block.23.layer.0.layer_norm.weight": "model-00003-of-00003.safetensors",
156
+ "encoder.block.23.layer.1.DenseReluDense.wi_0.weight": "model-00003-of-00003.safetensors",
157
+ "encoder.block.23.layer.1.DenseReluDense.wi_1.weight": "model-00003-of-00003.safetensors",
158
+ "encoder.block.23.layer.1.DenseReluDense.wo.weight": "model-00003-of-00003.safetensors",
159
+ "encoder.block.23.layer.1.layer_norm.weight": "model-00003-of-00003.safetensors",
160
+ "encoder.block.3.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
161
+ "encoder.block.3.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
162
+ "encoder.block.3.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
163
+ "encoder.block.3.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
164
+ "encoder.block.3.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
165
+ "encoder.block.3.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
166
+ "encoder.block.3.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
167
+ "encoder.block.3.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
168
+ "encoder.block.3.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
169
+ "encoder.block.4.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
170
+ "encoder.block.4.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
171
+ "encoder.block.4.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
172
+ "encoder.block.4.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
173
+ "encoder.block.4.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
174
+ "encoder.block.4.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
175
+ "encoder.block.4.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
176
+ "encoder.block.4.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
177
+ "encoder.block.4.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
178
+ "encoder.block.5.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
179
+ "encoder.block.5.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
180
+ "encoder.block.5.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
181
+ "encoder.block.5.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
182
+ "encoder.block.5.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
183
+ "encoder.block.5.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
184
+ "encoder.block.5.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
185
+ "encoder.block.5.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
186
+ "encoder.block.5.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
187
+ "encoder.block.6.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
188
+ "encoder.block.6.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
189
+ "encoder.block.6.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
190
+ "encoder.block.6.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
191
+ "encoder.block.6.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
192
+ "encoder.block.6.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
193
+ "encoder.block.6.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
194
+ "encoder.block.6.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
195
+ "encoder.block.6.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
196
+ "encoder.block.7.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
197
+ "encoder.block.7.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
198
+ "encoder.block.7.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
199
+ "encoder.block.7.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
200
+ "encoder.block.7.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
201
+ "encoder.block.7.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
202
+ "encoder.block.7.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
203
+ "encoder.block.7.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
204
+ "encoder.block.7.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
205
+ "encoder.block.8.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
206
+ "encoder.block.8.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
207
+ "encoder.block.8.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
208
+ "encoder.block.8.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
209
+ "encoder.block.8.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
210
+ "encoder.block.8.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
211
+ "encoder.block.8.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
212
+ "encoder.block.8.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
213
+ "encoder.block.8.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
214
+ "encoder.block.9.layer.0.SelfAttention.k.weight": "model-00001-of-00003.safetensors",
215
+ "encoder.block.9.layer.0.SelfAttention.o.weight": "model-00001-of-00003.safetensors",
216
+ "encoder.block.9.layer.0.SelfAttention.q.weight": "model-00001-of-00003.safetensors",
217
+ "encoder.block.9.layer.0.SelfAttention.v.weight": "model-00001-of-00003.safetensors",
218
+ "encoder.block.9.layer.0.layer_norm.weight": "model-00001-of-00003.safetensors",
219
+ "encoder.block.9.layer.1.DenseReluDense.wi_0.weight": "model-00001-of-00003.safetensors",
220
+ "encoder.block.9.layer.1.DenseReluDense.wi_1.weight": "model-00001-of-00003.safetensors",
221
+ "encoder.block.9.layer.1.DenseReluDense.wo.weight": "model-00001-of-00003.safetensors",
222
+ "encoder.block.9.layer.1.layer_norm.weight": "model-00001-of-00003.safetensors",
223
+ "encoder.final_layer_norm.weight": "model-00003-of-00003.safetensors",
224
+ "shared.weight": "model-00001-of-00003.safetensors"
225
+ }
226
+ }
tokenizer/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "49406": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "49407": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|startoftext|>",
+ "clean_up_tokenization_spaces": true,
+ "do_lower_case": true,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "model_max_length": 77,
+ "pad_token": "<|endoftext|>",
+ "tokenizer_class": "CLIPTokenizer",
+ "unk_token": "<|endoftext|>"
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_2/special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "!",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer_2/tokenizer_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "!",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "49406": {
+ "content": "<|startoftext|>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "49407": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|startoftext|>",
+ "clean_up_tokenization_spaces": true,
+ "do_lower_case": true,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "model_max_length": 77,
+ "pad_token": "!",
+ "tokenizer_class": "CLIPTokenizer",
+ "unk_token": "<|endoftext|>"
+ }
tokenizer_2/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_3/special_tokens_map.json ADDED
@@ -0,0 +1,125 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<extra_id_0>",
4
+ "<extra_id_1>",
5
+ "<extra_id_2>",
6
+ "<extra_id_3>",
7
+ "<extra_id_4>",
8
+ "<extra_id_5>",
9
+ "<extra_id_6>",
10
+ "<extra_id_7>",
11
+ "<extra_id_8>",
12
+ "<extra_id_9>",
13
+ "<extra_id_10>",
14
+ "<extra_id_11>",
15
+ "<extra_id_12>",
16
+ "<extra_id_13>",
17
+ "<extra_id_14>",
18
+ "<extra_id_15>",
19
+ "<extra_id_16>",
20
+ "<extra_id_17>",
21
+ "<extra_id_18>",
22
+ "<extra_id_19>",
23
+ "<extra_id_20>",
24
+ "<extra_id_21>",
25
+ "<extra_id_22>",
26
+ "<extra_id_23>",
27
+ "<extra_id_24>",
28
+ "<extra_id_25>",
29
+ "<extra_id_26>",
30
+ "<extra_id_27>",
31
+ "<extra_id_28>",
32
+ "<extra_id_29>",
33
+ "<extra_id_30>",
34
+ "<extra_id_31>",
35
+ "<extra_id_32>",
36
+ "<extra_id_33>",
37
+ "<extra_id_34>",
38
+ "<extra_id_35>",
39
+ "<extra_id_36>",
40
+ "<extra_id_37>",
41
+ "<extra_id_38>",
42
+ "<extra_id_39>",
43
+ "<extra_id_40>",
44
+ "<extra_id_41>",
45
+ "<extra_id_42>",
46
+ "<extra_id_43>",
47
+ "<extra_id_44>",
48
+ "<extra_id_45>",
49
+ "<extra_id_46>",
50
+ "<extra_id_47>",
51
+ "<extra_id_48>",
52
+ "<extra_id_49>",
53
+ "<extra_id_50>",
54
+ "<extra_id_51>",
55
+ "<extra_id_52>",
56
+ "<extra_id_53>",
57
+ "<extra_id_54>",
58
+ "<extra_id_55>",
59
+ "<extra_id_56>",
60
+ "<extra_id_57>",
61
+ "<extra_id_58>",
62
+ "<extra_id_59>",
63
+ "<extra_id_60>",
64
+ "<extra_id_61>",
65
+ "<extra_id_62>",
66
+ "<extra_id_63>",
67
+ "<extra_id_64>",
68
+ "<extra_id_65>",
69
+ "<extra_id_66>",
70
+ "<extra_id_67>",
71
+ "<extra_id_68>",
72
+ "<extra_id_69>",
73
+ "<extra_id_70>",
74
+ "<extra_id_71>",
75
+ "<extra_id_72>",
76
+ "<extra_id_73>",
77
+ "<extra_id_74>",
78
+ "<extra_id_75>",
79
+ "<extra_id_76>",
80
+ "<extra_id_77>",
81
+ "<extra_id_78>",
82
+ "<extra_id_79>",
83
+ "<extra_id_80>",
84
+ "<extra_id_81>",
85
+ "<extra_id_82>",
86
+ "<extra_id_83>",
87
+ "<extra_id_84>",
88
+ "<extra_id_85>",
89
+ "<extra_id_86>",
90
+ "<extra_id_87>",
91
+ "<extra_id_88>",
92
+ "<extra_id_89>",
93
+ "<extra_id_90>",
94
+ "<extra_id_91>",
95
+ "<extra_id_92>",
96
+ "<extra_id_93>",
97
+ "<extra_id_94>",
98
+ "<extra_id_95>",
99
+ "<extra_id_96>",
100
+ "<extra_id_97>",
101
+ "<extra_id_98>",
102
+ "<extra_id_99>"
103
+ ],
104
+ "eos_token": {
105
+ "content": "</s>",
106
+ "lstrip": false,
107
+ "normalized": false,
108
+ "rstrip": false,
109
+ "single_word": false
110
+ },
111
+ "pad_token": {
112
+ "content": "<pad>",
113
+ "lstrip": false,
114
+ "normalized": false,
115
+ "rstrip": false,
116
+ "single_word": false
117
+ },
118
+ "unk_token": {
119
+ "content": "<unk>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false
124
+ }
125
+ }
tokenizer_3/spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
+ size 791656
tokenizer_3/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_3/tokenizer_config.json ADDED
@@ -0,0 +1,940 @@
1
+ {
2
+ "add_prefix_space": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "</s>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<unk>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "32000": {
29
+ "content": "<extra_id_99>",
30
+ "lstrip": true,
31
+ "normalized": false,
32
+ "rstrip": true,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "32001": {
37
+ "content": "<extra_id_98>",
38
+ "lstrip": true,
39
+ "normalized": false,
40
+ "rstrip": true,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "32002": {
45
+ "content": "<extra_id_97>",
46
+ "lstrip": true,
47
+ "normalized": false,
48
+ "rstrip": true,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "32003": {
53
+ "content": "<extra_id_96>",
54
+ "lstrip": true,
55
+ "normalized": false,
56
+ "rstrip": true,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "32004": {
61
+ "content": "<extra_id_95>",
62
+ "lstrip": true,
63
+ "normalized": false,
64
+ "rstrip": true,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "32005": {
69
+ "content": "<extra_id_94>",
70
+ "lstrip": true,
71
+ "normalized": false,
72
+ "rstrip": true,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "32006": {
77
+ "content": "<extra_id_93>",
78
+ "lstrip": true,
79
+ "normalized": false,
80
+ "rstrip": true,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "32007": {
85
+ "content": "<extra_id_92>",
86
+ "lstrip": true,
87
+ "normalized": false,
88
+ "rstrip": true,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "32008": {
93
+ "content": "<extra_id_91>",
94
+ "lstrip": true,
95
+ "normalized": false,
96
+ "rstrip": true,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "32009": {
101
+ "content": "<extra_id_90>",
102
+ "lstrip": true,
103
+ "normalized": false,
104
+ "rstrip": true,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "32010": {
109
+ "content": "<extra_id_89>",
110
+ "lstrip": true,
111
+ "normalized": false,
112
+ "rstrip": true,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "32011": {
117
+ "content": "<extra_id_88>",
118
+ "lstrip": true,
119
+ "normalized": false,
120
+ "rstrip": true,
121
+ "single_word": false,
122
+ "special": true
123
+ },
124
+ "32012": {
125
+ "content": "<extra_id_87>",
126
+ "lstrip": true,
127
+ "normalized": false,
128
+ "rstrip": true,
129
+ "single_word": false,
130
+ "special": true
131
+ },
132
+ "32013": {
133
+ "content": "<extra_id_86>",
134
+ "lstrip": true,
135
+ "normalized": false,
136
+ "rstrip": true,
137
+ "single_word": false,
138
+ "special": true
139
+ },
140
+ "32014": {
141
+ "content": "<extra_id_85>",
142
+ "lstrip": true,
143
+ "normalized": false,
144
+ "rstrip": true,
145
+ "single_word": false,
146
+ "special": true
147
+ },
148
+ "32015": {
149
+ "content": "<extra_id_84>",
150
+ "lstrip": true,
151
+ "normalized": false,
152
+ "rstrip": true,
153
+ "single_word": false,
154
+ "special": true
155
+ },
156
+ "32016": {
157
+ "content": "<extra_id_83>",
158
+ "lstrip": true,
159
+ "normalized": false,
160
+ "rstrip": true,
161
+ "single_word": false,
162
+ "special": true
163
+ },
164
+ "32017": {
165
+ "content": "<extra_id_82>",
166
+ "lstrip": true,
167
+ "normalized": false,
168
+ "rstrip": true,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "32018": {
173
+ "content": "<extra_id_81>",
174
+ "lstrip": true,
175
+ "normalized": false,
176
+ "rstrip": true,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "32019": {
181
+ "content": "<extra_id_80>",
182
+ "lstrip": true,
183
+ "normalized": false,
184
+ "rstrip": true,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "32020": {
189
+ "content": "<extra_id_79>",
190
+ "lstrip": true,
191
+ "normalized": false,
192
+ "rstrip": true,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "32021": {
197
+ "content": "<extra_id_78>",
198
+ "lstrip": true,
199
+ "normalized": false,
200
+ "rstrip": true,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "32022": {
205
+ "content": "<extra_id_77>",
206
+ "lstrip": true,
207
+ "normalized": false,
208
+ "rstrip": true,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "32023": {
213
+ "content": "<extra_id_76>",
214
+ "lstrip": true,
215
+ "normalized": false,
216
+ "rstrip": true,
217
+ "single_word": false,
218
+ "special": true
219
+ },
220
+ "32024": {
221
+ "content": "<extra_id_75>",
222
+ "lstrip": true,
223
+ "normalized": false,
224
+ "rstrip": true,
225
+ "single_word": false,
226
+ "special": true
227
+ },
228
+ "32025": {
229
+ "content": "<extra_id_74>",
230
+ "lstrip": true,
231
+ "normalized": false,
232
+ "rstrip": true,
233
+ "single_word": false,
234
+ "special": true
235
+ },
236
+ "32026": {
237
+ "content": "<extra_id_73>",
238
+ "lstrip": true,
239
+ "normalized": false,
240
+ "rstrip": true,
241
+ "single_word": false,
242
+ "special": true
243
+ },
244
+ "32027": {
245
+ "content": "<extra_id_72>",
246
+ "lstrip": true,
247
+ "normalized": false,
248
+ "rstrip": true,
249
+ "single_word": false,
250
+ "special": true
251
+ },
252
+ "32028": {
253
+ "content": "<extra_id_71>",
254
+ "lstrip": true,
255
+ "normalized": false,
256
+ "rstrip": true,
257
+ "single_word": false,
258
+ "special": true
259
+ },
260
+ "32029": {
261
+ "content": "<extra_id_70>",
262
+ "lstrip": true,
263
+ "normalized": false,
264
+ "rstrip": true,
265
+ "single_word": false,
266
+ "special": true
267
+ },
268
+ "32030": {
269
+ "content": "<extra_id_69>",
270
+ "lstrip": true,
271
+ "normalized": false,
272
+ "rstrip": true,
273
+ "single_word": false,
274
+ "special": true
275
+ },
276
+ "32031": {
277
+ "content": "<extra_id_68>",
278
+ "lstrip": true,
279
+ "normalized": false,
280
+ "rstrip": true,
281
+ "single_word": false,
282
+ "special": true
283
+ },
284
+ "32032": {
285
+ "content": "<extra_id_67>",
286
+ "lstrip": true,
287
+ "normalized": false,
288
+ "rstrip": true,
289
+ "single_word": false,
290
+ "special": true
291
+ },
292
+ "32033": {
293
+ "content": "<extra_id_66>",
294
+ "lstrip": true,
295
+ "normalized": false,
296
+ "rstrip": true,
297
+ "single_word": false,
298
+ "special": true
299
+ },
300
+ "32034": {
301
+ "content": "<extra_id_65>",
302
+ "lstrip": true,
303
+ "normalized": false,
304
+ "rstrip": true,
305
+ "single_word": false,
306
+ "special": true
307
+ },
308
+ "32035": {
309
+ "content": "<extra_id_64>",
310
+ "lstrip": true,
311
+ "normalized": false,
312
+ "rstrip": true,
313
+ "single_word": false,
314
+ "special": true
315
+ },
316
+ "32036": {
317
+ "content": "<extra_id_63>",
318
+ "lstrip": true,
319
+ "normalized": false,
320
+ "rstrip": true,
321
+ "single_word": false,
322
+ "special": true
323
+ },
324
+ "32037": {
325
+ "content": "<extra_id_62>",
326
+ "lstrip": true,
327
+ "normalized": false,
328
+ "rstrip": true,
329
+ "single_word": false,
330
+ "special": true
331
+ },
332
+ "32038": {
333
+ "content": "<extra_id_61>",
334
+ "lstrip": true,
335
+ "normalized": false,
336
+ "rstrip": true,
337
+ "single_word": false,
338
+ "special": true
339
+ },
340
+ "32039": {
341
+ "content": "<extra_id_60>",
342
+ "lstrip": true,
343
+ "normalized": false,
344
+ "rstrip": true,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "32040": {
349
+ "content": "<extra_id_59>",
350
+ "lstrip": true,
351
+ "normalized": false,
352
+ "rstrip": true,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "32041": {
357
+ "content": "<extra_id_58>",
358
+ "lstrip": true,
359
+ "normalized": false,
360
+ "rstrip": true,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "32042": {
365
+ "content": "<extra_id_57>",
366
+ "lstrip": true,
367
+ "normalized": false,
368
+ "rstrip": true,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "32043": {
373
+ "content": "<extra_id_56>",
374
+ "lstrip": true,
375
+ "normalized": false,
376
+ "rstrip": true,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "32044": {
381
+ "content": "<extra_id_55>",
382
+ "lstrip": true,
383
+ "normalized": false,
384
+ "rstrip": true,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "32045": {
389
+ "content": "<extra_id_54>",
390
+ "lstrip": true,
391
+ "normalized": false,
392
+ "rstrip": true,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "32046": {
397
+ "content": "<extra_id_53>",
398
+ "lstrip": true,
399
+ "normalized": false,
400
+ "rstrip": true,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "32047": {
405
+ "content": "<extra_id_52>",
406
+ "lstrip": true,
407
+ "normalized": false,
408
+ "rstrip": true,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "32048": {
413
+ "content": "<extra_id_51>",
414
+ "lstrip": true,
415
+ "normalized": false,
416
+ "rstrip": true,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "32049": {
421
+ "content": "<extra_id_50>",
422
+ "lstrip": true,
423
+ "normalized": false,
424
+ "rstrip": true,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "32050": {
429
+ "content": "<extra_id_49>",
430
+ "lstrip": true,
431
+ "normalized": false,
432
+ "rstrip": true,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "32051": {
437
+ "content": "<extra_id_48>",
438
+ "lstrip": true,
439
+ "normalized": false,
440
+ "rstrip": true,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "32052": {
445
+ "content": "<extra_id_47>",
446
+ "lstrip": true,
447
+ "normalized": false,
448
+ "rstrip": true,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "32053": {
453
+ "content": "<extra_id_46>",
454
+ "lstrip": true,
455
+ "normalized": false,
456
+ "rstrip": true,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "32054": {
461
+ "content": "<extra_id_45>",
462
+ "lstrip": true,
463
+ "normalized": false,
464
+ "rstrip": true,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "32055": {
469
+ "content": "<extra_id_44>",
470
+ "lstrip": true,
471
+ "normalized": false,
472
+ "rstrip": true,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "32056": {
477
+ "content": "<extra_id_43>",
478
+ "lstrip": true,
479
+ "normalized": false,
480
+ "rstrip": true,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "32057": {
485
+ "content": "<extra_id_42>",
486
+ "lstrip": true,
487
+ "normalized": false,
488
+ "rstrip": true,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "32058": {
493
+ "content": "<extra_id_41>",
494
+ "lstrip": true,
495
+ "normalized": false,
496
+ "rstrip": true,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "32059": {
501
+ "content": "<extra_id_40>",
502
+ "lstrip": true,
503
+ "normalized": false,
504
+ "rstrip": true,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "32060": {
509
+ "content": "<extra_id_39>",
510
+ "lstrip": true,
511
+ "normalized": false,
512
+ "rstrip": true,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "32061": {
517
+ "content": "<extra_id_38>",
518
+ "lstrip": true,
519
+ "normalized": false,
520
+ "rstrip": true,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "32062": {
525
+ "content": "<extra_id_37>",
526
+ "lstrip": true,
527
+ "normalized": false,
528
+ "rstrip": true,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "32063": {
533
+ "content": "<extra_id_36>",
534
+ "lstrip": true,
535
+ "normalized": false,
536
+ "rstrip": true,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "32064": {
541
+ "content": "<extra_id_35>",
542
+ "lstrip": true,
543
+ "normalized": false,
544
+ "rstrip": true,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "32065": {
549
+ "content": "<extra_id_34>",
550
+ "lstrip": true,
551
+ "normalized": false,
552
+ "rstrip": true,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "32066": {
557
+ "content": "<extra_id_33>",
558
+ "lstrip": true,
559
+ "normalized": false,
560
+ "rstrip": true,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "32067": {
565
+ "content": "<extra_id_32>",
566
+ "lstrip": true,
567
+ "normalized": false,
568
+ "rstrip": true,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "32068": {
573
+ "content": "<extra_id_31>",
574
+ "lstrip": true,
575
+ "normalized": false,
576
+ "rstrip": true,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "32069": {
581
+ "content": "<extra_id_30>",
582
+ "lstrip": true,
583
+ "normalized": false,
584
+ "rstrip": true,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "32070": {
589
+ "content": "<extra_id_29>",
590
+ "lstrip": true,
591
+ "normalized": false,
592
+ "rstrip": true,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "32071": {
597
+ "content": "<extra_id_28>",
598
+ "lstrip": true,
599
+ "normalized": false,
600
+ "rstrip": true,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "32072": {
605
+ "content": "<extra_id_27>",
606
+ "lstrip": true,
607
+ "normalized": false,
608
+ "rstrip": true,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "32073": {
613
+ "content": "<extra_id_26>",
614
+ "lstrip": true,
615
+ "normalized": false,
616
+ "rstrip": true,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "32074": {
621
+ "content": "<extra_id_25>",
622
+ "lstrip": true,
623
+ "normalized": false,
624
+ "rstrip": true,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "32075": {
629
+ "content": "<extra_id_24>",
630
+ "lstrip": true,
631
+ "normalized": false,
632
+ "rstrip": true,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "32076": {
637
+ "content": "<extra_id_23>",
638
+ "lstrip": true,
639
+ "normalized": false,
640
+ "rstrip": true,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "32077": {
645
+ "content": "<extra_id_22>",
646
+ "lstrip": true,
647
+ "normalized": false,
648
+ "rstrip": true,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "32078": {
653
+ "content": "<extra_id_21>",
654
+ "lstrip": true,
655
+ "normalized": false,
656
+ "rstrip": true,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "32079": {
661
+ "content": "<extra_id_20>",
662
+ "lstrip": true,
663
+ "normalized": false,
664
+ "rstrip": true,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "32080": {
669
+ "content": "<extra_id_19>",
670
+ "lstrip": true,
671
+ "normalized": false,
672
+ "rstrip": true,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "32081": {
677
+ "content": "<extra_id_18>",
678
+ "lstrip": true,
679
+ "normalized": false,
680
+ "rstrip": true,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "32082": {
685
+ "content": "<extra_id_17>",
686
+ "lstrip": true,
687
+ "normalized": false,
688
+ "rstrip": true,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "32083": {
693
+ "content": "<extra_id_16>",
694
+ "lstrip": true,
695
+ "normalized": false,
696
+ "rstrip": true,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "32084": {
701
+ "content": "<extra_id_15>",
702
+ "lstrip": true,
703
+ "normalized": false,
704
+ "rstrip": true,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "32085": {
709
+ "content": "<extra_id_14>",
710
+ "lstrip": true,
711
+ "normalized": false,
712
+ "rstrip": true,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "32086": {
717
+ "content": "<extra_id_13>",
718
+ "lstrip": true,
719
+ "normalized": false,
720
+ "rstrip": true,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "32087": {
725
+ "content": "<extra_id_12>",
726
+ "lstrip": true,
727
+ "normalized": false,
728
+ "rstrip": true,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "32088": {
733
+ "content": "<extra_id_11>",
734
+ "lstrip": true,
735
+ "normalized": false,
736
+ "rstrip": true,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "32089": {
741
+ "content": "<extra_id_10>",
742
+ "lstrip": true,
743
+ "normalized": false,
744
+ "rstrip": true,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "32090": {
749
+ "content": "<extra_id_9>",
750
+ "lstrip": true,
751
+ "normalized": false,
752
+ "rstrip": true,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "32091": {
757
+ "content": "<extra_id_8>",
758
+ "lstrip": true,
759
+ "normalized": false,
760
+ "rstrip": true,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "32092": {
765
+ "content": "<extra_id_7>",
766
+ "lstrip": true,
767
+ "normalized": false,
768
+ "rstrip": true,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "32093": {
773
+ "content": "<extra_id_6>",
774
+ "lstrip": true,
775
+ "normalized": false,
776
+ "rstrip": true,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "32094": {
781
+ "content": "<extra_id_5>",
782
+ "lstrip": true,
783
+ "normalized": false,
784
+ "rstrip": true,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "32095": {
789
+ "content": "<extra_id_4>",
790
+ "lstrip": true,
791
+ "normalized": false,
792
+ "rstrip": true,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "32096": {
797
+ "content": "<extra_id_3>",
798
+ "lstrip": true,
799
+ "normalized": false,
800
+ "rstrip": true,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "32097": {
805
+ "content": "<extra_id_2>",
806
+ "lstrip": true,
807
+ "normalized": false,
808
+ "rstrip": true,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "32098": {
813
+ "content": "<extra_id_1>",
814
+ "lstrip": true,
815
+ "normalized": false,
816
+ "rstrip": true,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "32099": {
821
+ "content": "<extra_id_0>",
822
+ "lstrip": true,
823
+ "normalized": false,
824
+ "rstrip": true,
825
+ "single_word": false,
826
+ "special": true
827
+ }
828
+ },
829
+ "additional_special_tokens": [
830
+ "<extra_id_0>",
831
+ "<extra_id_1>",
832
+ "<extra_id_2>",
833
+ "<extra_id_3>",
834
+ "<extra_id_4>",
835
+ "<extra_id_5>",
836
+ "<extra_id_6>",
837
+ "<extra_id_7>",
838
+ "<extra_id_8>",
839
+ "<extra_id_9>",
840
+ "<extra_id_10>",
841
+ "<extra_id_11>",
842
+ "<extra_id_12>",
843
+ "<extra_id_13>",
844
+ "<extra_id_14>",
845
+ "<extra_id_15>",
846
+ "<extra_id_16>",
847
+ "<extra_id_17>",
848
+ "<extra_id_18>",
849
+ "<extra_id_19>",
850
+ "<extra_id_20>",
851
+ "<extra_id_21>",
852
+ "<extra_id_22>",
853
+ "<extra_id_23>",
854
+ "<extra_id_24>",
855
+ "<extra_id_25>",
856
+ "<extra_id_26>",
857
+ "<extra_id_27>",
858
+ "<extra_id_28>",
859
+ "<extra_id_29>",
860
+ "<extra_id_30>",
861
+ "<extra_id_31>",
862
+ "<extra_id_32>",
863
+ "<extra_id_33>",
864
+ "<extra_id_34>",
865
+ "<extra_id_35>",
866
+ "<extra_id_36>",
867
+ "<extra_id_37>",
868
+ "<extra_id_38>",
869
+ "<extra_id_39>",
870
+ "<extra_id_40>",
871
+ "<extra_id_41>",
872
+ "<extra_id_42>",
873
+ "<extra_id_43>",
874
+ "<extra_id_44>",
875
+ "<extra_id_45>",
876
+ "<extra_id_46>",
877
+ "<extra_id_47>",
878
+ "<extra_id_48>",
879
+ "<extra_id_49>",
880
+ "<extra_id_50>",
881
+ "<extra_id_51>",
882
+ "<extra_id_52>",
883
+ "<extra_id_53>",
884
+ "<extra_id_54>",
885
+ "<extra_id_55>",
886
+ "<extra_id_56>",
887
+ "<extra_id_57>",
888
+ "<extra_id_58>",
889
+ "<extra_id_59>",
890
+ "<extra_id_60>",
891
+ "<extra_id_61>",
892
+ "<extra_id_62>",
893
+ "<extra_id_63>",
894
+ "<extra_id_64>",
895
+ "<extra_id_65>",
896
+ "<extra_id_66>",
897
+ "<extra_id_67>",
898
+ "<extra_id_68>",
899
+ "<extra_id_69>",
900
+ "<extra_id_70>",
901
+ "<extra_id_71>",
902
+ "<extra_id_72>",
903
+ "<extra_id_73>",
904
+ "<extra_id_74>",
905
+ "<extra_id_75>",
906
+ "<extra_id_76>",
907
+ "<extra_id_77>",
908
+ "<extra_id_78>",
909
+ "<extra_id_79>",
910
+ "<extra_id_80>",
911
+ "<extra_id_81>",
912
+ "<extra_id_82>",
913
+ "<extra_id_83>",
914
+ "<extra_id_84>",
915
+ "<extra_id_85>",
916
+ "<extra_id_86>",
917
+ "<extra_id_87>",
918
+ "<extra_id_88>",
919
+ "<extra_id_89>",
920
+ "<extra_id_90>",
921
+ "<extra_id_91>",
922
+ "<extra_id_92>",
923
+ "<extra_id_93>",
924
+ "<extra_id_94>",
925
+ "<extra_id_95>",
926
+ "<extra_id_96>",
927
+ "<extra_id_97>",
928
+ "<extra_id_98>",
929
+ "<extra_id_99>"
930
+ ],
931
+ "clean_up_tokenization_spaces": true,
932
+ "eos_token": "</s>",
933
+ "extra_ids": 100,
934
+ "legacy": true,
935
+ "model_max_length": 512,
936
+ "pad_token": "<pad>",
937
+ "sp_model_kwargs": {},
938
+ "tokenizer_class": "T5Tokenizer",
939
+ "unk_token": "<unk>"
940
+ }
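
The file above configures the T5 tokenizer used by the pipeline's third text encoder: 100 `<extra_id_*>` sentinel tokens and a `model_max_length` of 512. Below is a minimal sketch of loading it on its own with 🤗 Transformers; the repository id is a placeholder, and the `tokenizer_3` subfolder name is an assumption based on the usual SD3 layout.

```python
from transformers import T5Tokenizer

# Placeholders/assumptions: substitute the actual Hub repo id; "tokenizer_3"
# is the conventional subfolder for the T5 tokenizer in SD3-style repositories.
repo_id = "your-username/your-repo-id"
tokenizer = T5Tokenizer.from_pretrained(repo_id, subfolder="tokenizer_3")

ids = tokenizer("make the sky look like a sunset", return_tensors="pt").input_ids
print(ids.shape, tokenizer.model_max_length)  # model_max_length should be 512
```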
transformer/config.json ADDED
@@ -0,0 +1,16 @@
+ {
+ "_class_name": "SD3Transformer2DModel",
+ "_diffusers_version": "0.30.1",
+ "_name_or_path": "../UltraEdit/resolution_512_model_epoch_2_sd3_5e5/transformer",
+ "attention_head_dim": 64,
+ "caption_projection_dim": 1536,
+ "in_channels": 48,
+ "joint_attention_dim": 4096,
+ "num_attention_heads": 24,
+ "num_layers": 24,
+ "out_channels": 16,
+ "patch_size": 2,
+ "pooled_projection_dim": 2048,
+ "pos_embed_max_size": 192,
+ "sample_size": 128
+ }
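
This config describes a 24-layer `SD3Transformer2DModel` whose `in_channels` is 48, i.e. 3 × 16 latent channels, which is consistent with InstructPix2Pix-style conditioning where the noisy latents are concatenated channel-wise with additional encoded conditioning latents. A minimal loading sketch follows; the repo id is a placeholder, and the interpretation of the extra channels is an assumption based on that design.

```python
import torch
from diffusers import SD3Transformer2DModel

repo_id = "your-username/your-repo-id"  # placeholder: substitute the real repo id
transformer = SD3Transformer2DModel.from_pretrained(
    repo_id,
    subfolder="transformer",
    torch_dtype=torch.float16,
)

# 48 input channels vs. 16 output channels: the extra 32 input channels are
# (assumed to be) concatenated conditioning latents consumed only by the
# input patch-embedding layer; the model still predicts 16-channel latents.
print(transformer.config.in_channels, transformer.config.out_channels)  # 48 16
```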
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:413977f9252276649ec148678dd52348e60cc88e12a651f159c98d2a73a58e90
+ size 4170374624
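
The weights are stored via git-lfs, so the committed blob is only a pointer (spec version, sha256 oid, byte size). A small sketch for double-checking a locally downloaded copy against this pointer; the local path is an assumption.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large safetensors files fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

expected = "413977f9252276649ec148678dd52348e60cc88e12a651f159c98d2a73a58e90"
# Assumed local path of the downloaded file:
assert sha256_of("transformer/diffusion_pytorch_model.safetensors") == expected
```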
vae/config.json ADDED
@@ -0,0 +1,38 @@
+ {
+ "_class_name": "AutoencoderKL",
+ "_diffusers_version": "0.30.1",
+ "_name_or_path": "../UltraEdit/resolution_512_model_epoch_2_sd3_5e5/vae",
+ "act_fn": "silu",
+ "block_out_channels": [
+ 128,
+ 256,
+ 512,
+ 512
+ ],
+ "down_block_types": [
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D",
+ "DownEncoderBlock2D"
+ ],
+ "force_upcast": true,
+ "in_channels": 3,
+ "latent_channels": 16,
+ "latents_mean": null,
+ "latents_std": null,
+ "layers_per_block": 2,
+ "mid_block_add_attention": true,
+ "norm_num_groups": 32,
+ "out_channels": 3,
+ "sample_size": 1024,
+ "scaling_factor": 1.5305,
+ "shift_factor": 0.0609,
+ "up_block_types": [
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D",
+ "UpDecoderBlock2D"
+ ],
+ "use_post_quant_conv": false,
+ "use_quant_conv": false
+ }
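
The VAE is a 16-latent-channel SD3-style `AutoencoderKL` with `scaling_factor` 1.5305 and `shift_factor` 0.0609: latents are shifted then scaled before the transformer sees them, and the inverse is applied before decoding. A minimal sketch, with the repo id again as a placeholder.

```python
import torch
from diffusers import AutoencoderKL

repo_id = "your-username/your-repo-id"  # placeholder
vae = AutoencoderKL.from_pretrained(repo_id, subfolder="vae")

def to_latents(image: torch.Tensor) -> torch.Tensor:
    # image: (B, 3, H, W) in [-1, 1]
    latents = vae.encode(image).latent_dist.sample()
    return (latents - vae.config.shift_factor) * vae.config.scaling_factor

def to_image(latents: torch.Tensor) -> torch.Tensor:
    # Invert the normalization before decoding back to pixel space.
    latents = latents / vae.config.scaling_factor + vae.config.shift_factor
    return vae.decode(latents).sample
```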
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9b67a279283625caee39d61eacb5324243848477b4eb535355eaaa8423d4e09
+ size 167666654
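
Putting the components together: a minimal sketch of loading the uploaded `StableDiffusion3InstructPix2PixPipeline` and running an edit. The repo id and image URL are placeholders, the `custom_pipeline` loading path assumes the repository ships its pipeline code on the Hub, and the call signature (`prompt` plus `image`) is an assumption modelled on the InstructPix2Pix pipelines in diffusers; consult the repository's pipeline code for the authoritative interface.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

repo_id = "your-username/your-repo-id"  # placeholder
pipe = DiffusionPipeline.from_pretrained(
    repo_id,
    custom_pipeline=repo_id,  # assumption: pipeline code is hosted alongside the weights
    torch_dtype=torch.float16,
).to("cuda")

source = load_image("https://example.com/input.png")  # placeholder input image
edited = pipe(
    prompt="make the sky look like a sunset",
    image=source,
    num_inference_steps=28,  # assumed sampler settings
).images[0]
edited.save("edited.png")
```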