|
--- |
|
license: openrail++ |
|
--- |
|
|
|
# Terminus XL Otaku (v1 preview) |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
Terminus XL Otaku is a latent diffusion model that uses a zero-terminal SNR noise schedule and a velocity-prediction objective at both training and inference time. |
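
To make "zero-terminal SNR" concrete, here is a minimal sketch of how a beta schedule can be rescaled so that the final timestep is pure noise. The `scaled_linear_betas` defaults (1000 steps, 0.00085 to 0.012) are an assumption based on the schedule commonly used by Stable Diffusion style models, and both function names are hypothetical helpers, not code from this model's training run.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """'Scaled linear' betas: linear in sqrt(beta). Defaults are assumed, not taken from this card."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def rescale_zero_terminal_snr(betas):
    """Shift and scale sqrt(alpha_bar) so the last value is exactly 0 and the first is unchanged."""
    alphas_bar_sqrt = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta  # cumulative product of alphas
        alphas_bar_sqrt.append(math.sqrt(running))

    first, last = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]
    shifted = [(a - last) * first / (first - last) for a in alphas_bar_sqrt]

    # Convert the rescaled cumulative products back into per-step betas.
    alphas_bar = [a * a for a in shifted]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


def terminal_snr(betas):
    """SNR at the final timestep: alpha_bar / (1 - alpha_bar)."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return alpha_bar / (1.0 - alpha_bar)


betas = scaled_linear_betas()
fixed = rescale_zero_terminal_snr(betas)
print(terminal_snr(betas) > 0.0, terminal_snr(fixed))
```

With the unmodified schedule, a small amount of signal survives at the last timestep; after rescaling, the terminal SNR is zero, which is the property the description refers to.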
|
|
|
Terminus is a new state-of-the-art model family based on SDXL's architecture, and is compatible with (most) SDXL pipelines. |
|
|
|
For Terminus Otaku (this model), the training data consists exclusively of anime, cel-shaded art, 3D renders, and other hand-drawn or synthetic art styles. |
|
|
|
The goal of this model was to continue using the v-prediction objective and min-SNR gamma loss weighting while adapting Terminus Gamma v2's outputs to a more artistic style. |
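
As a rough illustration of min-SNR gamma weighting, the sketch below computes a per-timestep loss weight from the signal-to-noise ratio. The function name `min_snr_weight` is hypothetical, and `gamma=5.0` is only the commonly cited default; the value actually used to train this model is not stated in this card.

```python
def min_snr_weight(snr, gamma=5.0, prediction_type="v_prediction"):
    """Loss weight for one timestep under min-SNR gamma weighting.

    gamma=5.0 is an assumed default, not a value taken from this model card.
    """
    clipped = min(snr, gamma)
    if prediction_type == "v_prediction":
        # For a velocity target the clipped SNR is divided by (SNR + 1).
        return clipped / (snr + 1.0)
    if prediction_type == "epsilon":
        # For a noise target the clipped SNR is divided by the SNR itself.
        return clipped / snr
    raise ValueError(f"unsupported prediction_type: {prediction_type!r}")


for snr in (0.0, 1.0, 5.0, 100.0):
    print(snr, min_snr_weight(snr))
```

Under this formula the weight at SNR = 0 is zero, so a trainer that pairs it with a zero-terminal SNR schedule may need to special-case the final timestep. How this model's training handled that case is not described here.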
|
|
|
|
|
- **Fine-tuned from:** ptx0/terminus-xl-gamma-v2 |
|
- **Developed by:** pseudoterminal X (@bghira) |
|
- **Funded by:** pseudoterminal X (@bghira) |
|
- **Model type:** Latent Diffusion |
|
- **License:** openrail++ |
|
- **Architecture:** SDXL |
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://github.com/bghira/SimpleTuner |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
Terminus XL Otaku can be used for generating high-quality images given text prompts. |
|
|
|
It should particularly excel at inpainting tasks for animated subject matter, where a zero-terminal SNR noise schedule allows it to more effectively retain contrast. |
|
|
|
The model can be utilized in creative industries such as art, advertising, and entertainment to create visually appealing content. |
|
|
|
### Downstream Use |
|
|
|
Terminus XL Otaku can be fine-tuned for specific tasks such as image super-resolution, style transfer, and more. |
|
|
|
However, fine-tuning the v1 preview is not recommended; any structural issues will hopefully be resolved in the full release. |
|
|
|
### Out-of-Scope Use |
|
|
|
The model is not designed for tasks outside of image generation. It should not be used to produce harmful content or to deceive others. Please use common sense. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The model might exhibit biases present in the training data. The generated images should be carefully reviewed to ensure they meet ethical and societal standards. |
|
|
|
### Recommendations |
|
|
|
Users should be cautious of potential biases in the generated images and thoroughly review them before use. |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
This model's success largely depended on a relatively small collection of very high-quality data samples. |
|
|
|
* Indiscriminate use of NijiJourney outputs. |
|
* Midjourney 5.2 outputs that mention anime styles in their tags. |
|
* Niji and MJ Showcase images that were re-captioned using CogVLM. |
|
* Anchor data of real human subjects in a small (10%) ratio to the animated material, to retain coherence. |
|
|
|
### Training Procedure |
|
|
|
#### Preprocessing |
|
|
|
This model has (so far) been trained exclusively on cropped images, using SDXL's crop-coordinate conditioning to improve fine details. |
|
|
|
No images were upsampled or downsampled during this training session. Instead, random crops (or unaltered 1024px square images) were used. |
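
The cropping approach can be sketched as follows. This is a hypothetical helper, not the training code: it picks a random 1024px square crop without resizing and returns the size and crop values that SDXL accepts as conditioning. The dictionary keys follow the naming used by the diffusers SDXL pipelines (`original_size`, `crops_coords_top_left`, `target_size`), which is an assumption about how the values would be consumed.

```python
import random


def random_crop_conditioning(width, height, crop_size=1024, rng=random):
    """Choose a random square crop and describe it for SDXL-style conditioning."""
    if width < crop_size or height < crop_size:
        # Matches the stated policy of never upsampling an image.
        raise ValueError("image is smaller than the crop size")

    left = rng.randint(0, width - crop_size)  # randint is inclusive at both ends
    top = rng.randint(0, height - crop_size)

    return {
        # (left, upper, right, lower), the box order PIL's Image.crop expects
        "crop_box": (left, top, left + crop_size, top + crop_size),
        "original_size": (height, width),
        "crops_coords_top_left": (top, left),
        "target_size": (crop_size, crop_size),
    }


print(random_crop_conditioning(1024, 1024))
print(random_crop_conditioning(1920, 1080, rng=random.Random(0)))
```

An image that is already 1024px square always yields a crop offset of `(0, 0)`, which corresponds to the "unaltered" case mentioned above.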
|
|
|
Roughly 50,000 images were used for this training run. Collection continued throughout the process, making it difficult to ascertain exactly how many images were used. |
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** bf16 mixed precision |
|
- **Learning rate:** \(1 \times 10^{-7}\) to \(1 \times 10^{-6}\), cosine schedule |
|
- **Epochs:** 11 |
|
- **Batch size:** 12 * 8 = 96 |
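
For a rough sense of scale, the sketch below combines these hyperparameters. Several things are assumptions: that the cosine schedule decays from the upper learning-rate bound to the lower one with no warmup, that 12 is a per-device batch size multiplied by 8 devices or accumulation steps, and that the dataset held exactly 50,000 images (the card only gives an approximate figure). The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, total_steps, lr_max=1e-6, lr_min=1e-7):
    """Cosine decay from lr_max to lr_min (decay direction is an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


per_device_batch = 12      # assumed meaning of the "12" in the card
multiplier = 8             # assumed: devices or gradient accumulation steps
effective_batch = per_device_batch * multiplier

approx_images = 50_000     # approximate figure from the card
epochs = 11
steps_per_epoch = math.ceil(approx_images / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

Because the dataset grew during training, the step counts here are only an estimate.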
|
|
|
#### Speeds, Sizes, Times |
|
|
|
[More Information Needed] |
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
[More Information Needed] |
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
#### Summary |
|
|
|
[More Information Needed] |
|
|
|
## Environmental Impact |
|
|
|
- **Hardware Type:** [More Information Needed] |
|
- **Hours used:** [More Information Needed] |
|
- **Cloud Provider:** [More Information Needed] |
|
- **Compute Region:** [More Information Needed] |
|
- **Carbon Emitted:** [More Information Needed] |
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
The model uses an SDXL-compatible latent diffusion architecture with a unique min-SNR augmented velocity objective. |
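
The velocity objective can be illustrated with a few lines of plain Python operating on lists of floats. This is a sketch of the standard v-prediction definitions, with hypothetical function names, rather than code from the training pipeline.

```python
import math


def noisy_sample(x0, noise, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * n for x, n in zip(x0, noise)]


def v_target(x0, noise, alpha_bar):
    """Velocity target: v = sqrt(alpha_bar) * noise - sqrt(1 - alpha_bar) * x0."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * n - s * x for x, n in zip(x0, noise)]


def x0_from_v(x_t, v, alpha_bar):
    """Recover the clean sample: x0 = sqrt(alpha_bar) * x_t - sqrt(1 - alpha_bar) * v."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xt - s * vi for xt, vi in zip(x_t, v)]


x0 = [0.5, -1.0, 0.25]
noise = [0.3, 0.1, -0.7]

# At zero terminal SNR (alpha_bar = 0) the input is pure noise,
# yet the velocity target still carries the clean sample (v = -x0).
print(noisy_sample(x0, noise, 0.0), v_target(x0, noise, 0.0))

x_t = noisy_sample(x0, noise, 0.6)
print(x0_from_v(x_t, v_target(x0, noise, 0.6), 0.6))
```

This shows one reason a velocity target pairs well with a zero-terminal SNR schedule: at the final timestep a noise-prediction target would only ask the model to copy its input, while the velocity target remains informative.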
|
|
|
### Compute Infrastructure |
|
|
|
[More Information Needed] |
|
|
|
#### Hardware |
|
|
|
[More Information Needed] |
|
|
|
#### Software |
|
|
|
[More Information Needed] |
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
[More Information Needed] |
|
|
|
**APA:** |
|
|
|
[More Information Needed] |
|
|
|
## Glossary |
|
|
|
[More Information Needed] |
|
|
|
## More Information |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Authors |
|
|
|
[More Information Needed] |
|
|
|
## Model Card Contact |
|
|
|
[More Information Needed] |