---
license: openrail++
tags:
- text-to-image
- stable-diffusion
- diffusers
---
AnimeBoysXL v1.0
It takes substantial time and effort to bake models. If you appreciate my models, I would be grateful if you could support me on Ko-fi ☕.
Features
- ✔️ Good for inference: AnimeBoysXL is a flexible model that is good at generating images of anime boys and male-only content in a wide range of styles.
- ✔️ Good for training: AnimeBoysXL is suitable for further training, thanks to its neutral style and its ability to recognize a wide range of concepts. Feel free to train your own anime boy model/LoRA from AnimeBoysXL.
- ❌ AnimeBoysXL is not optimized for creating anime girls. Please consider using other models for that purpose.
Inference Guide
- Prompt: Use tag-based prompts to describe your subject.
  - Append `, best quality, amazing quality, best aesthetic, absurdres` to the prompt to improve image quality.
  - (Optional) Append `, year YYYY` to the prompt to shift the output toward the prevalent style of that year. `YYYY` is a 4-digit year, e.g., `, year 2023`.
- Negative prompt: Choose one of the following two presets.
  - Heavy (recommended): `lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts`
  - Light: `lowres, jpeg artifacts, worst quality, watermark, blurry, bad aesthetic, 1girl, breasts`
  - (Optional) Add `, realistic, lips, nose` to the negative prompt if you need a flat, anime-like face.
- VAE: Make sure you're using the SDXL VAE.
- Sampling method, sampling steps and CFG scale: I find Euler a with 28 steps and a CFG scale of 5 works well. You are encouraged to experiment with other settings.
- Width and height: 832*1216 for portrait, 1024*1024 for square, and 1216*832 for landscape.
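The inference guide boils down to a few string and settings rules, sketched below as plain Python helpers. The function names (`build_prompt`, `build_negative_prompt`, `generation_kwargs`) are hypothetical; the returned dictionary keys match argument names of diffusers' `StableDiffusionXLPipeline.__call__`, but the sketch itself does not import or call diffusers.

```python
from typing import Optional

# Suffix the guide recommends appending to every prompt.
QUALITY_SUFFIX = ", best quality, amazing quality, best aesthetic, absurdres"

# The two negative-prompt presets from the guide, verbatim.
NEGATIVE_PRESETS = {
    "heavy": (
        "lowres, (bad:1.05), text, error, missing, extra, fewer, cropped, "
        "jpeg artifacts, worst quality, bad quality, watermark, bad aesthetic, "
        "unfinished, chromatic aberration, scan, scan artifacts, 1girl, breasts"
    ),
    "light": (
        "lowres, jpeg artifacts, worst quality, watermark, blurry, "
        "bad aesthetic, 1girl, breasts"
    ),
}

# Optional negative suffix for a flatter, anime-like face.
FLAT_FACE_SUFFIX = ", realistic, lips, nose"

# (width, height) recommended for each orientation.
RESOLUTIONS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def build_prompt(tags: str, year: Optional[int] = None) -> str:
    """Append the quality tags and, if given, a year tag to a tag-based prompt."""
    prompt = tags + QUALITY_SUFFIX
    if year is not None:
        prompt += f", year {year}"
    return prompt


def build_negative_prompt(preset: str = "heavy", flat_face: bool = False) -> str:
    """Return a negative preset, optionally extended to discourage realistic faces."""
    negative = NEGATIVE_PRESETS[preset]
    if flat_face:
        negative += FLAT_FACE_SUFFIX
    return negative


def generation_kwargs(prompt: str, negative_prompt: str,
                      orientation: str = "portrait") -> dict:
    """Collect the suggested settings as keyword arguments for an SDXL pipeline call."""
    width, height = RESOLUTIONS[orientation]
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "num_inference_steps": 28,  # sampling steps from the guide
        "guidance_scale": 5.0,      # CFG scale from the guide
    }


if __name__ == "__main__":
    kwargs = generation_kwargs(
        build_prompt("1boy, solo, black hair, school uniform", year=2023),
        build_negative_prompt("heavy"),
    )
    print(kwargs["prompt"])
```

With diffusers, one way to get the "Euler a" sampler is to replace the pipeline's scheduler with `EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)` before calling `pipe(**kwargs)`. Weighting syntax such as `(bad:1.05)` is understood by UIs like the AUTOMATIC1111 WebUI; plain diffusers does not parse it and passes it to the text encoders as ordinary text.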
Training Details
AnimeBoysXL is fine-tuned from Stable Diffusion XL Base 1.0 on ~516k images.
The following tags are attached to the training data to make it easier to steer toward either more aesthetic or more flexible results.
Quality tags
tag | score |
---|---|
best quality | >= 150 |
amazing quality | [100, 150) |
great quality | [75, 100) |
normal quality | [0, 75) |
bad quality | (-5, 0) |
worst quality | <= -5 |
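The quality thresholds can be read as a function from score to tag. The card does not say which score these thresholds apply to, so the sketch below just treats it as a number; `quality_tag` is a hypothetical name. An `if` chain is used because the intervals are not all closed on the same side: `bad quality` is open at -5, while `worst quality` includes -5.

```python
def quality_tag(score: float) -> str:
    """Map a score to its quality tag using the thresholds in the quality table."""
    if score >= 150:
        return "best quality"      # >= 150
    if score >= 100:
        return "amazing quality"   # [100, 150)
    if score >= 75:
        return "great quality"     # [75, 100)
    if score >= 0:
        return "normal quality"    # [0, 75)
    if score > -5:
        return "bad quality"       # (-5, 0)
    return "worst quality"         # <= -5
```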
Aesthetic tags
tag | score |
---|---|
best aesthetic | >= 6.675 |
great aesthetic | [6.0, 6.675) |
normal aesthetic | [5.0, 6.0) |
bad aesthetic | < 5.0 |
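The aesthetic table is a set of half-open intervals, so the lookup can be done with a binary search over the lower bounds. The card does not name the aesthetic scorer that produced these scores; `aesthetic_tag` and the constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the [lower, upper) intervals in the aesthetic table, ascending.
AESTHETIC_THRESHOLDS = [5.0, 6.0, 6.675]
AESTHETIC_TAGS = ["bad aesthetic", "normal aesthetic", "great aesthetic", "best aesthetic"]


def aesthetic_tag(score: float) -> str:
    """Return the aesthetic tag whose interval contains `score`."""
    # bisect_right counts the thresholds that are <= score, which is exactly the
    # index of the matching tag because every interval is closed on its lower end.
    return AESTHETIC_TAGS[bisect_right(AESTHETIC_THRESHOLDS, score)]
```

This works only because every interval in this table includes its lower bound; a table with left-open intervals would need `bisect_left` or explicit comparisons.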
Rating tags
tag | rating |
---|---|
(None) | general |
slightly nsfw | sensitive |
fairly nsfw | questionable |
very nsfw | explicit |
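The rating table maps each rating to a caption tag, with general-rated images receiving none. The sketch below assumes a comma-separated tag caption and appends the rating tag at the end; that placement, and the name `add_rating_tag`, are assumptions for illustration, since the card does not say where the tag goes.

```python
from typing import Optional

# Rating -> tag from the rating table; general-rated images get no tag.
RATING_TAGS: dict[str, Optional[str]] = {
    "general": None,
    "sensitive": "slightly nsfw",
    "questionable": "fairly nsfw",
    "explicit": "very nsfw",
}


def add_rating_tag(caption: str, rating: str) -> str:
    """Append the tag for `rating` to a comma-separated caption, if the rating has one."""
    tag = RATING_TAGS[rating]  # raises KeyError for an unknown rating
    return caption if tag is None else f"{caption}, {tag}"
```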
Year tags
`year YYYY`, where `YYYY` is in the range [2005, 2023].
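Because only years from 2005 to 2023 appear in the training tags, a prompt helper might reject years outside that range, since such tags were presumably never seen during training. `year_tag` is a hypothetical name, and raising an error (rather than clamping) is a design choice of this sketch.

```python
# Inclusive range of years covered by the year tags in the training data.
MIN_YEAR, MAX_YEAR = 2005, 2023


def year_tag(year: int) -> str:
    """Return the `year YYYY` tag, rejecting years outside the trained range."""
    if not MIN_YEAR <= year <= MAX_YEAR:
        raise ValueError(f"year must be in [{MIN_YEAR}, {MAX_YEAR}], got {year}")
    return f"year {year}"
```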
Training configurations
- Hardware: 4 * Nvidia A100 80GB GPUs
- Optimizer: AdaFactor
- Gradient accumulation steps: 8
- Batch size: 4 (GPUs) * 8 (gradient accumulation steps) * 4 (per-GPU batch size) = 128
- Learning rates:
  - 8e-6 for U-Net
  - 5.2e-6 for text encoder 1 (CLIP ViT-L)
  - 4.8e-6 for text encoder 2 (OpenCLIP ViT-bigG)
- Learning rate schedule: constant with 250 warmup steps
- Mixed precision training type: BF16
- Epochs: 20
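The configuration above can be turned into a rough step budget and a learning-rate schedule. This is a back-of-the-envelope sketch, assuming exactly 516,000 images (the card says ~516k), that every optimizer step consumes a full batch of 128, and that no aspect-ratio bucketing changes the count. The warmup function follows the "constant with warmup" shape used by `get_constant_schedule_with_warmup` in Hugging Face Transformers (a linear ramp, then a flat multiplier of 1.0); the card does not say which training framework was actually used.

```python
import math

NUM_GPUS = 4
GRAD_ACCUM_STEPS = 8
PER_GPU_BATCH = 4        # the remaining factor in "4 * 8 * 4 = 128"
NUM_IMAGES = 516_000     # assumption: the card only says "~516k"
EPOCHS = 20
WARMUP_STEPS = 250

# Base learning rates per component, as listed above.
BASE_LRS = {"unet": 8e-6, "text_encoder_1": 5.2e-6, "text_encoder_2": 4.8e-6}

effective_batch = NUM_GPUS * GRAD_ACCUM_STEPS * PER_GPU_BATCH   # 128
steps_per_epoch = math.ceil(NUM_IMAGES / effective_batch)       # last partial batch kept
total_steps = steps_per_epoch * EPOCHS


def lr_multiplier(step: int, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup over `warmup` optimizer steps, then a constant 1.0."""
    if step < warmup:
        return step / max(1, warmup)
    return 1.0


def learning_rates(step: int) -> dict:
    """Per-component learning rate at a given optimizer step."""
    m = lr_multiplier(step)
    return {name: lr * m for name, lr in BASE_LRS.items()}


if __name__ == "__main__":
    print(effective_batch, steps_per_epoch, total_steps)
    print(learning_rates(125))
```

Under these assumptions the run is on the order of 80k optimizer steps, so the 250 warmup steps cover well under 1% of training.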