---
license: mit
---
<h1 style="font-size: 2em; text-align: center; font-weight: bold; color: #FF69B4; text-shadow: 1px 1px 2px rgba(0, 0, 0, 0.2); font-family: 'Arial', sans-serif;">
Momo XL - Anime-Style SDXL Base Model
</h1>
<style>
.gallery {
display: flex;
flex-wrap: wrap;
justify-content: center;
}
.gallery img {
width: 30%;
margin: 1%;
}
.gallery img.wide {
width: 45%;
}
</style>
<div class="gallery">
<img src="./card_images/01.png" alt="Sample Image 1">
<img src="./card_images/02.png" alt="Sample Image 2">
<img src="./card_images/03.png" alt="Sample Image 3">
<img src="./card_images/04.png" alt="Sample Image 4">
<img src="./card_images/05.png" alt="Sample Image 5">
<img src="./card_images/06.png" alt="Sample Image 6">
<img src="./card_images/07.png" alt="Sample Image 7">
<img src="./card_images/08.png" alt="Sample Image 8">
<img src="./card_images/09.png" alt="Sample Image 9">
<img src="./card_images/10.png" class="wide" alt="Sample Image 10">
<img src="./card_images/11.png" class="wide" alt="Sample Image 11">
</div>
**Momo XL** is an SDXL-based model fine-tuned to produce high-quality anime-style images with detailed, vibrant aesthetics. (Oct 6, 2024)
## Key Features:
- **Anime-Focused SDXL**: Tailored for generating high-quality anime-style images, making it ideal for artists and enthusiasts.
- **Optimized for Tag-Based Prompting**: Works best when prompted with descriptive tags, ensuring accurate and relevant outputs.
- **LoRA Compatible**: Compatible with most LoRA models available on the hub, allowing for versatile customization and style transfer.
## Usage Instructions:
- **Tagging**: Use descriptive tags separated by commas to guide the image generation. Tags can be arranged in any order to suit your creative needs.
- **Year-Specific Styles**: To emulate art styles from a specific year, use the tag format "**`year 20XX`**" (e.g., "**`year 2023`**").
- **LoRA Models**: Momo XL supports most LoRA models, enabling enhanced and tailored outputs for your projects.
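
As a minimal sketch of the tagging conventions above, the helper below joins tags with commas and appends the optional `year 20XX` tag. The function name `build_prompt` and the example tags are illustrative assumptions, not part of any tooling shipped with the model.

```python
from typing import Iterable, Optional


def build_prompt(tags: Iterable[str], year: Optional[int] = None) -> str:
    """Join descriptive tags with commas and optionally append a year tag."""
    # Strip stray whitespace and drop empty entries.
    parts = [t.strip() for t in tags if t.strip()]
    if year is not None:
        # Year tag format from the usage notes: "year 20XX".
        parts.append(f"year {year}")
    return ", ".join(parts)


# Example tags are hypothetical; any descriptive tags can be used, in any order.
prompt = build_prompt(["1girl", "solo", "looking at viewer"], year=2023)
print(prompt)
```

The resulting string can be passed as the positive prompt in whichever SDXL front end you use.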
## Disclaimer:
This model may produce unexpected or unintended results. **Use with caution and at your own risk.**
**Important Notice:**
- **Ethical Use**: Please ensure that your use of this model is ethical and complies with all applicable laws and regulations.
- **Content Responsibility**: Users are responsible for the content they generate. Do not use the model to create or disseminate illegal, harmful, or offensive material.
- **Data Sources**: The model was trained on publicly available datasets. While efforts have been made to filter and curate the training data, some undesirable content may remain.
Thank you! 😊
------------------------------------------------------
## Momo XL - Training Details (Oct 15, 2024)
### Dataset
Momo XL was trained on a dataset of more than **400,000 images** sourced from Danbooru.
### Base Model
Momo XL was built on top of SDXL. Its starting weights add the differences between two finetuned models and SDXL base back onto SDXL base:
- Formula:
`SDXL_base + (Animagine 3.0 base - SDXL_base) * 1.0 + (Pony V6 - SDXL_base) * 0.5`
For more details:
- [Animagine 3.0 base](https://huggingface.co/Linaqruf/animagine-xl-3.0)
- [Pony V6](https://huggingface.co/LyliaEngine/Pony_Diffusion_V6_XL)
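
The formula above can be sketched per parameter as follows. This is only an illustration under stated assumptions: the function name `add_difference_merge` is hypothetical, the toy values stand in for real checkpoint tensors, and the actual tooling used to build the base model is not described in this card.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple


def add_difference_merge(
    base: Mapping[str, Any],
    finetuned: Sequence[Tuple[Mapping[str, Any], float]],
) -> Dict[str, Any]:
    """Compute base + sum((model - base) * weight) for every parameter key.

    Values may be floats, NumPy arrays, or tensors: anything that
    supports +, - and scalar multiplication.
    """
    merged: Dict[str, Any] = {}
    for key, base_value in base.items():
        value = base_value
        for weights, alpha in finetuned:
            # Keys missing from a finetuned model contribute no difference.
            if key in weights:
                value = value + (weights[key] - base_value) * alpha
        merged[key] = value
    return merged


# Toy scalars standing in for real weights (illustrative only).
sdxl_base = {"w": 1.0, "b": 0.5}
animagine = {"w": 2.0, "b": 0.5}
pony = {"w": 3.0, "b": 1.5}

merged = add_difference_merge(sdxl_base, [(animagine, 1.0), (pony, 0.5)])
print(merged)
```

With the weights 1.0 and 0.5 from the formula, the Animagine difference is applied in full and the Pony difference at half strength.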
### Training Process
Training was conducted on **A100 80GB GPUs**, totaling more than **2,000 GPU hours**. The training was divided into three stages:
- **Finetuning - First Stage**: Trained on the entire dataset at 1024² resolution with the AdamW8bit optimizer.
- **Finetuning - Second Stage**: Trained on the entire dataset again, raising the maximum resolution to 1280² and switching to AdamW.
- **Adjustment Stage**: A short run (0.25 epochs) focused on aesthetic adjustments to improve the overall visual quality.
The released **Momo XL** checkpoint combines the Text Encoder from the second finetuning stage with the UNet from the Adjustment Stage.
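
A component swap like this can be sketched over checkpoint state dicts by key prefix. This is a rough sketch, not the author's procedure: the function name `combine_components` is hypothetical, the default prefix `conditioner.embedders.` is an assumption about how single-file SDXL checkpoints name text encoder weights, and taking all remaining weights (including the VAE) from the Adjustment Stage checkpoint is also an assumption.

```python
from typing import Any, Dict, Mapping


def combine_components(
    text_encoder_source: Mapping[str, Any],
    unet_source: Mapping[str, Any],
    text_encoder_prefix: str = "conditioner.embedders.",  # assumed key prefix
) -> Dict[str, Any]:
    """Take text encoder weights from one checkpoint and the rest from another."""
    # Start with every non-text-encoder weight from the UNet source.
    combined = {
        key: value
        for key, value in unet_source.items()
        if not key.startswith(text_encoder_prefix)
    }
    # Add the text encoder weights from the other checkpoint.
    for key, value in text_encoder_source.items():
        if key.startswith(text_encoder_prefix):
            combined[key] = value
    return combined


# Toy state dicts with strings in place of tensors (illustrative only).
stage2 = {"conditioner.embedders.0.w": "te_stage2", "model.diffusion_model.w": "unet_stage2"}
adjustment = {"conditioner.embedders.0.w": "te_adjust", "model.diffusion_model.w": "unet_adjust"}

final = combine_components(text_encoder_source=stage2, unet_source=adjustment)
print(final)
```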
### Hyperparameters
| Stage | Epochs | UNet lr | Text Encoder lr | Batch Size | Resolution | Noise Offset | Optimizer | LR Scheduler |
|--------------------------|--------|---------|-----------------|------------|------------|--------------|------------|--------------|
| **Finetuning 1st Stage** | 10 | 2e-5 | 1e-5 | 256 | 1024² | N/A | AdamW8bit | Constant |
| **Finetuning 2nd Stage** | 10 | 2e-5 | 1e-5 | 256 | Max. 1280² | N/A | AdamW | Constant |
| **Adjustment Stage** | 0.25 | 8e-5 | 4e-5 | 1024 | Max. 1280² | 0.05 | AdamW | Constant |
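
For a rough sense of scale, the table can be turned into approximate optimizer step counts. The sketch below assumes a dataset of exactly 400,000 images (the card only says "more than"), that Batch Size is the effective global batch, and that each epoch passes over every image once; none of these are confirmed here, so treat the numbers as estimates.

```python
import math


def optimizer_steps(num_images: int, epochs: float, batch_size: int) -> int:
    """Approximate optimizer steps: total samples seen divided by batch size."""
    return math.ceil(num_images * epochs / batch_size)


DATASET_SIZE = 400_000  # assumption: lower bound taken from the Dataset section

# (epochs, batch size) per stage, copied from the hyperparameter table.
stages = {
    "Finetuning 1st Stage": (10, 256),
    "Finetuning 2nd Stage": (10, 256),
    "Adjustment Stage": (0.25, 1024),
}

steps = {
    name: optimizer_steps(DATASET_SIZE, epochs, batch)
    for name, (epochs, batch) in stages.items()
}
for name, count in steps.items():
    print(f"{name}: ~{count} steps")
```

Under these assumptions each finetuning stage runs for about 15,625 steps, while the Adjustment Stage is only about 98 steps at its larger batch size and higher learning rates.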