---
title: README
emoji: 📉
colorFrom: gray
colorTo: gray
sdk: static
pinned: false
license: mit
---

## Update 2023.9.12

If you want to make a LoRA request, see [this article](https://civitai.com/articles/2186/2023-9-12-open-requests-for-character-lora).

CyberHarem is a non-profit technical team that works purely out of interest, so **we do not charge fees of any kind**. However, our computing resources and our team members' working time are limited, so **as a rule, we cannot guarantee delivery times for models**. We will do our best to complete requests as soon as circumstances allow, and we appreciate your understanding.

## Update 2023.9.2

Two recent developments:

1. The **automated training process for `v1.4` has been deployed**, and model quality has improved significantly compared to earlier versions (for more technical details, see: https://civitai.com/articles/2064/2023-8-31-release-of-v14-training-automation-process). We are now thoroughly cleaning the datasets and retraining the models.

2. We now support LoRA training for characters in anime videos, and the entire process is highly automated.

## What is this?

As the name suggests, `CyberHarem` is a centralized repository of anime waifu image datasets and LoRA models.

It's an interesting experiment in which **all the datasets, models, model previews, and models published to [civitai](https://civitai.com/user/narugo1992/models) are fully auto-generated without any human intervention**. To make this possible, we've done a lot of technical and data preparation, which you can find in our [Organization - DeepGHS](https://huggingface.co/deepghs) and in the code on [GitHub - DeepGHS](https://github.com/deepghs).

Currently, we have collected character databases for several popular mobile games (see [Supported Games of GChar Library](https://narugo1992.github.io/gchar/main/best_practice/supported/index.html#supported-games)) and crawled datasets of female characters from these games for training. In the future, we may include more characters, not only from mobile games but also from anime series. **You can find your waifu with [CyberHarem/find_my_waifu](https://huggingface.co/spaces/CyberHarem/find_my_waifu).**

## Where does the dataset come from? What's the format?

* The dataset is automatically crawled from various major image websites like [ZeroChan](https://zerochan.net), [Anime-Pictures](https://anime-pictures.net/), [Danbooru](https://danbooru.donmai.us/), [Rule34](https://rule34.xxx/), etc. (see [Supported Sites of GChar Library](https://narugo1992.github.io/gchar/main/best_practice/supported/index.html#supported-sites))
* In each dataset repository, there are both original data packs and images resized and aligned to a uniform size, along with image tags generated using the [SmilingWolf/wd-v1-4-convnextv2-tagger-v2](https://huggingface.co/SmilingWolf/wd-v1-4-convnextv2-tagger-v2) model.

## How are the models trained? What's the format?

LoRA models are trained in batches on their corresponding datasets. We use [7eu7d7](https://github.com/7eu7d7)'s [HCP-Diffusion](https://github.com/7eu7d7/HCP-Diffusion) training framework for the process.

## How to use a1111's WebUI to generate images of anime waifus?

1. Go to the model repository.
2. Check the Model Card and choose a training step whose preview images look good.
3. Click Download on the right side to download the model package. The package contains two files: a `.pt` file and a `.safetensors` LoRA file.
4. **You need to use both of these files together. Put the `.pt` file in the WebUI's `embeddings` directory and load the `.safetensors` file as a LoRA.**
5. Use the trigger words (provided in the Model Card) and prompt text to generate images.
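
The steps above can be sketched as a small helper script. This is only an illustration under stated assumptions: it assumes a default a1111 WebUI layout (`embeddings/` for the `.pt` file and `models/Lora/` for the `.safetensors` file), the function names `install_model_files` and `build_prompt` are hypothetical, and the default weight of `0.8` is an arbitrary starting point rather than a recommended value. Use the trigger words from the Model Card in place of the placeholders.

```python
import shutil
from pathlib import Path


def install_model_files(package_dir, webui_dir):
    """Copy the .pt embedding and the .safetensors LoRA into a WebUI install.

    Assumes the default a1111 directory layout; adjust the targets below
    if your installation uses custom paths.
    """
    package_dir, webui_dir = Path(package_dir), Path(webui_dir)
    targets = {
        ".pt": webui_dir / "embeddings",
        ".safetensors": webui_dir / "models" / "Lora",
    }
    copied = []
    for src in sorted(package_dir.iterdir()):
        dest_dir = targets.get(src.suffix)
        if dest_dir is None or not src.is_file():
            continue  # skip anything that is not one of the two model files
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(src, dest_dir)))
    return copied


def build_prompt(trigger_word, lora_name, weight=0.8, extra_tags=()):
    """Build a prompt that activates both the embedding and the LoRA.

    The embedding is triggered by its name appearing in the prompt, and the
    LoRA is loaded with the WebUI's <lora:name:weight> syntax.
    """
    lora_tag = f"<lora:{lora_name}:{weight}>"
    return ", ".join([trigger_word, *extra_tags, lora_tag])
```

For example, `build_prompt("my_waifu", "my_waifu", 0.8, ["1girl", "smile"])` produces a prompt string you can paste into the WebUI, where `my_waifu` stands in for the real trigger word and LoRA file name.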

## Why do some preview images not look very much like the original characters?

The prompts used for the preview images are **automatically generated** by clustering algorithms applied to feature information extracted from the training dataset. The seed for each image is also random, and **the images are not selected or modified** in any way, so some previews will inevitably resemble the character less than others.

In practice, according to our internal tests, most models with this issue perform better in actual use than the preview images suggest. **The only thing you might need to do is adjust the tags you use a bit.**
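
To give a rough idea of how prompts can be derived from a dataset by clustering, here is a toy sketch. It is not the pipeline actually used for the previews: it assumes each image is represented only by its set of tags, uses a simple greedy Jaccard-similarity grouping chosen purely for illustration, and all function names and the `threshold` and `top_k` parameters are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two tag sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster_tag_sets(tag_sets, threshold=0.5):
    """Greedy clustering: an image joins the first cluster whose seed
    (its first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for tags in tag_sets:
        for cluster in clusters:
            if jaccard(tags, cluster[0]) >= threshold:
                cluster.append(tags)
                break
        else:
            clusters.append([tags])
    return clusters


def prompt_for_cluster(cluster, top_k=3):
    """Use the most frequent tags in a cluster as a prompt
    (ties are broken alphabetically so the result is deterministic)."""
    counts = Counter(tag for tags in cluster for tag in tags)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ", ".join(tag for tag, _ in ranked[:top_k])
```

Because each prompt only reflects the tags that dominate one cluster, it may omit details that matter for a particular character, which is one reason adjusting the tags by hand can help.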