---
license: mit
datasets:
- isek-ai/danbooru-tags-2016-2023
language:
- en
library_name: transformers
---

# SDPrompt-RetNet-v2-beta

This model is a RetNet model pretrained from scratch using https://github.com/syncdoth/RetNet.

It achieves the following results on the evaluation set:
- Loss: 0.5923

## Usage

```bash
pip install transformers safetensors
```

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

MODEL_NAME = "isek-ai/SDPrompt-RetNet-v2-beta"
DEVICE = "cuda"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # or torch.bfloat16
    trust_remote_code=True,  # allow loading the custom model code from the repository
).to(DEVICE)
model.eval()

# Print tokens to stdout as they are generated
streamer = TextStreamer(tokenizer)

prompt = "1girl"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(
    inputs["input_ids"],
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    top_k=20,
    temperature=0.9,
    streamer=streamer,
)
# Example output:
# 1girl, :<, bag, black hair, blurry, bokeh, cloud, depth of field, from side, long sleeves, night, outdoors, pleated skirt, power lines, purple eyes, road, scenery, shoes, shoulder bag,gasm, sidelocks, sign, skirt,let's drawsaurus, skylight smile, sneakers, standing, star (sky), sweater, town, traffic cone, utility pole, vending machine, wide-eyed, window, wooden box, yellow skirt,ization, zettai ryouiki, zoom layer, white footwear, zipper, zipper pull tab, zipperland sheet, zombie pose, ladder, leaning back, leg up, looking to the side,let, miniskirt, motion blur, musical note, open mouth, part
```

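
Generated text can contain stray fragments (such as `gasm` or `ization` in the sample output above), so some light post-processing may help before passing it to an image generation model. The following is a minimal sketch and not part of this repository: `clean_tags` and its `allowed` parameter are hypothetical names introduced here, and the optional allow-list is assumed to be a set of valid tags that you supply yourself.

```python
def clean_tags(text, allowed=None):
    """Split a comma-separated tag string into an ordered list without duplicates.

    `allowed` is an optional, user-supplied set of valid tags (an assumption
    of this sketch); tags outside it are dropped.
    """
    seen = set()
    tags = []
    for raw in text.split(","):
        tag = raw.strip()
        if not tag or tag in seen:
            continue  # skip empty entries and repeats
        if allowed is not None and tag not in allowed:
            continue  # skip anything outside the allow-list
        seen.add(tag)
        tags.append(tag)
    return tags


sample = "1girl, bag, black hair,gasm, skirt, bag, "
print(clean_tags(sample))
# ['1girl', 'bag', 'black hair', 'gasm', 'skirt']
print(", ".join(clean_tags(sample, allowed={"1girl", "bag", "black hair", "skirt"})))
# 1girl, bag, black hair, skirt
```

To apply this to a real generation, decode the tensor returned by `model.generate` with `tokenizer.decode(..., skip_special_tokens=True)` and pass the resulting string to `clean_tags`.
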

## Model description

This model was trained on **Danbooru tags only** and generates prompts for image generation models.

## Training data

- [isek-ai/danbooru-tags-2016-2023](https://huggingface.co/datasets/isek-ai/danbooru-tags-2016-2023)

### Dataset filtering

TODO

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 1

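
The sketch below shows how two of these values relate to the rest: the total batch size is the per-step batch size times the gradient accumulation steps, and a linear scheduler with warmup ramps the learning rate up over the first 500 steps, then decays it linearly to zero. It is an illustration, not the training code. It assumes a single device, and `total_steps = 7650` is a placeholder chosen for the example because the card does not state the total number of optimizer steps. `linear_schedule_lr` is a hypothetical helper name.

```python
learning_rate = 1e-4
train_batch_size = 32
gradient_accumulation_steps = 2
warmup_steps = 500

# Examples consumed per optimizer step (assuming a single device).
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 64


def linear_schedule_lr(step, total_steps, base_lr=learning_rate, warmup=warmup_steps):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


total_steps = 7650  # assumed placeholder, not taken from the card
for step in (0, 250, 500, 4000, total_steps):
    print(step, linear_schedule_lr(step, total_steps))
```

Under these assumptions the learning rate peaks at 0.0001 at step 500 and reaches zero at the final step.
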

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.975         | 0.07  | 500  | 1.0005          |
| 0.7549        | 0.13  | 1000 | 0.7604          |
| 0.6923        | 0.2   | 1500 | 0.7090          |
| 0.6753        | 0.26  | 2000 | 0.6778          |
| 0.6591        | 0.33  | 2500 | 0.6568          |
| 0.6337        | 0.39  | 3000 | 0.6429          |
| 0.6288        | 0.46  | 3500 | 0.6319          |
| 0.624         | 0.53  | 4000 | 0.6218          |
| 0.62          | 0.59  | 4500 | 0.6172          |
| 0.603         | 0.66  | 5000 | 0.6090          |
| 0.5931        | 0.72  | 5500 | 0.6032          |
| 0.5957        | 0.79  | 6000 | 0.5986          |
| 0.5972        | 0.85  | 6500 | 0.5948          |
| 0.5928        | 0.92  | 7000 | 0.5926          |
| 0.5904        | 0.98  | 7500 | 0.5923          |

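
The validation losses in the table can be read as perplexities, which some readers find easier to compare. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the card does not state this), in which case perplexity is `exp(loss)`. It also multiplies the step count by the total train batch size of 64 to approximate how many training examples had been processed at each checkpoint. `perplexity` and `checkpoints` are names introduced for this example.

```python
import math

TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list

# (step, validation loss) pairs copied from a few rows of the table
checkpoints = [(500, 1.0005), (4000, 0.6218), (7500, 0.5923)]


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(loss)


for step, val_loss in checkpoints:
    examples_seen = step * TOTAL_TRAIN_BATCH_SIZE
    print(f"step {step:>5}: perplexity ~ {perplexity(val_loss):.3f}, "
          f"examples seen ~ {examples_seen:,}")
```

Under that assumption, the final validation loss of 0.5923 corresponds to a perplexity of roughly 1.81 after about 480,000 training examples.
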

### Framework versions

- Transformers 4.36.1
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0