
	---
	tags:
	- generated_from_trainer
	- retnet
	model-index:
	- name: kakuyomu-retnet-300m-1
	  results: []
	license: mit
	language:
	- ja
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# LightNovel-Intro-RetNet-400M

	This is a Japanese-language RetNet model trained from scratch with [syncdoth's RetNet implementation](https://github.com/syncdoth/RetNet).

	Demo: https://huggingface.co/spaces/isek-ai/LightNovel-Intro-RetNet-400M-Demo

	## Usage

	First install the required libraries:

	```
	pip install transformers safetensors timm
	```

	Then clone [syncdoth's RetNet implementation](https://github.com/syncdoth/RetNet) into the same directory as the inference script:

	```
	git clone https://github.com/syncdoth/RetNet.git
	```

	Example inference script:

	```py
	from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

	MODEL_NAME = "isek-ai/LightNovel-Intro-RetNet-400M"

	device = "cuda"

	tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
	model = AutoModelForCausalLM.from_pretrained(
	    MODEL_NAME,
	    trust_remote_code=True,
	).to(device)
	gen_config = GenerationConfig.from_pretrained(MODEL_NAME)
	gen_config.max_new_tokens = 32

	inputs = tokenizer("目が覚めると、", return_tensors="pt", add_special_tokens=False).to(device)

	print("Generating...")

	result = model.generate(**inputs, generation_config=gen_config)

	print(tokenizer.decode(result[0], skip_special_tokens=True))
	# 目が覚めると、見知らぬ空間に居た。「ん......?」思わずそんな声が出たことに違和感を感じる。確か、気付けば私は
	```

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0006
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 2
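
To make the `cosine` scheduler and the 500 warmup steps concrete, here is a minimal sketch of the usual linear-warmup-plus-cosine-decay shape. It is an illustration, not the training code: `TOTAL_STEPS` is an assumption estimated from the training results table (step 31000 at epoch 1.98 of 2), and `cosine_lr` is a hypothetical helper name.

```python
import math

PEAK_LR = 6e-4        # learning_rate from the list above
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 31_300  # ASSUMPTION: rough estimate from the results table


def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay towards zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup schedule that has elapsed, clamped to 1.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 250, 500, 10_000, 20_000, 31_000):
    print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at 0.0006 once warmup ends and then falls smoothly for the rest of the run.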

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 5.5155        | 0.06  | 1000  | 5.5331          |
	| 5.0106        | 0.13  | 2000  | 5.1774          |
	| 4.793         | 0.19  | 3000  | 4.9399          |
	| 4.7078        | 0.26  | 4000  | 4.7737          |
	| 4.4789        | 0.32  | 5000  | 4.6373          |
	| 4.3269        | 0.38  | 6000  | 4.5422          |
	| 4.337         | 0.45  | 7000  | 4.4632          |
	| 4.374         | 0.51  | 8000  | 4.4070          |
	| 4.1447        | 0.58  | 9000  | 4.3293          |
	| 4.1402        | 0.64  | 10000 | 4.2881          |
	| 4.1329        | 0.7   | 11000 | 4.2287          |
	| 3.9985        | 0.77  | 12000 | 4.1858          |
	| 4.1185        | 0.83  | 13000 | 4.1506          |
	| 4.0515        | 0.9   | 14000 | 4.0993          |
	| 3.9984        | 0.96  | 15000 | 4.0611          |
	| 3.7731        | 1.02  | 16000 | 4.0423          |
	| 3.7403        | 1.09  | 17000 | 3.8166          |
	| 3.6778        | 1.15  | 18000 | 3.8000          |
	| 3.7227        | 1.22  | 19000 | 3.7875          |
	| 3.6051        | 1.28  | 20000 | 3.7664          |
	| 3.6143        | 1.34  | 21000 | 3.7496          |
	| 3.6323        | 1.41  | 22000 | 3.7278          |
	| 3.6487        | 1.47  | 23000 | 3.7089          |
	| 3.6524        | 1.54  | 24000 | 3.6951          |
	| 3.5621        | 1.6   | 25000 | 3.6801          |
	| 3.5722        | 1.66  | 26000 | 3.6708          |
	| 3.5277        | 1.73  | 27000 | 3.6635          |
	| 3.6224        | 1.79  | 28000 | 3.6565          |
	| 3.5663        | 1.85  | 29000 | 3.6532          |
	| 3.5937        | 1.92  | 30000 | 3.6515          |
	| 3.5944        | 1.98  | 31000 | 3.6510          |
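
The losses above can be read as perplexities, which some readers find easier to compare. The sketch below assumes the reported validation loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card); `perplexity` is a hypothetical helper name, and the three rows are copied from the table.

```python
import math

# (step, validation loss) pairs copied from the table above.
checkpoints = [(1000, 5.5331), (16000, 4.0423), (31000, 3.6510)]


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    # ASSUMPTION: the loss is an average over tokens, measured in nats.
    return math.exp(loss_nats)


for step, loss in checkpoints:
    print(f"step {step:>5}: val loss {loss:.4f} -> perplexity {perplexity(loss):.1f}")
```

Note that perplexity depends on the tokenizer, so these values are only comparable across models that share this model's vocabulary.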


	### Framework versions

	- Transformers 4.34.0
	- PyTorch 2.0.0+cu118
	- Datasets 2.14.5
	- Tokenizers 0.14.0