---
license: apache-2.0
datasets:
- Norquinal/claude_multiround_chat_30k
- OpenLeecher/Teatime
---
As of today, 2023-08-10, we proudly announce the world's first 128k-context model based on the RWKV architecture.
With the RWKV World tokenizer, multiple languages get a roughly 1:1 tokenization ratio, i.e. one word to one token: https://github.com/BlinkDL/ChatRWKV/blob/2a13ddecd81f8fd615b6da3a8f1091a594689e30/tokenizer/rwkv_tokenizer.py#L163
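A minimal sketch of checking that ratio yourself with the trie-based World tokenizer from the linked file; it assumes `rwkv_tokenizer.py` and the `rwkv_vocab_v20230424.txt` vocabulary have been copied from the ChatRWKV repo into the working directory:

```python
# Sketch: count tokens per word with the RWKV World tokenizer.
# Assumes rwkv_tokenizer.py and rwkv_vocab_v20230424.txt come from
# https://github.com/BlinkDL/ChatRWKV/tree/main/tokenizer
from rwkv_tokenizer import TRIE_TOKENIZER

tokenizer = TRIE_TOKENIZER("rwkv_vocab_v20230424.txt")

for word in ["hello", "world", "你好", "世界"]:
    tokens = tokenizer.encode(word)
    print(word, tokens, len(tokens))          # ideally one token per word
    assert tokenizer.decode(tokens) == word   # round-trip check
```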
This model was trained on instruction datasets as well as Chinese web novels and traditional wuxia fiction; more training details will be published later.
Tested summarizing 85k tokens into 5 key points; the conversation files can be found in the example folders, and more cases are coming.
Full fine-tuning of the 128k-context model was done with this repo, on 4×A800 GPUs for 40 hours over 1.3B tokens: https://github.com/SynthiaDL/TrainChatGalRWKV/blob/main/train_world.sh
Use RWKV Runner (https://github.com/josStorer/RWKV-Runner) to test this model; set temperature 0.1-0.2 with top-p 0.7 for more precise answers, while a temperature between 1 and 2.x gives more creative output.
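If you prefer scripting over the Runner GUI, here is a minimal sketch using BlinkDL's `rwkv` pip package with the World vocabulary and the "precise" sampling settings above; the model path and the `strategy` string are placeholders to adapt to your checkpoint and hardware:

```python
# Sketch: precise-answer sampling (temp 0.1-0.2, top-p 0.7) with `pip install rwkv`.
# "path/to/this-model" is a placeholder for the downloaded checkpoint.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

model = RWKV(model="path/to/this-model", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # RWKV World tokenizer

args = PIPELINE_ARGS(temperature=0.2, top_p=0.7)    # raise temperature to 1-2.x for creativity
print(pipeline.generate("Question: ...\n\nAnswer:", token_count=256, args=args))
```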