|
--- |
|
license: apache-2.0 |
|
--- |
|
We are proud to announce the world's first 128k-context model based on the RWKV architecture, released today, 2023-08-10. |
|
|
|
This model was trained on instruction datasets, Chinese web novels, and traditional wuxia fiction. |
|
More training details will be added later. |
|
|
|
The 128k-context model was fully fine-tuned with the training script linked below, on 4×A800 GPUs for 40 hours with 4B tokens. |
|
https://github.com/SynthiaDL/TrainChatGalRWKV/blob/main/train_world.sh |
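
To put the training figures above in perspective, here is a rough back-of-the-envelope sketch of the implied throughput. It assumes the 40 hours are wall-clock time with all 4 GPUs busy throughout, and that "128k" means 131,072 tokens; neither is confirmed here, and the function name `training_stats` is purely illustrative.

```python
# Rough throughput estimate from the stated figures: 4 x A800, 40 hours, 4B tokens.
TOTAL_TOKENS = 4_000_000_000  # 4B tokens (stated)
NUM_GPUS = 4                  # 4 x A800 (stated)
HOURS = 40                    # assumed to be wall-clock hours
CONTEXT_LEN = 128 * 1024      # assumption: "128k" = 131,072 tokens


def training_stats(total_tokens, num_gpus, hours, context_len):
    """Derive simple aggregate numbers from the headline training figures."""
    seconds = hours * 3600
    cluster_tps = total_tokens / seconds
    return {
        "gpu_hours": num_gpus * hours,
        "cluster_tokens_per_sec": cluster_tps,
        "per_gpu_tokens_per_sec": cluster_tps / num_gpus,
        # How many full-length windows the token budget would fill
        "full_context_sequences": total_tokens / context_len,
    }


stats = training_stats(TOTAL_TOKENS, NUM_GPUS, HOURS, CONTEXT_LEN)
for name, value in stats.items():
    print(f"{name}: {value:,.0f}")
```

Under those assumptions the run comes to 160 GPU-hours, and the 4B-token budget is equivalent to roughly thirty thousand full 128k windows.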
|
![QQ图片20230810144654.jpg](https://cdn-uploads.huggingface.co/production/uploads/6176b32847ee6431f632981e/KjeXNjryiZjKH0PsnrE6J.jpeg) |
|
|
|
|
|
Tested with RWKV Runner: https://github.com/josStorer/RWKV-Runner |
|
|
|
![image.png](https://cdn-uploads.huggingface.co/production/uploads/6176b32847ee6431f632981e/5zDQVbGb-fX8Y8h98tUF0.png) |
|
|
|
![微信截图_20230810142220.png](https://cdn-uploads.huggingface.co/production/uploads/6176b32847ee6431f632981e/u2wA-l1UcW-Mt9KIoa_4q.png) |
|
|
|
![4UYBX4RA0%8PA{1YSSK)AVW.png](https://cdn-uploads.huggingface.co/production/uploads/6176b32847ee6431f632981e/gzr8Yt4JRkBz31-ieRSOE.png) |
|
|
|
![QQ图片20230810143840.png](https://cdn-uploads.huggingface.co/production/uploads/6176b32847ee6431f632981e/LgEjfHJ7XD7PlGM9b3RAf.png) |
|
|
|
|