RWKV-4 169M

Use RWKV-4 models (NOT RWKV-4a, NOT RWKV-4b) unless you know what you are doing.

Model Description

RWKV-4 169M is an L12-D768 (12 layers, 768-dimensional embedding) causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

Use https://github.com/BlinkDL/ChatRWKV to run it.
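
For a quick scripted test outside the ChatRWKV UI, the `rwkv` pip package (the inference library ChatRWKV itself uses) can run this checkpoint. A minimal sketch, assuming the checkpoint and ChatRWKV's `20B_tokenizer.json` sit in the working directory; the prompt, paths, and sampling settings are illustrative:

```python
# pip install rwkv
import os
os.environ["RWKV_JIT_ON"] = "1"   # TorchScript kernels (set before importing rwkv)
os.environ["RWKV_CUDA_ON"] = "0"  # pure-PyTorch path, no custom CUDA kernel

from rwkv.model import RWKV
from rwkv.utils import PIPELINE, PIPELINE_ARGS

# The package appends ".pth" itself, so pass the path without the extension.
model = RWKV(model="RWKV-4-Pile-169M-20220807-8023", strategy="cpu fp32")

# 20B_tokenizer.json ships with the ChatRWKV repo (Pile / GPT-NeoX vocab).
pipeline = PIPELINE(model, "20B_tokenizer.json")

out = pipeline.generate(
    "The Eiffel Tower is located in",
    token_count=32,
    args=PIPELINE_ARGS(temperature=1.0, top_p=0.85),
)
print(out)
```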

ctx_len = 1024, n_layer = 12, n_embd = 768

Final checkpoint: RWKV-4-Pile-169M-20220807-8023.pth, trained on the Pile for 332B tokens.

  • Pile loss 2.5355
  • LAMBADA ppl 29.33, acc 32.99%
  • PIQA acc 65.07%
  • SC2016 acc 58.79%
  • Hellaswag acc_norm 32.26%
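
To verify the hyperparameters above against a downloaded checkpoint, the state dict can be inspected directly. A minimal sketch, assuming the RWKV-LM key convention of an `emb.weight` embedding matrix and per-layer `blocks.{i}.*` parameters:

```python
import torch

# Load the raw state dict; no model code is needed just to check shapes.
sd = torch.load("RWKV-4-Pile-169M-20220807-8023.pth", map_location="cpu")

# Count distinct layer indices under "blocks." to recover n_layer.
n_layer = len({k.split(".")[1] for k in sd if k.startswith("blocks.")})
vocab_size, n_embd = sd["emb.weight"].shape

print(n_layer, n_embd, vocab_size)  # expect 12, 768, 50277 (Pile/GPT-NeoX vocab)
```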

RWKV-4a variant, trained with tiny attention (--tiny_att_dim 256 --tiny_att_layer 9): RWKV-4a-Pile-170M-20221209-7955.pth

  • Pile loss 2.4702
  • LAMBADA ppl 21.42, acc 38.23%
  • PIQA acc 63.76%
  • SC2016 acc 59.06%
  • Hellaswag acc_norm 32.40%

RWKV-4b variant (--my_testing 'a'): RWKV-4b-Pile-171M-20230202-7922.pth

  • Pile loss 2.4222
  • LAMBADA ppl 22.02, acc 38.56%
  • PIQA acc 64.04%
  • SC2016 acc 59.91%
  • Hellaswag acc_norm 33.33%
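
For intuition on the Pile loss figures above: assuming they are per-token cross-entropy in nats (the usual convention), exponentiating gives token-level Pile perplexity, which makes the three checkpoints directly comparable:

```python
import math

for name, loss in [
    ("RWKV-4  169M", 2.5355),
    ("RWKV-4a 170M (tiny attention)", 2.4702),
    ("RWKV-4b 171M", 2.4222),
]:
    print(f"{name}: Pile perplexity = exp({loss}) = {math.exp(loss):.2f}")
# -> 12.62, 11.82, 11.27 respectively
```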