Unable to load tokenizer
I got this error:
Traceback (most recent call last):
File "<redacted path>\ChatRWKV-main\v2\chat.py", line 117, in <module>
pipeline = PIPELINE(model, f"<redacted path>/ChatRWKV-main/tokenizer/rwkv_vocab_v20230424.txt")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Program Files\Python311\Lib\site-packages\rwkv\utils.py", line 29, in __init__
self.tokenizer = Tokenizer.from_file(WORD_NAME)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: invalid type: integer `1`, expected struct Tokenizer at line 1 column 1
It seems that the tokenizer file is not compatible: Tokenizer.from_file cannot parse rwkv_vocab_v20230424.txt.
Windows 10, Python 3.11, PyTorch 2.0.0, RWKV 0.7.3, Tokenizers 0.13.3, CUDA 11.8
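Why the 0.7.3 loader fails: Tokenizer.from_file (Hugging Face tokenizers) reads a serialized tokenizer, which is a single JSON object. The error reports an integer `1` at line 1 column 1, so the .txt vocab starts with a number rather than `{`. The minimal sketch below uses the standard json module as a stand-in for the library's Rust parser to show both ways such a file is rejected. looks_like_tokenizer_json and the sample vocab lines are hypothetical and do not reproduce the real file.

```python
import json


def looks_like_tokenizer_json(text: str) -> bool:
    """Hypothetical helper: True only if `text` parses as one JSON object.

    A serialized Hugging Face tokenizer is a single JSON object, so anything
    else (a bare number, a line-based vocab list) cannot be loaded by
    Tokenizer.from_file.
    """
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        # A line-based file parses "1" and then fails on the extra data.
        return False
    # A bare integer is valid JSON but not a tokenizer object, which mirrors
    # "invalid type: integer `1`, expected struct Tokenizer".
    return isinstance(value, dict)


if __name__ == "__main__":
    print(looks_like_tokenizer_json('{"version": "1.0", "model": {}}'))
    print(looks_like_tokenizer_json("1"))
    print(looks_like_tokenizer_json("1 'a' 1\n2 'b' 1\n"))
```

The first call prints True and the other two print False.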
Update the RWKV pip package to 0.7.4
and use pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
(EXACTLY AS WRITTEN HERE, with no path and no .txt extension: "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
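Why the name must be exact: a minimal sketch, assuming PIPELINE compares its second argument against the bundled vocab name and otherwise falls back to Tokenizer.from_file (the fallback is the path the 0.7.3 traceback shows). parse_version, supports_bundled_vocab and loader_for, along with the "bundled" / "tokenizers-json" labels, are hypothetical names, not the package's code.

```python
# Name that the reply above says is bundled with rwkv 0.7.4+.
BUNDLED_VOCAB = "rwkv_vocab_v20230424"


def parse_version(version: str) -> tuple:
    """Turn '0.7.4' into (0, 7, 4); assumes a plain dotted numeric version."""
    return tuple(int(part) for part in version.split(".")[:3])


def supports_bundled_vocab(installed: str) -> bool:
    """True if the installed rwkv version is 0.7.4 or newer."""
    return parse_version(installed) >= (0, 7, 4)


def loader_for(word_name: str) -> str:
    """Hypothetical dispatch: only the exact bundled name selects the bundled
    vocab; anything else (a full path, a name ending in '.txt') falls through
    to the Tokenizer.from_file JSON loader."""
    if word_name == BUNDLED_VOCAB:
        return "bundled"
    return "tokenizers-json"


if __name__ == "__main__":
    # The installed version can be read with importlib.metadata.version("rwkv").
    print(supports_bundled_vocab("0.7.3"), supports_bundled_vocab("0.7.4"))
    print(loader_for("rwkv_vocab_v20230424"))
    print(loader_for("ChatRWKV-main/tokenizer/rwkv_vocab_v20230424.txt"))
```

Under this assumption, passing the original path from the traceback would still hit the JSON loader even on 0.7.4, which is why the reply insists on the bare name.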
Bob: Hi
Alice:Traceback (most recent call last):
File "\ChatRWKV-main\v2\chat.py", line 457, in <module>
on_message(msg)
File "\ChatRWKV-main\v2\chat.py", line 359, in on_message
token = pipeline.sample_logits(
^^^^^^^^^^^^^^^^^^^^^^^
File "\ChatRWKV-main\v2/../rwkv_pip_package/src\rwkv\utils.py", line 82, in sample_logits
out = torch.multinomial(probs, num_samples=1)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either inf, nan or element < 0
Changing device from 'cuda' to 'cpu' solves it. Might be a bug?
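torch.multinomial rejects any probability vector that contains inf, nan or a negative entry, which is exactly the condition in the error above. The sketch below mirrors that precondition in plain Python so the failure is easy to see. invalid_indices and sample_index are hypothetical names, and random.choices stands in for torch.multinomial; it is not the rwkv sample_logits code.

```python
import math
import random


def invalid_indices(probs):
    """Hypothetical helper: indices that would trigger the multinomial error,
    i.e. entries that are inf, nan or negative."""
    return [i for i, p in enumerate(probs) if not math.isfinite(p) or p < 0]


def sample_index(probs, rng=random):
    """Stand-in for torch.multinomial(probs, num_samples=1)[0] on a list."""
    bad = invalid_indices(probs)
    if bad:
        raise ValueError(f"probability vector has inf/nan/negative entries at {bad}")
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    print(invalid_indices([0.5, float("nan"), -0.1, float("inf")]))
    print(sample_index([0.2, 0.8], random.Random(0)))
```

Checking the vector before sampling turns the opaque RuntimeError into a report of which entries went bad, which helps when comparing the cuda and cpu runs.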
Okay, I forgot to mention that you need fp32 too, because here k will overflow in fp16 (fixable in the future).
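A minimal sketch of the failure mode described in the reply, assuming a single value grows past the fp16 range: IEEE half precision tops out at 65504, so a larger value becomes inf, and inf - inf inside a max-subtracted softmax gives nan for every entry. This does not reproduce RWKV's actual computation (the large number here is just an illustrative stand-in for k); to_fp16 and softmax are hypothetical helpers. Python's struct 'e' format raises OverflowError instead of returning inf, so to_fp16 maps that case to inf.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a float to half precision, mapping overflow to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack out-of-range values; IEEE would give inf.
        return math.copysign(math.inf, x)


def softmax(xs):
    """Max-subtracted softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # inf - inf -> nan, exp(nan) -> nan
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    big = 70000.0  # hypothetical value just above FP16_MAX
    print(to_fp16(FP16_MAX), to_fp16(big))
    print(softmax([to_fp16(big), 1.0]))  # every entry is nan
    print(softmax([big, 1.0]))           # finite at full precision
```

At full precision (Python floats are 64-bit, and fp32 also reaches about 3.4e38) the same value stays finite and softmax([70000.0, 1.0]) returns [1.0, 0.0], consistent with the fp32 advice above.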