Something wrong with the vocabulary encoding
#4
opened by olga-phillips
I've downloaded the model deepvk/roberta-base and the tokenizer using from_pretrained. Unfortunately, I couldn't use them because there's something wrong with the encoding. Can somebody please help me?
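For reference, this is roughly how I'm loading them (a minimal sketch using the standard transformers API; my actual script may differ slightly):

```python
from transformers import AutoModel, AutoTokenizer

# Load the tokenizer and model from the Hub with from_pretrained
tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")
```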
Here's what merges.txt looks like (the first 10 rules):
#version: 0.2 - Trained by `huggingface/tokenizers`
Ġ Ð
Ð ¾
Ð µ
Ð °
Ñ Ĥ
Ð ¸
Ñ ģ
Ð¾ Ð
Ñ Ģ
Ð ½
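And here's a minimal sketch of how I inspect the tokenizer output (the sample phrase is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")

# Tokenize a short Russian phrase and print the raw token strings;
# they come out as the same odd-looking symbols as in merges.txt
# (e.g. "Ð¿", "ÑĢ") instead of readable Cyrillic.
print(tokenizer.tokenize("привет, мир"))
```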