Something wrong with the vocabulary encoding

#4
by olga-phillips - opened

I've downloaded the deepvk/roberta-base model and its tokenizer using from_pretrained. Unfortunately, I can't use them, because something seems to be wrong with the vocabulary encoding. Can somebody please help me?

Here's what merges.txt looks like (the first 10 rules):

#version: 0.2 - Trained by `huggingface/tokenizers`
Ġ Ð
Ð ¾
Ð µ
Ð °
Ñ Ĥ
Ð ¸
Ñ ģ
Ð¾ Ð
Ñ Ģ
Ð ½
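
For what it's worth, these symbols look like what a GPT-2-style byte-level BPE tokenizer (the kind RoBERTa tokenizers normally use) writes to merges.txt: each raw UTF-8 byte is stored as a printable stand-in character, so Cyrillic text looks garbled in the file even when nothing is broken. Below is a minimal sketch for checking that, assuming this tokenizer follows the GPT-2 byte-to-character table. The helper names `bytes_to_unicode` and `decode_symbols` are mine for illustration, not part of the model repo.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable character table used by GPT-2-style byte-level BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are moved to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_symbols(symbols: str) -> str:
    """Turn merges.txt symbols back into text (assumes the GPT-2 byte mapping)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in symbols)
    # A lone lead byte such as 0xD0 is not valid UTF-8 by itself, hence errors="replace".
    return raw.decode("utf-8", errors="replace")


# Join the two halves of a merge rule and decode the result.
for left, right in [("Ð", "¾"), ("Ð", "µ"), ("Ð", "°"), ("Ñ", "Ĥ")]:
    print(repr(left + right), "->", repr(decode_symbols(left + right)))
```

If the assumption holds, `Ð` and `¾` stand for the bytes 0xD0 and 0xBE, which together are the UTF-8 encoding of the Cyrillic letter "о", and `Ġ` stands for a leading space. In that case the file is fine and the tokenizer should still encode and decode Russian text correctly.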
