Something wrong with the vocabulary encoding

by olga-phillips - opened Sep 18, 2024

I've downloaded the model deepvk/roberta-base and its tokenizer using from_pretrained. Unfortunately, I couldn't use them, because something seems to be wrong with the vocabulary encoding. Can somebody please help me?

Here's what merges.txt looks like (the first 10 rules):

#version: 0.2 - Trained by `huggingface/tokenizers`
Ġ Ð
Ð ¾
Ð µ
Ð °
Ñ Ĥ
Ð ¸
Ñ ģ
Ð¾ Ð
Ñ Ģ
Ð ½
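
For anyone trying to read these rules: the symbols resemble the printable alphabet that GPT-2 style byte-level BPE uses to represent raw bytes. The sketch below assumes that is the scheme used here (the header line only says the file was trained by `huggingface/tokenizers`, so this is an assumption, not something confirmed for this model). It rebuilds the byte-to-character table and maps a few of the pairs above back to UTF-8 text. The helper names `bytes_to_unicode` and `decode_symbols` are chosen for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as in GPT-2 style byte-level BPE."""
    # Bytes that already have a visible character keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Remaining bytes (control chars, space, etc.) are moved to U+0100 and up.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable symbol -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_symbols(symbols):
    """Turn a string of byte-level symbols back into UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in symbols)
    # A lone lead byte is not valid UTF-8, so replace instead of raising.
    return raw.decode("utf-8", errors="replace")


# Some of the merge pairs from the post, written as escapes to avoid
# copy-paste problems: (left symbol, right symbol).
pairs = [
    ("\u00d0", "\u00be"),  # Ð ¾
    ("\u00d0", "\u00b5"),  # Ð µ
    ("\u00d1", "\u0124"),  # Ñ Ĥ
    ("\u00d1", "\u0122"),  # Ñ Ģ
]
for left, right in pairs:
    print(repr(decode_symbols(left + right)))
```

Under that assumption, each merged pair is the two-byte UTF-8 encoding of one Cyrillic letter (for example, the second rule decodes to "о"), and `Ġ` stands for a leading space. If that holds, the file is not corrupted; it is just not meant to be read directly.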
