tokenizer.decode output '??'

by SamSha1971 - opened May 29, 2023


for i in range(hparams["vocab_size"]):
    if i == 46134:
        # Decode this single token id on its own
        text = tokenizer.decode([i])
        print(str(i) + ": " + text)
        # Show the raw UTF-8 bytes of the decoded string
        print(text.encode('utf-8'))

output:
46134: ��
b'\xef\xbf\xbd\xef\xbf\xbd'
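
For context, the bytes b'\xef\xbf\xbd' are the UTF-8 encoding of U+FFFD, the Unicode replacement character, so the output above is two replacement characters. One possible explanation, which is an assumption and not confirmed for this tokenizer, is that token 46134 maps to raw bytes that are only a fragment of a multi-byte UTF-8 character, and decoding that fragment on its own substitutes U+FFFD for each invalid byte. The sketch below reproduces the same output with plain Python and no tokenizer. The sample string and the slice are hypothetical and are not the real contents of token 46134.

```python
# Minimal sketch: decoding a fragment of a multi-byte UTF-8 sequence.
# The bytes here are illustrative only, not the actual bytes of token 46134.

def decode_lossy(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid bytes."""
    return raw.decode("utf-8", errors="replace")

# Two CJK characters (U+4F60, U+597D); each one is 3 bytes in UTF-8.
whole = "\u4f60\u597d".encode("utf-8")  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Hypothetical "token": two continuation bytes with no lead byte.
fragment = whole[1:3]  # b'\xbd\xa0'

text = decode_lossy(fragment)
print(text.encode("utf-8"))  # b'\xef\xbf\xbd\xef\xbf\xbd'

# The complete byte sequence decodes without any replacement characters.
print(decode_lossy(whole) == "\u4f60\u597d")  # True
```

If that assumption holds, decoding the token together with its neighbouring tokens in a real sequence should produce readable text, because the bytes then form complete characters.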
