How do I fix the `KeyError: '<|endoftext|>'` error?

#41
by jackleef - opened

```
Using unk_token, but it is not set yet.
Traceback (most recent call last):
    tokenizer = AutoTokenizer.from_pretrained("models/glm-4-9b-chat", trust_remote_code=True)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 689, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils_base.py", line 1841, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils_base.py", line 2077, in _from_pretrained
    added_tokens = tokenizer.sanitize_special_tokens()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils_base.py", line 856, in sanitize_special_tokens
    return self.add_tokens(self.all_special_tokens_extended, special_tokens=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils_base.py", line 999, in add_tokens
    return self._add_tokens(new_tokens, special_tokens=special_tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils.py", line 421, in _add_tokens
    and self.convert_tokens_to_ids(token) == self.convert_tokens_to_ids(self.unk_token)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils.py", line 575, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX\Lib\site-packages\transformers\tokenization_utils.py", line 588, in _convert_token_to_id_with_added_voc
    return self._convert_token_to_id(token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "XXX/.cache\huggingface\modules\transformers_modules\glm-4-9b-chat\tokenization_chatglm.py", line 95, in _convert_token_to_id
    return self.mergeable_ranks[token]
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: '<|endoftext|>'
```

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Upgrade transformers to 4.40.
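The fix above amounts to a minimum-version requirement. A minimal sketch of checking it before loading the tokenizer, assuming the only requirement is `transformers >= 4.40` (`version_tuple` and `meets_requirement` are hypothetical helpers written for this check, not transformers APIs):

```python
# Minimal sketch: compare the installed transformers version against the
# 4.40 requirement before calling AutoTokenizer.from_pretrained.
# version_tuple / meets_requirement are hypothetical helpers, not
# part of the transformers library.

def version_tuple(version: str) -> tuple:
    # Keep only numeric major/minor/patch parts, e.g. "4.40.2" -> (4, 40, 2).
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad so "4.40" compares equal to "4.40.0"
        parts.append(0)
    return tuple(parts)

def meets_requirement(installed: str, required: str = "4.40.0") -> bool:
    return version_tuple(installed) >= version_tuple(required)

print(meets_requirement("4.39.3"))  # False: the version that hits the KeyError
print(meets_requirement("4.40.2"))  # True: safe to load the tokenizer
```

In a real script you would pass `transformers.__version__` as `installed` and upgrade with pip if the check fails.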

zRzRzRzRzRzRzR changed discussion status to closed

Hi, what should I do if the error still occurs even after upgrading transformers to 4.40? @zRzRzRzRzRzRzR

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Is it still the exact same error? 4.40 no longer has this bug, so please check your environment again.
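One common cause of "still fails after upgrading" is that pip upgraded a different environment than the one running the script. A generic sketch of that environment check, using only the standard library:

```python
# Hedged sketch of the suggested environment check: print which Python
# interpreter is actually running and which transformers version it can
# import, since a pip upgrade can land in a different environment than
# the one executing the script.
import sys
import sysconfig
import importlib.metadata

print(sys.executable)                    # interpreter running this script
print(sysconfig.get_paths()["purelib"])  # site-packages this interpreter imports from

try:
    print(importlib.metadata.version("transformers"))  # should report >= 4.40
except importlib.metadata.PackageNotFoundError:
    print("transformers is not installed in this environment")
```

If the reported version is still below 4.40, run the upgrade with the same interpreter (`python -m pip install -U transformers`) rather than a bare `pip`.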
