Fix TypeError in _pad method by adding missing padding_side field
#9
opened by ayyylol
Hi,
Thank you for this model!
I noticed that the _pad method in the ChatGLM4Tokenizer class is missing the padding_side keyword argument, which causes a TypeError when the encode method is called.
This issue comes up when making quants with llama.cpp:
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
    main()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
    model_instance.write()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 434, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
    self.set_vocab()
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 3928, in set_vocab
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/convert_hf_to_gguf.py", line 550, in get_vocab_base_pre
    chktok = tokenizer.encode(chktxt)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2791, in encode
    encoded_inputs = self.encode_plus(
                     ^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3210, in encode_plus
    return self._encode_plus(
           ^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
    return self.prepare_for_model(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3706, in prepare_for_model
    encoded_inputs = self.pad(
                     ^^^^^^^^^
  File "/home/glm4/llama.cpp/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 3508, in pad
    encoded_inputs = self._pad(
                     ^^^^^^^^^^
TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'
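For anyone hitting the same error: the failure pattern is that newer transformers releases forward a padding_side keyword from pad() into the tokenizer's _pad() override, so any custom _pad() lacking that parameter raises the TypeError above. Below is a minimal, self-contained sketch of the fix. The _Base class here is a stand-in for the real transformers.PreTrainedTokenizer, and the padding logic is simplified for illustration; only the signature change (accepting padding_side) reflects the actual fix.

```python
from typing import Optional


class _Base:
    # Stand-in for transformers.PreTrainedTokenizer: newer versions
    # forward padding_side from pad() into the subclass's _pad().
    def pad(self, encoded, max_length=None, padding_side=None):
        return self._pad(encoded, max_length=max_length, padding_side=padding_side)


class FixedTokenizer(_Base):
    def _pad(
        self,
        encoded_inputs: dict,
        max_length: Optional[int] = None,
        padding_side: Optional[str] = None,  # the missing argument that caused the TypeError
        **kwargs,
    ) -> dict:
        # Simplified padding for illustration: pad input_ids with 0 up to max_length.
        side = padding_side or "left"
        ids = encoded_inputs["input_ids"]
        if max_length is not None and len(ids) < max_length:
            pad = [0] * (max_length - len(ids))
            ids = pad + ids if side == "left" else ids + pad
        return {"input_ids": ids}


tok = FixedTokenizer()
out = tok.pad({"input_ids": [1, 2, 3]}, max_length=5, padding_side="left")
print(out["input_ids"])  # → [0, 0, 1, 2, 3]
```

With the extra keyword accepted (and **kwargs absorbing any future additions), the pad() → _pad() call chain in the traceback completes instead of raising.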
Thank you!
zRzRzRzRzRzRzR
changed pull request status to merged