HF transformers integration
This is a draft PR! Do not merge it until https://github.com/huggingface/transformers/pull/27883 is merged and we have tested everything.
This PR refactors ChatGLM so that the trust-remote-code path and the native transformers code are inter-compatible. Since some keys had to be renamed, the safetensors weights had to be changed as well.
In order to try this PR, you first need to run:

```bash
pip install -U git+https://github.com/huggingface/transformers@add-chat-glm
```

Then pass `revision="refs/pr/28"` to `from_pretrained`. The modeling code works both with `trust_remote_code=True` and `trust_remote_code=False`. For example, users who rely on `quantize` should keep using `trust_remote_code=True`.
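For reference, here is a minimal sketch of the remote-code path. It assumes the repository's custom modeling code still exposes the `quantize()` method it had before this refactor (that method is defined by the remote code, not by transformers), which is exactly the kind of usage this PR should keep working:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "THUDM/chatglm3-6b"

# Remote-code path: AutoModel resolves to the repository's custom ChatGLM class
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    revision="refs/pr/28",
)

# quantize() comes from the remote code, so it only exists with
# trust_remote_code=True (assumed to still be exposed after the refactor)
model = model.quantize(4).cuda()
```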
```python
import torch
from transformers import AutoTokenizer, ChatGlmForCausalLM

model_id = "THUDM/chatglm3-6b"

# Native transformers implementation, loaded from the Hub PR revision
model = ChatGlmForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    revision="refs/pr/28",
).to(0)

# The tokenizer is not part of this PR yet, so it still needs trust_remote_code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("你好", return_tensors="pt").to(0)  # "你好" = "hello"
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
>>> [gMASK]sop 你好,我是人工智能助手。很高兴认识你叫什么
```
When I tested it myself, everything looked good so far. It would be great if you could cross-test this implementation with me.
If you want to convert other ChatGLM checkpoints, have a look at this script (which currently supports the 6B variant only): https://github.com/huggingface/transformers/blob/add-chat-glm/src/transformers/models/chatglm/convert_chatglm_weights_to_hf.py
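In case it helps, here is a hypothetical invocation of that conversion script; the argument names below are assumptions, not the script's actual CLI, so please check its argparse definitions before running it:

```bash
# Hypothetical flags -- check the script for the real argument names
python src/transformers/models/chatglm/convert_chatglm_weights_to_hf.py \
    --model_id THUDM/chatglm3-6b \
    --output_dir ./chatglm3-6b-hf
```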
cc @zRzRzRzRzRzRzR it would be great if you could help test this branch, especially with respect to quantization and other features I might not be aware of. Regarding `.bin` files, I'll submit another PR once https://github.com/huggingface/transformers/pull/27883 and this PR are merged.
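For the quantization testing, a minimal sketch of what I had in mind is loading the native implementation with bitsandbytes 4-bit quantization through the standard `BitsAndBytesConfig` API; the prompt and generation settings below are just placeholders:

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, ChatGlmForCausalLM

model_id = "THUDM/chatglm3-6b"

# 4-bit quantization via bitsandbytes, no remote code needed for the model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = ChatGlmForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    revision="refs/pr/28",
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
inputs = tokenizer("你好", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```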
Hi @ybelkada, thank you very much for the work. It seems that the tokenizer is not included in this PR. I would like to add it; let me know how to work on that.