AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'
I'm hitting AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'. How do I add support for the latest transformers, 4.34?
transformers 4.34 doesn't work for me either. Downgrading to 4.33.1 works in my case.
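For reference, the downgrade is just pinning the older release:

pip install transformers==4.33.1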
Since vllm==0.2.1 requires transformers==4.34.1 for Mistral support, I don't think downgrading is a good idea. Could contributors fix this bug, or suggest a temporary workaround?
Solved with reference to https://github.com/huggingface/transformers/issues/26340#issuecomment-1766794575; this may fix the bug for now.
It didn't work for me
Update tokenization_baichuan.py; see https://github.com/huggingface/transformers/issues/26340
You should file an issue on the model repos and tell them to rearrange the tokenizer init so that self.sp_model is created before calling super().__init__().
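For context, the reason the order matters is that in transformers >= 4.34 the base tokenizer __init__ calls back into subclass methods (via the added-tokens handling), and on a SentencePiece-backed tokenizer those methods read self.sp_model. A minimal sketch of the pattern, with illustrative class names rather than the real transformers code:

# Illustrative sketch only, not the real transformers classes.
class Base:
    def __init__(self):
        # Stands in for transformers >= 4.34, where the base __init__
        # calls back into subclass methods during construction.
        self.initial_vocab = self.get_vocab()

class BrokenTokenizer(Base):
    def __init__(self):
        super().__init__()   # the callback runs here...
        self.sp_model = {}   # ...but sp_model is only assigned afterwards

    def get_vocab(self):
        return dict(self.sp_model)  # reads self.sp_model -> AttributeError

class FixedTokenizer(Base):
    def __init__(self):
        self.sp_model = {}   # create sp_model first
        super().__init__()   # now the callback succeeds

    def get_vocab(self):
        return dict(self.sp_model)

FixedTokenizer()    # works
# BrokenTokenizer() # raises AttributeError: no attribute 'sp_model'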
This solved the problem for me: edit tokenization_baichuan.py, find the super().__init__ call inside __init__, and move it to the end of __init__.
I solved the problem by:
- pip install transformers==4.34.0
- moving super().__init__ to the end, like this:
self.vocab_file = vocab_file
self.add_bos_token = add_bos_token
self.add_eos_token = add_eos_token
# Build the SentencePiece model first, so self.sp_model exists before
# the base class __init__ runs (which touches it in transformers >= 4.34).
self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
self.sp_model.Load(vocab_file)
# Call the parent constructor last.
super().__init__(
    bos_token=bos_token,
    eos_token=eos_token,
    unk_token=unk_token,
    pad_token=pad_token,
    add_bos_token=add_bos_token,
    add_eos_token=add_eos_token,
    sp_model_kwargs=self.sp_model_kwargs,
    clean_up_tokenization_spaces=clean_up_tokenization_spaces,
    **kwargs,
)
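To sanity-check the fix, reloading the tokenizer should now succeed. A minimal check (the repo id below is just an example Baichuan checkpoint; substitute your own local path or repo):

from transformers import AutoTokenizer

# Example repo id only; point this at whichever Baichuan checkpoint you use.
tok = AutoTokenizer.from_pretrained("baichuan-inc/Baichuan2-7B-Chat", trust_remote_code=True)
print(tok("hello").input_ids)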