The current tokenizer implementation does not support left-padding

#2
by hiyouga - opened
Qwen org

For batched inference, sequences must be left-padded, but the tokenizer class does not support left-padding.
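To illustrate why generation needs left-padding, here is a minimal pure-Python sketch. The helper `pad_batch`, the token ids and the pad id `0` are hypothetical and are not part of Qwen or transformers. A decoder-only model predicts the next token from the last position of each row. With right-padding, shorter rows end on a pad token. With left-padding, every row ends on a real token.

```python
from typing import List, Tuple


def pad_batch(
    seqs: List[List[int]], pad_id: int, side: str = "left"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to a common length and build attention masks.

    side="left" puts the padding before the real tokens, so every row ends
    with a real token; side="right" puts it after them.
    """
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        pad, mask_pad = [pad_id] * n_pad, [0] * n_pad
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append(mask_pad + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + mask_pad)
    return input_ids, attention_mask


batch = [[11, 12, 13], [21]]  # hypothetical token ids
PAD = 0  # hypothetical pad id

left_ids, left_mask = pad_batch(batch, PAD, side="left")
right_ids, right_mask = pad_batch(batch, PAD, side="right")

# Generation continues from the last position of each row.
print([row[-1] for row in left_ids])   # [13, 21]: every row ends on a real token
print([row[-1] for row in right_ids])  # [13, 0]: the short row ends on padding
```

The attention mask hides the pad positions either way. Only left-padding, however, keeps the position the model generates from aligned with the end of each prompt.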

According to the source code, the padding_side argument has no effect when passed to the __init__ method.

>>> from transformers import AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True, use_fast=False, padding_side="left")
>>> tok.padding_side
'right'
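One common way this symptom arises is sketched below with hypothetical mock classes (`BaseTokenizer`, `BuggyTokenizer` and `FixedTokenizer` are stand-ins, not the actual Qwen code). The sketch assumes that, as in the transformers tokenizer base class, `padding_side` is read from the constructor's keyword arguments. A subclass that does not forward its `**kwargs` to `super().__init__` silently keeps the default `'right'`. Assigning the attribute after loading sidesteps the constructor entirely.

```python
class BaseTokenizer:
    """Hypothetical stand-in for a tokenizer base class (not the real transformers class)."""

    padding_side = "right"  # class-level default

    def __init__(self, **kwargs):
        # padding_side is honoured only if the caller's kwargs reach this method.
        self.padding_side = kwargs.pop("padding_side", self.padding_side)


class BuggyTokenizer(BaseTokenizer):
    def __init__(self, vocab_file=None, **kwargs):
        super().__init__()  # kwargs dropped: padding_side is silently ignored
        self.vocab_file = vocab_file


class FixedTokenizer(BaseTokenizer):
    def __init__(self, vocab_file=None, **kwargs):
        super().__init__(**kwargs)  # kwargs forwarded: padding_side takes effect
        self.vocab_file = vocab_file


print(BuggyTokenizer(padding_side="left").padding_side)  # right
print(FixedTokenizer(padding_side="left").padding_side)  # left

# Workaround that does not depend on the constructor: set the attribute after loading.
tok = BuggyTokenizer(padding_side="left")
tok.padding_side = "left"
print(tok.padding_side)  # left
```

With a real tokenizer, the equivalent workaround would be `tok.padding_side = "left"` right after `from_pretrained`, until the remote code is fixed.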

https://huggingface.co/Qwen/Qwen-7B/blob/main/tokenization_qwen.py#L33

Caught a big shot in the wild!

Qwen org

Thank you for raising this problem. We have updated the code, and this should be fixed now. Please reopen this if the problem still exists.

jklj077 changed discussion status to closed
