The current implementation of tokenizer cannot adopt left-padding

by hiyouga - opened Aug 4, 2023

Qwen org Aug 4, 2023

For batched inference, a left-padded sequence is required, but the tokenizer class does not support left-padding.

According to the source code, the argument padding_side has no effect in the __init__ method.

>>> from transformers import AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True, use_fast=False, padding_side="left")
>>> tok.padding_side
'right'
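
To illustrate why the padding side matters here, below is a minimal sketch in plain Python. It does not use the Qwen tokenizer; the helper `pad_batch` and the pad id `0` are hypothetical names chosen only for this example. A decoder-only model continues generation from the last position of each row, so with right-padding the shorter rows would end in pad tokens, while left-padding keeps the last real token in the final position.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad lists of token ids to a common length and build attention masks.

    Hypothetical helper for illustration only; not part of any library.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")

    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []

    for seq in sequences:
        n_pad = max_len - len(seq)
        pad_tokens = [pad_id] * n_pad   # filler tokens
        pad_mask = [0] * n_pad          # 0 = position should be ignored
        real_mask = [1] * len(seq)      # 1 = real token

        if side == "left":
            input_ids.append(pad_tokens + list(seq))
            attention_mask.append(pad_mask + real_mask)
        else:
            input_ids.append(list(seq) + pad_tokens)
            attention_mask.append(real_mask + pad_mask)

    return input_ids, attention_mask


# Two prompts of different lengths; pad id 0 is an assumption.
batch = [[11, 12, 13], [21]]
left_ids, left_mask = pad_batch(batch, pad_id=0, side="left")
right_ids, right_mask = pad_batch(batch, pad_id=0, side="right")
```

With `side="left"` the last column of every row holds a real token, which is what batched generation needs; with `side="right"` the short row ends in padding.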

louisY

Aug 4, 2023

Spotted an expert in the wild!

Qwen org Aug 8, 2023

Thank you for raising this problem. We have updated the code, and this should be fixed now. Please reopen this discussion if the problem still exists.

jklj077 changed discussion status to closed Aug 8, 2023
