Which padding side to choose when fine-tuning

#47
by parikshit1619

Looks like out of the box Mistral doesn't have a pad_token_id? πŸ€”

I don't understand why; surely they must have used padding during training?
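For what it's worth, the usual workaround is to assign a pad token yourself before doing any batched tokenization. A minimal sketch, assuming the base mistralai/Mistral-7B-v0.1 tokenizer (adjust to whichever checkpoint you use):

```python
from transformers import AutoTokenizer

# Assumption: base mistralai/Mistral-7B-v0.1 checkpoint; swap in your own model id.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(tokenizer.pad_token)  # prints None: no pad token is defined out of the box

# Common workaround: reuse the EOS token as the pad token
tokenizer.pad_token = tokenizer.eos_token

# Alternative: add a dedicated pad token, then resize the model's embeddings
# tokenizer.add_special_tokens({"pad_token": "<pad>"})
# model.resize_token_embeddings(len(tokenizer))
```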

Also curious about this.

Any updates on this @Hannibal046 @parikshit1619? Mistral was originally trained with left-side padding, and after doing a bit of research, most forums recommend left-side padding as well, so the LLM doesn't mix up real tokens and pad tokens. Can anybody confirm this?
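For generation at least, left padding keeps each prompt's last real token right next to the newly generated ones. A rough sketch of left-padded batched generation, assuming the mistralai/Mistral-7B-v0.1 checkpoint and enough GPU memory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumption: adjust to your checkpoint

# Left padding for inference, and reuse EOS since no pad token is defined
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = ["The capital of France is", "1 + 1 ="]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```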

Maybe we need two tokenizers?
My understanding is that "padding_side=left" is needed when we generate output, because Mistral is "decoder-only"; see this: https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side.

But when using LoRA for training, "padding_side=right" is needed to avoid overflow issues; see this: https://discuss.huggingface.co/t/qlora-with-gptq/58009

Please correct me if I'm wrong!
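If that's right, the "two tokenizers" idea could just be two differently configured instances of the same tokenizer. A sketch, assuming mistralai/Mistral-7B-v0.1 (the actual training/generation code is left out):

```python
from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumption: adjust to your checkpoint

# For (Q)LoRA training: right padding, to avoid the overflow issues linked above
train_tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right")
train_tokenizer.pad_token = train_tokenizer.eos_token

# For batched generation: left padding, so each prompt ends where generation starts
gen_tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
gen_tokenizer.pad_token = gen_tokenizer.eos_token
```

Alternatively, you can keep a single tokenizer and just flip `tokenizer.padding_side` between training and inference.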
