Which padding side to choose when fine-tuning

#47
by parikshit1619

Looks like out of the box Mistral doesn't have a pad_token_id? πŸ€”

I don't understand why; surely they must have used padding during training?
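For what it's worth, the usual workaround is to assign a pad token yourself before doing any batched tokenization. A minimal sketch, assuming the base mistralai/Mistral-7B-v0.1 tokenizer (adjust to whichever checkpoint you use):

```python
from transformers import AutoTokenizer

# Assumption: base mistralai/Mistral-7B-v0.1 checkpoint; swap in your own model id.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(tokenizer.pad_token)  # prints None: no pad token is defined out of the box

# Common workaround: reuse the EOS token as the pad token
tokenizer.pad_token = tokenizer.eos_token

# Alternative: add a dedicated pad token, then resize the model's embeddings
# tokenizer.add_special_tokens({"pad_token": "<pad>"})
# model.resize_token_embeddings(len(tokenizer))
```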

Also curious about this.

Any updates on this @Hannibal046 @parikshit1619? Mistral was originally trained with left-side padding, and after doing a bit of research, most forums recommend left-side padding as well, so the LLM doesn't mix up real tokens and pad tokens. Can anybody confirm this?
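For generation at least, left padding keeps each prompt's last real token right next to the newly generated ones. A rough sketch of left-padded batched generation, assuming the mistralai/Mistral-7B-v0.1 checkpoint and enough GPU memory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumption: adjust to your checkpoint

# Left padding for inference, and reuse EOS since no pad token is defined
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = ["The capital of France is", "1 + 1 ="]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```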

Maybe we need two tokenizers?
My understanding is that "padding_side=left" is needed when we generate output, because Mistral is "decoder-only"; see this: https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side.

But when using LoRA for training, "padding_side=right" is needed to avoid overflow issues; see this: https://discuss.huggingface.co/t/qlora-with-gptq/58009

Please correct me if I'm wrong!
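If that's right, the "two tokenizers" idea could just be two differently configured instances of the same tokenizer. A sketch, assuming mistralai/Mistral-7B-v0.1 (the actual training/generation code is left out):

```python
from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumption: adjust to your checkpoint

# For (Q)LoRA training: right padding, to avoid the overflow issues linked above
train_tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right")
train_tokenizer.pad_token = train_tokenizer.eos_token

# For batched generation: left padding, so each prompt ends where generation starts
gen_tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
gen_tokenizer.pad_token = gen_tokenizer.eos_token
```

Alternatively, you can keep a single tokenizer and just flip `tokenizer.padding_side` between training and inference.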
