Using `self.transformer.wte.weight` directly for the LM head breaks HF accelerate device map auto-inference on multi-GPU

#46
by shijie-wu - opened

Hi,

Using `self.transformer.wte.weight` directly for the LM head
https://huggingface.co/mosaicml/mpt-7b/blob/053e1a33a6e7043aefaa3f5d13c48269a5511cff/modeling_mpt.py#L239
breaks HF accelerate's automatic device map inference on multi-GPU, since the final layer can be placed on a different GPU than `self.transformer.wte`. This could be fixed by creating a dummy LM head and tying its parameters to the embedding, similar to all GPT-style models in transformers.
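For illustration, here is a minimal sketch of the "dummy LM head + tied params" pattern the comment describes. This is not the actual MPT or transformers code; the class and parameter names are hypothetical, and it only shows the idea that the LM head should be a registered submodule whose weight shares storage with the embedding table, so accelerate's `device_map="auto"` can account for it when partitioning the model:

```python
import torch
import torch.nn as nn

class TiedLMHeadSketch(nn.Module):
    """Illustrative sketch only: replace F.linear(hidden, wte.weight)
    in forward() with a real lm_head module tied to the embedding."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, d_model)
        # Dummy LM head as an actual module instead of reusing
        # wte.weight directly inside forward().
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.tie_weights()

    def tie_weights(self):
        # Point the LM head at the embedding tensor (shared storage,
        # no copy), the same idea GPT-style models in transformers use
        # for tied word embeddings.
        self.lm_head.weight = self.wte.weight

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Because lm_head is a registered submodule, device-map
        # inference sees it and can place it consistently with wte.
        return self.lm_head(hidden_states)
```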

Working on it; we are testing with FSDP first to make sure nothing breaks: https://github.com/mosaicml/llm-foundry/pull/225

Fixed as of this PR: https://huggingface.co/mosaicml/mpt-7b/discussions/47

Please give it a try!

abhi-mosaic changed discussion status to closed
