Quick Start Fix, Please.
Hi everyone, I was just reviewing the README and noticed something in the Quick Start section. There's a code snippet that sets up the device_map, which is used to distribute the model across multiple GPUs.
Specifically, it defines the device_map dictionary like this:
# set device map
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
And then, a bit later, it defines the world_size variable like this:
# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size
The problem is that the device_map definition uses world_size in its f-strings, but world_size hasn't been defined yet at the point where device_map is created. So if someone copies and pastes this code directly, Python will raise a NameError because world_size is not defined.
It's a minor issue, but it could be frustrating for new users who are just getting started. Perhaps we could reorder the code in the Quick Start so that world_size is defined before device_map? That way the snippet would work as-is with a simple copy and paste.
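For reference, here is a rough sketch of what the reordered snippet could look like. It's only a suggestion: the hard-coded num_hidden_layers value (the README reads it from hf_config.num_hidden_layers) and the 'model.layers.{i}' key names (Llama-style module naming, which matches the other keys in the map) are my assumptions, not necessarily what the README does for the decoder layers.

```python
# assume 8 GPUs; defined first so the f-strings below can reference it
world_size = 8

# Hypothetical stand-in: the README uses hf_config.num_hidden_layers here.
num_hidden_layers = 32
# max(1, ...) is an added guard so the division below never hits zero
# when there are fewer layers than GPUs.
layers_per_device = max(1, num_hidden_layers // world_size)

# set device map (world_size is now in scope, so no NameError)
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}',
}

# Hypothetical: spread the decoder layers evenly across the GPUs, assuming
# 'model.layers.{i}' module names. Leftover layers go to the last GPU.
for i in range(num_hidden_layers):
    device = min(i // layers_per_device, world_size - 1)
    device_map[f'model.layers.{i}'] = f'cuda:{device}'
```

With 32 layers and 8 GPUs, each GPU gets 4 consecutive layers, and the embeddings, final norm, and lm_head stay on the first and last GPUs as in the original snippet.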
Do you think updating the Quick Start in the README would be a good idea? I'd appreciate your feedback. Thank you for your time.