How to run with multiple GPUs.
#4 · by mocherson · opened
The example in the Quickstart loads the model onto the CPU for running. I know model.cuda() can move the model to a GPU for small models. How do I distribute the model across multiple GPUs and run generation for large models? Thanks.
This Hugging Face guide on inference with large models using Accelerate may be helpful for distributing the model: https://huggingface.co/docs/accelerate/usage_guides/big_modeling
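For reference, a minimal sketch of the pattern from that guide: loading a causal LM with `device_map="auto"` so Accelerate shards the weights across all visible GPUs. The checkpoint name, the 18 GiB per-GPU cap, and the `per_gpu_memory` helper are illustrative assumptions, not from this thread; `transformers`, `accelerate`, and `torch` must be installed.

```python
# Sketch: multi-GPU inference via Accelerate's big-model loading.
# Assumptions (not from the thread): the example checkpoint name and
# the per-GPU memory cap below.

def per_gpu_memory(num_gpus: int, gib_per_gpu: int) -> dict:
    """Build a max_memory map capping each GPU, e.g. {0: "18GiB", 1: "18GiB"}."""
    return {i: f"{gib_per_gpu}GiB" for i in range(num_gpus)}

def main() -> None:
    # Heavy imports kept inside main so the helper above stays importable
    # without torch/transformers present.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "facebook/opt-6.7b"  # hypothetical example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    # device_map="auto" lets Accelerate place layers across all visible
    # GPUs (spilling to CPU/disk if the weights don't fit in VRAM).
    model = AutoModelForCausalLM.from_pretrained(
        name,
        device_map="auto",
        max_memory=per_gpu_memory(torch.cuda.device_count(), 18),
        torch_dtype=torch.float16,
    )
    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

With `device_map="auto"` there is no need to call `model.cuda()` yourself; generation runs as usual and activations are moved between devices automatically.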
rskuzma changed discussion status to closed