
How to run with multiple GPUs.

#4
by mocherson - opened

The example in the Quickstart loads the model onto the CPU for inference. I know model.cuda() can move the model to a GPU for small models. How can I distribute the model across multiple GPUs and run generation for large models? Thanks.

This Hugging Face guide on running inference with large models using Accelerate may help with distributing the model across devices: https://huggingface.co/docs/accelerate/usage_guides/big_modeling
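A minimal sketch of that approach, assuming the `accelerate` package is installed (`pip install accelerate`): passing `device_map="auto"` to `from_pretrained` asks Accelerate to split the model's layers across all visible GPUs (spilling to CPU if needed), so no manual `.cuda()` call is required.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# device_map="auto" shards the model's layers across available GPUs.
model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")

# Put inputs on the device of the first shard; Accelerate's hooks move
# activations between devices during the forward pass automatically.
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You can inspect how the layers were placed via `model.hf_device_map` after loading.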

rskuzma changed discussion status to closed
