SageMaker generation speed, timeouts
#33, opened by elanmarkowitz
I deployed an endpoint on SageMaker using a g5.48xlarge instance.
However, it seems much slower than other models and frequently times out.
Has anyone else seen this issue, or does anyone know of ways to increase generation speed?
Deployed using this image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0
@elanmarkowitz What does your config look like?
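For anyone comparing setups, here is a minimal sketch of the kind of config being asked about. It is not the original poster's config. The model ID is a placeholder, the token limits are illustrative values, and the GPU count assumes the 8 GPUs of a g5.48xlarge. The environment variable names are the ones commonly passed to the Hugging Face TGI container on SageMaker. If `SM_NUM_GPUS` is lower than the number of GPUs on the instance, the model is sharded across fewer devices, which is one possible cause of slow generation.

```python
import json

# The image URI quoted in the original post.
IMAGE_URI = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.1.1-tgi1.3.1-gpu-py310-cu121-ubuntu20.04-v1.0"
)
INSTANCE_TYPE = "ml.g5.48xlarge"
NUM_GPUS = 8  # assumption: shard across all 8 GPUs of the instance


def build_tgi_env(model_id, num_gpus=NUM_GPUS,
                  max_input_length=2048, max_total_tokens=4096):
    """Build the container environment for TGI (values must be strings).

    The token limits are illustrative defaults, not recommendations.
    """
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy(env, role):
    """Deploy with the SageMaker Python SDK (needs AWS credentials and an IAM role)."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(image_uri=IMAGE_URI, env=env, role=role)
    return model.deploy(
        initial_instance_count=1,
        instance_type=INSTANCE_TYPE,
        # Large models can take several minutes to load; 600 s is an assumed value.
        container_startup_health_check_timeout=600,
    )


# "your-org/your-model" is a hypothetical placeholder, not the model from this thread.
env = build_tgi_env("your-org/your-model")
print(json.dumps(env, indent=2))
```

Posting the equivalent of the printed `env` dict, along with the `max_new_tokens` used in requests, would make it easier to tell whether the timeouts come from long generations or from the deployment settings.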