Deploy with SageMaker LMI
I've tried to deploy this with SageMaker LMI, but it's not possible. It seems the model should follow this layout:
- compiled: NEFF files
- checkpoint: compiled PyTorch weights
- tokenizer...
Is it possible to get something like that? Or at least a code snippet showing how to deploy this as an endpoint? I've tried, but still no luck.
Hey @josete89. What you are describing is the layout for the Optimum library. This example was originally built with Transformers because Optimum didn't have support for Mistral yet, but I saw a PR for it go through last week. We should be able to update the example to work. Reach out to me.
With optimum-neuron >= 0.0.17, compatible models have been added for several configurations.
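As a rough sketch, exporting (compiling) a model with optimum-neuron could look like this; the model ID and the static shapes below are illustrative assumptions, not values from this thread:

from optimum.neuron import NeuronModelForCausalLM

# Export the model to Neuron-compiled artifacts.
# The model ID, shapes, and precision below are illustrative.
model = NeuronModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",  # assumed model choice
    export=True,            # compile with neuronx-cc while loading
    batch_size=1,           # static batch size baked into the NEFF files
    sequence_length=2048,   # static sequence length
    num_cores=2,            # NeuronCores to shard the model across
    auto_cast_type="fp16",  # weight precision
)
model.save_pretrained("mistral_neuron")  # writes the compiled artifacts and config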
I was able to compile the model seamlessly, but when I tried to deploy it:
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
model = HuggingFaceModel(
    model_data=s3_model_uri,        # path to your model.tar.gz on S3
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.34.1",  # transformers version used
    pytorch_version="1.13.1",       # pytorch version used
    py_version="py310",             # python version used
    model_server_workers=1,         # number of workers for the model server
)
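The deploy and request steps were the usual flow, roughly like this (the instance type below is illustrative):

# Deploy to an Inferentia2 endpoint and send a test request.
# ml.inf2.xlarge is an illustrative instance type.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
)
predictor.predict({"inputs": "What is Amazon SageMaker?"})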
I got the following message when I sent a request:
"Pretrained model is compiled with neuronx-cc(2.12.54.0+f631c2365) newer than current compiler (2.11.0.34+c5231f848), which may cause runtime".
I guess the base image needs to be updated.
@josete89 Yes, Mistral requires the new 2.16 SDK. Not all the images are updated yet.
As of today, you would need to use 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.26.0-neuronx-sdk2.16.0
That may require you to repackage your model depending on what image you were using previously. Watch for updates at https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers
What image are you using to deploy now? You may be able to update that and deploy it as a custom image.
Right now I'm using "763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference-neuronx:1.13.1-transformers4.34.1-neuronx-py310-sdk2.15.0-ubuntu20.04". I guess that's the problem :) How can I deploy it as a custom image then?
You can specify a custom image by passing
image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:1.13.1-optimum0.0.16-neuronx-py310-ubuntu22.04-v1.0"
instead of the version settings; those are only used to find the right image for you automatically.
You can try the image above. It is an updated Hugging Face Text Generation Inference image, but I am not sure of its SDK version.
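For example, the earlier model definition would become something like this (reusing the same s3_model_uri and role):

from sagemaker.huggingface.model import HuggingFaceModel

# With an explicit image_uri, SageMaker skips the automatic
# image lookup driven by the version settings.
model = HuggingFaceModel(
    model_data=s3_model_uri,  # path to your model.tar.gz on S3
    role=role,                # IAM role with permissions to create an endpoint
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:1.13.1-optimum0.0.16-neuronx-py310-ubuntu22.04-v1.0",
)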
You can also build a SageMaker-compatible image yourself and upload it to your private ECR repository:
git clone https://github.com/huggingface/optimum-neuron
cd optimum-neuron
make neuronx-tgi-sagemaker
It takes a few extra steps, but you can specify the exact version of the SDK in the Dockerfile:
https://github.com/huggingface/optimum-neuron/blob/main/text-generation-inference/Dockerfile
It is fewer steps if you can wait for the SageMaker team to release an updated image.
The new image with AWS Neuron SDK 2.16 and optimum-neuron 0.0.17 has been released: https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-tgi-0.0.17-pt-1.13.1-inf-neuronx-py310
@josete89 Make sure you check out the new blog post from HF that walks you through it. No image updates needed!
https://huggingface.co/blog/text-generation-inference-on-inferentia2
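Following that post, deployment reduces to a few lines like the sketch below; the version string, model ID, environment values, and instance type are assumptions on my part, so check the blog post for the current ones:

from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Resolve the TGI Neuron container; the version here is an assumption.
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.17")

# The env values below mirror the blog post's pattern and are illustrative.
model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.1",  # assumed model
        "HF_NUM_CORES": "2",           # NeuronCores per replica
        "HF_BATCH_SIZE": "1",          # static batch size used at compile time
        "HF_SEQUENCE_LENGTH": "4096",  # static sequence length
        "HF_AUTO_CAST_TYPE": "fp16",   # precision
        "MAX_BATCH_SIZE": "1",
        "MAX_INPUT_LENGTH": "3686",
        "MAX_TOTAL_TOKENS": "4096",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",  # assumed Inferentia2 instance type
)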