Meta-Llama-3.1-8B-Instruct deployment on AWS SageMaker fails
I followed the instructions on Hugging Face to deploy the 'meta-llama/Meta-Llama-3.1-8B-Instruct' model on AWS SageMaker. Here is the error log:
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
rank=0 2024-07-29T11:32:06.411916Z ERROR text_generation_launcher: Shard 0 failed to start
Error: ShardCannotStart
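For context: the check that fails here expects the legacy two-field rope_scaling schema, while Llama 3.1 ships the extended 'llama3' schema, so container versions that predate Llama 3.1 reject the model's config. A rough side-by-side (the llama3 values are copied from the error above; the legacy values are just illustrative):

# Legacy schema that older transformers/TGI builds validate against
# (values here are illustrative, not from the model):
rope_scaling_legacy = {'type': 'linear', 'factor': 8.0}

# Extended schema shipped in Llama 3.1's config.json (copied from the error above):
rope_scaling_llama3 = {
    'factor': 8.0,
    'low_freq_factor': 1.0,
    'high_freq_factor': 4.0,
    'original_max_position_embeddings': 8192,
    'rope_type': 'llama3',
}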
The code snippet:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3.1-8B-Instruct',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': ''
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hey my name is Julien! How are you?",
})
The only change I made relative to the Hugging Face deployment guide is this line:

image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2")  # 2.0.2 instead of 2.2.0, because AWS JupyterLab in SageMaker Studio doesn't support Hugging Face LLM image version 2.2.0 yet
If you have already resolved this issue, can you share the fix? Thank you.
Hey, did you ever figure this out? Please let me know.
Version 2.2.0 of the llm_image should work now.
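If your SageMaker Studio environment still rejects version 2.2.0, the usual cause is an older sagemaker SDK that doesn't know about the newer image yet, so upgrading it first may help (a sketch, assuming a Jupyter notebook; restart the kernel after the install):

%pip install -U sagemaker
# Sanity check: this should now resolve without a version error.
from sagemaker.huggingface import get_huggingface_llm_image_uri
print(get_huggingface_llm_image_uri("huggingface", version="2.2.0"))

After that, the deployment snippet below works as-is (note it uses the base Meta-Llama-3.1-8B model ID; swap in Meta-Llama-3.1-8B-Instruct for the instruct variant):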
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3.1-8B',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>'
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '<REPLACE WITH YOUR TOKEN>', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.2.0"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})
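When you're done testing, remember to tear down the endpoint so the ml.g5.2xlarge instance stops accruing charges:

# clean up the model and endpoint when finished
predictor.delete_model()
predictor.delete_endpoint()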