Meta-Llama-3.1-8B-Instruct deployment on AWS SageMaker fails
I followed the instructions on Hugging Face to deploy the 'meta-llama/Meta-Llama-3.1-8B-Instruct' model on AWS SageMaker. Here is the error log:
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
rank=0 2024-07-29T11:32:06.411916Z ERROR text_generation_launcher: Shard 0 failed to start
Error: ShardCannotStart
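For context: the check that fails here expects the legacy two-field rope_scaling schema, while Llama 3.1 ships the extended 'llama3' schema, so container versions that predate Llama 3.1 reject the model's config. A rough side-by-side (the llama3 values are copied from the error above; the legacy values are just illustrative):

# Legacy schema that older transformers/TGI builds validate against
# (values here are illustrative, not from the model):
rope_scaling_legacy = {'type': 'linear', 'factor': 8.0}

# Extended schema shipped in Llama 3.1's config.json (copied from the error above):
rope_scaling_llama3 = {
    'factor': 8.0,
    'low_freq_factor': 1.0,
    'high_freq_factor': 4.0,
    'original_max_position_embeddings': 8192,
    'rope_type': 'llama3',
}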
The code snippet:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3.1-8B-Instruct',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': ''
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hey my name is Julien! How are you?",
})
The only change I made relative to the Hugging Face deployment guide is this line:

image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2")  # 2.0.2 instead of 2.2.0, because AWS JupyterLab in SageMaker Studio doesn't support Hugging Face LLM image version 2.2.0 yet
If you have already resolved this issue, can you share the fix? Thank you.
Hey, did you ever figure this out? Please let me know.
Version 2.2.0 of the llm_image should work now.
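If your SageMaker Studio environment still rejects version 2.2.0, the usual cause is an older sagemaker SDK that doesn't know about the newer image yet, so upgrading it first may help (a sketch, assuming a Jupyter notebook; restart the kernel after the install):

%pip install -U sagemaker
# Sanity check: this should now resolve without a version error.
from sagemaker.huggingface import get_huggingface_llm_image_uri
print(get_huggingface_llm_image_uri("huggingface", version="2.2.0"))

After that, the deployment snippet below works as-is (note it uses the base Meta-Llama-3.1-8B model ID; swap in Meta-Llama-3.1-8B-Instruct for the instruct variant):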
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'meta-llama/Meta-Llama-3.1-8B',
    'SM_NUM_GPUS': json.dumps(1),
    'HUGGING_FACE_HUB_TOKEN': '<REPLACE WITH YOUR TOKEN>'
}

assert hub['HUGGING_FACE_HUB_TOKEN'] != '<REPLACE WITH YOUR TOKEN>', "You have to provide a token."

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.2.0"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "My name is Julien and I like to",
})
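When you're done testing, remember to tear down the endpoint so the ml.g5.2xlarge instance stops accruing charges:

# clean up the model and endpoint when finished
predictor.delete_model()
predictor.delete_endpoint()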