Deploying the model to Amazon SageMaker for inference

#40
by Shan097 - opened

Hi Devs, I'm trying to deploy MiniCPM-Llama3-V-2_5 to Amazon SageMaker to create an inference endpoint. I've tried packaging the model with an inference.py containing a model_fn function to load the model, but I'm running into errors.

Here's an excerpt from my SageMaker notebook, based on the notebooks Hugging Face provides for loading models from S3:

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://###/minicpm_mod.tar.gz",  # path to the packaged model archive on S3
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.g5.12xlarge' # ec2 instance type
)
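
For completeness, this is roughly how I'm invoking the endpoint afterwards (the payload structure is just my best guess at what the model expects, and the image file is a placeholder):

import base64

with open("sample.jpg", "rb") as f:  # placeholder image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# payload format is my own guess, not taken from the model card
response = predictor.predict({
    "inputs": {
        "image": image_b64,
        "question": "Describe this image."
    }
})
print(response)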

I've defined model_fn in inference.py with the signature:

def model_fn(model_dir):
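
Its body is roughly the following (simplified; the dtype and the trust_remote_code usage are just what I've been experimenting with):

import torch
from transformers import AutoModel, AutoTokenizer

def model_fn(model_dir):
    # load MiniCPM-Llama3-V-2_5 from the unpacked model.tar.gz directory
    model = AutoModel.from_pretrained(
        model_dir,
        trust_remote_code=True,
        torch_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model.eval()
    return model, tokenizer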

The most recent error is this:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "model_fn() takes 1 positional argument but 2 were given"
}

If I modify inference.py so that model_fn takes more inputs (either by adding a context argument, or an args/kwargs combination alongside model_dir) and provide 'HF_TASK' as an environment variable, the error changes to "model_fn() takes 1 or 2 positional arguments but 3 were given". The signature variants I've tried look like this:
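
# variant 1: accept the extra positional argument that seems to be passed in
def model_fn(model_dir, context=None):
    # same loading logic as the original model_fn above
    ...

# variant 2: absorb any extra arguments (tried together with the HF_TASK env var)
# def model_fn(model_dir, *args, **kwargs):
#     ...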

Could you please provide some direction on how I can deploy this model on SageMaker for inference?
