Error running on SageMaker

#13
by Uilo - opened

I'm new to this and just trying to get started with the model on SageMaker, using the new "Deploy to SageMaker" script.
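
For reference, the supplied boilerplate looks roughly like this (a sketch of the generated snippet; the instance type is a placeholder):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    "HF_MODEL_ID": "PygmalionAI/pygmalion-6b",
    "HF_TASK": "conversational",
}

# create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
)

# deploy to a real-time inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # placeholder; needs enough memory for the model
)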

After copying the code across and starting an inference endpoint, I get the following error when trying to invoke it:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>)."
}"

This is the input, as supplied in the deploy script:

predictor.predict({
    "inputs": {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
})

CloudWatch logs:

[INFO ] W-PygmalionAI__pygmalion-6b-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>). : 400

It seems I have to supply the model myself via S3, packaged as a .tar.gz; roughly like this:
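
A sketch of what that deployment looks like (bucket, paths, and instance type are placeholders):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# model.tar.gz built from the repo files and uploaded to S3 beforehand,
# e.g. by tarring up a local clone of the model repo
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/pygmalion-6b/model.tar.gz",  # placeholder URI
    env={"HF_TASK": "conversational"},
    role=role,
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",  # placeholder
)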

After doing that I get an InternalServerException: gpt_neox

According to EleutherAI this is caused by an incompatible Transformers version. The SageMaker boilerplate pins 4.17, while Eleuther says to use 4.25... but the SageMaker containers don't support anything past 4.17. Oh dear...

So is there a workaround, maybe by supplying a requirements.txt file along with the model somehow? Any hints on how to do that, and which versions do we need to be using?

Pygmalion org

Unfortunately I can't really help you with this - I don't use SageMaker so I have no idea how it works.

I managed to get the 1.3B version going; now I'm trying to get the 6B working.

requirements.txt:

transformers==4.24.0
torch==1.13.1
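
These go under code/ inside the model archive; as far as I can tell, the SageMaker Hugging Face inference toolkit installs a requirements.txt it finds there when the container starts. A minimal packaging sketch (local paths are placeholders):

import tarfile

# Expected layout inside model.tar.gz:
#   config.json, pytorch_model.bin, ...   (the model files)
#   code/requirements.txt                 (the pins above)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    # local directory containing the model files plus a code/ subdirectory
    tar.add("pygmalion-1.3b", arcname=".")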

This is the error I'm getting now when trying to launch the 6B:

[WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.

Pygmalion org

Again, I can't really help with this since I've never used SageMaker. Your best bet is probably contacting whatever support channel SageMaker offers its customers.

11b changed discussion status to closed

@Uilo I have the same issue, did you solve it?

Not really... I kind of got it working with the smaller models.

Have a look at this for some more info though:
https://reddit.com/r/PygmalionAI/comments/11dmqly/running_pyg_on_aws_sagemaker/
