Error running on SageMaker

#13
by Uilo - opened

I'm new to this and just trying to get started with the model on SageMaker, using the new "Deploy to SageMaker" script.
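
For reference, the supplied boilerplate looks roughly like this (a sketch of the generated snippet; the instance type is a placeholder):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    "HF_MODEL_ID": "PygmalionAI/pygmalion-6b",
    "HF_TASK": "conversational",
}

# create the Hugging Face model class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
)

# deploy to a real-time inference endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # placeholder; needs enough memory for the model
)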

After copying the code across and starting an inference endpoint, I get the following error when trying to invoke it:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>)."
}"

This is the input, as supplied in the deploy script:

predictor.predict({
    "inputs": {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
})

CloudWatch logs:

[INFO ] W-PygmalionAI__pygmalion-6b-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: Could not load model /.sagemaker/mms/models/PygmalionAI__pygmalion-6b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gptj.modeling_gptj.GPTJForCausalLM'>). : 400

It seems I have to supply the model myself via S3, packaged as a .tar.gz; roughly like this:
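
A sketch of what that deployment looks like (bucket, paths, and instance type are placeholders):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# model.tar.gz built from the repo files and uploaded to S3 beforehand,
# e.g. by tarring up a local clone of the model repo
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/pygmalion-6b/model.tar.gz",  # placeholder URI
    env={"HF_TASK": "conversational"},
    role=role,
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",  # placeholder
)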

After doing that I get an InternalServerException: gpt_neox

According to EleutherAI this is caused by an incompatible Transformers version. The SageMaker boilerplate pins 4.17, while Eleuther says to use 4.25... but the SageMaker containers don't support anything past 4.17. Oh dear...

So is there a workaround, maybe by supplying a requirements.txt file along with the model somehow? Any hints on how to do that, and which versions do we need to be using?

Pygmalion org

Unfortunately I can't really help you with this - I don't use SageMaker so I have no idea how it works.

I managed to get the 1.3B version going; now I'm trying to get the 6B working.

requirements.txt:

transformers==4.24.0
torch==1.13.1
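
These go under code/ inside the model archive; as far as I can tell, the SageMaker Hugging Face inference toolkit installs a requirements.txt it finds there when the container starts. A minimal packaging sketch (local paths are placeholders):

import tarfile

# Expected layout inside model.tar.gz:
#   config.json, pytorch_model.bin, ...   (the model files)
#   code/requirements.txt                 (the pins above)
with tarfile.open("model.tar.gz", "w:gz") as tar:
    # local directory containing the model files plus a code/ subdirectory
    tar.add("pygmalion-1.3b", arcname=".")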

This is the error I'm getting now when trying to launch the 6B:

[WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.

Pygmalion org

Again, I can't really help with this since I've never used SageMaker. Your best bet is probably contacting whatever support channel SageMaker offers its customers.

11b changed discussion status to closed

@Uilo I have the same issue, did you solve it?

Not really... I kind of got it working with the smaller models.

Have a look at this for some more info though:
https://reddit.com/r/PygmalionAI/comments/11dmqly/running_pyg_on_aws_sagemaker/
