CPU or GPU to run a PyTorch model on Azure?

#6
by mmustafaicer - opened

This is not a code-fix question per se; it is more of a production-capacity question about compute instances. I have created an endpoint for this model on Azure with a STANDARD_DS4_V2 instance (8 cores, 28 GB RAM, 56 GB disk) to score texts arriving in batches. This is the production environment of a call center, so you can imagine how many rows of transcript are flowing in (streaming data).
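
For context, the endpoint runs a scoring script with the standard Azure ML `init()`/`run()` entry points. The sketch below shows the general shape; the model path, request format, and response shape are simplified placeholders, not my exact code:

```python
# score.py -- sketch of an Azure ML online-endpoint scoring script.
# init() and run() are the entry points Azure ML calls; everything
# else here (paths, JSON schema) is a placeholder.
import json
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = None
tokenizer = None


def init():
    global model, tokenizer
    # AZUREML_MODEL_DIR is set by Azure ML to the registered model's folder.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    path = os.path.join(model_dir, "model")  # placeholder layout
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path)
    model.eval()


def run(raw_data: str) -> str:
    # Expects a JSON body like {"texts": ["...", "..."]} -- placeholder schema.
    texts = json.loads(raw_data)["texts"]
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.inference_mode():
        logits = model(**inputs).logits
    return json.dumps({"predictions": logits.argmax(dim=-1).tolist()})
```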

My question is: what type of compute instance do you use for this model in a production environment? It is a ~400 MB PyTorch model. For inference, do you use a CPU or a GPU instance? Does the choice matter for inference as well? I know it makes a big difference for training, but is the same true for inference?
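
To put a rough number on the CPU-vs-GPU question, I had a minimal benchmark along these lines in mind; the model name is a placeholder for any similarly sized transformer, not this exact model:

```python
# Rough latency sketch comparing CPU vs GPU inference on one batch.
import time

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
texts = ["sample call-center transcript line"] * 32  # one incoming batch


def bench(device: str, runs: int = 10) -> float:
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).to(device)
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(device)
    with torch.inference_mode():
        for _ in range(3):  # warm-up iterations
            model(**inputs)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs  # seconds per batch


print(f"CPU: {bench('cpu'):.3f} s/batch")
if torch.cuda.is_available():
    print(f"GPU: {bench('cuda'):.3f} s/batch")
```

My understanding is that a GPU usually wins on throughput for batched input like this, while for single-text requests a CPU instance can be close enough; that trade-off is exactly what I am trying to pin down.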

In the monitoring tab of the Azure endpoint, I can see that it is struggling with the incoming data even though auto-scaling is enabled. Does anyone have experience running this model in a production environment? Which instance types are you using: compute-optimized or memory-optimized?
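
In case it helps frame answers: if the consensus is GPU, I would swap the deployment to a GPU SKU roughly like this (Azure ML SDK v2; all names are placeholders, and SKU availability varies by region):

```python
# Sketch of moving the managed online deployment to a GPU instance type.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import CodeConfiguration, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Placeholder names throughout; Standard_NC4as_T4_v3 is a single-T4 GPU SKU.
gpu_deployment = ManagedOnlineDeployment(
    name="gpu-blue",
    endpoint_name="transcript-scoring",
    model="azureml:transcript-model:1",
    environment="azureml:pytorch-inference-env:1",
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    instance_type="Standard_NC4as_T4_v3",
    instance_count=2,  # keep a floor of 2 so autoscale has headroom
)
ml_client.online_deployments.begin_create_or_update(gpu_deployment).result()
```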

mmustafaicer changed discussion status to closed
