SageMaker deployment config for sub-second real-time inference

by vibranium - opened Dec 11, 2023

I am trying to deploy this model on an A100 (p4d) SageMaker endpoint instance. Which TGI configuration settings should I use to get sub-second real-time inference? Thanks!
