I am trying to deploy this model on a SageMaker endpoint running on a p4d (A100) instance. Which TGI configuration settings can be used for sub-second real-time inference? Thanks!