What would be the average inference time for this model using beam width = 4?

#31
by ashwin26 - opened

I am using this model on a database schema (on average 20 tables, with 30 columns per table), running on a single RTX 4090 GPU with 128 GB of RAM. Inference is taking a long time (7-10 minutes). Below is how I am loading the model and running inference.
Any suggestions on how I can improve the speed?

[Screenshot: model loading and inference code]
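For context, here is a minimal sketch of the kind of setup described above. The model ID, dtype, and generation arguments are assumptions, since the actual code is only visible in the screenshot:

```python
# Hypothetical reconstruction of the setup described above -- the model ID,
# dtype, and generation arguments are assumptions, not the poster's exact code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "defog/sqlcoder-7b-2"  # assumed; substitute the actual model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 so a 7B model fits on a 24 GB 4090
    device_map="auto",
)

prompt = "..."  # full prompt including the ~20-table schema
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Beam search with num_beams=4 decodes four candidate sequences in parallel,
# which roughly multiplies per-token compute relative to greedy decoding --
# one likely contributor to the long latency.
outputs = model.generate(
    **inputs,
    num_beams=4,
    do_sample=False,
    max_new_tokens=400,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```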

Defog.ai org

@ashwin26 Thank you for testing out our model. You may try vLLM (https://blog.vllm.ai/2023/06/20/vllm.html) or Text Generation Inference (https://huggingface.co/docs/text-generation-inference/en/index) for optimized serving.
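As an illustration of the vLLM route, here is a minimal sketch; the model ID and decoding settings are assumptions. vLLM's continuous batching and PagedAttention typically give a large throughput improvement over a plain `generate` loop:

```python
# Minimal vLLM sketch -- model ID and decoding settings are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="defog/sqlcoder-7b-2")  # assumed model; substitute yours

# Greedy decoding; with a strong text-to-SQL model, beam width 4 may not be
# necessary, and dropping beam search alone cuts decoding cost substantially.
params = SamplingParams(temperature=0.0, max_tokens=400)

prompt = "..."  # same schema-plus-question prompt as before
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```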
