Inference Time

#6 opened by jitx

Configuration:
4× A100 80GB GPUs
int8 quantization with bitsandbytes
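For context, this is roughly how I load the model; a minimal sketch assuming the Hugging Face transformers API, with a placeholder checkpoint name and `device_map="auto"` to shard layers across the 4 GPUs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name -- substitute the model actually being benchmarked.
model_name = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # shard layers across the 4 A100s via accelerate
    load_in_8bit=True,   # int8 weights via bitsandbytes
)
```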

My inference time ranges from about one second to 100 seconds. Does this make sense to you?

It is about 160 seconds for 500 tokens, i.e. roughly 0.32 s per token.
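This is how I measure it; a minimal timing sketch assuming the model and tokenizer loaded as above (the prompt and `max_new_tokens` value are arbitrary examples):

```python
import time

prompt = "The benefits of int8 inference are"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=500, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens} tokens in {elapsed:.1f}s "
      f"({elapsed / new_tokens:.2f} s/token, {new_tokens / elapsed:.1f} tokens/s)")
```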
