Response time
#42
by
Majidni
- opened
Hi, I am using Mistral-7B-Instruct-v0.2 and I am running it on Nvidia 4090. sometimes It responds in a few seconds and sometimes takes up to 2 minutes. Do you have any idea about it?
Use torch_dtype=torch.bfloat16 while loading the model; it will make your output faster.