ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0
Hello,
I have an issue when running the model on Tesla P6 GPU with 16GB of RAM:
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P6 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.
The model is served with vLLM. I've tried the suggestion to use "--dtype=half" when launching the model, but it gave me another error, so obviously the solution is not that simple.
Any suggestions on how I can approach solving this issue?
Regards.
I'm not a professional, but the solution I found for the same problem before is:
In Hugging Face Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# Cast the weights to float16, which compute capability 6.x GPUs do support
model.to(device).half()
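If loading in full precision first and then casting uses too much memory, a variation is to load the weights directly in float16. A rough sketch of the same idea (torch_dtype is a standard from_pretrained argument, but I haven't tested this on a P6):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint straight into float16 instead of casting afterwards
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")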
Regarding vLLM, I think you can ask on their GitHub:
https://github.com/vllm-project/vllm/issues
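If you prefer to stay in Python, vLLM's LLM class also takes a dtype argument, which does the same thing as --dtype=half on the CLI. A minimal sketch (untested on a P6, and substitute your own model name):

from vllm import LLM, SamplingParams

# dtype="half" loads the weights in float16, which compute capability 6.1 GPUs support
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", dtype="half")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)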
Solved. Just for the record, in case others face the same issue: I had to convert the model to GGUF (int8 or fp16) and then it worked.
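For anyone looking for a starting point, llama.cpp's tooling can do this kind of conversion. Roughly (script and binary names differ between llama.cpp versions, so treat this as a sketch and check your checkout):

# Convert the Hugging Face checkpoint to a GGUF file in fp16
python convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model-f16.gguf

# Optionally quantize further, e.g. to 8-bit
./llama-quantize model-f16.gguf model-q8_0.gguf Q8_0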