CUDA support

#2 · opened by mike-ravkine

Hi @ddh0, I'm having trouble loading this model with CUDA. Using both the latest llama.cpp and revision 18e43766 (as in your example), I get:

`GGML_ASSERT: ggml-cuda.cu:1278: to_fp32_cuda != nullptr`

CPU-only inference seems to work fine, which makes me wonder which backend you're using (Metal?).

ddh0 (Owner)

Yes, I'm using Metal. CUDA support for bf16 is still being worked on in llama.cpp. You could try a batch size <= 16, or run on CPU, for the time being.
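
A minimal sketch of that workaround with the llama.cpp CLI of that era, assuming the `main` binary (the model filename is a placeholder):

```sh
# Cap the batch size at 16 per the suggestion above; if the assert
# still fires, fall back to CPU-only inference (see below).
./main -m ./model-bf16.gguf -b 16 -p "Hello"
```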

@ddh0 Thanks! No luck with decreasing the batch size, but `-ngl 0` resolved the issue.
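
For anyone who hits the same assert, the working invocation looked roughly like this (again, the model filename is a placeholder):

```sh
# -ngl 0 offloads zero layers to the GPU, so inference stays on the
# CPU and the CUDA bf16 conversion path that asserts is never taken.
./main -m ./model-bf16.gguf -ngl 0 -p "Hello"
```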

mike-ravkine changed discussion status to closed
