How to transform our Llama object to GPU

#1
by arshiahemmat - opened

Hey Guys!

Thanks for your amazing implementation!
I want to move my model to the GPU, but neither ".to(cuda)" nor "device = cuda" works (it does run on CPU)!

from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,
    n_ctx=2048,
)

So, could you please give me some tips on properly doing this task?

Thanks for your effort and time!
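A likely cause, not confirmed in this thread: `n_gpu_layers=-1` only takes effect if `llama-cpp-python` itself was built with GPU support, and the default pip wheel is CPU-only. A minimal reinstall sketch, assuming an NVIDIA GPU with the CUDA toolkit (`nvcc`) installed; recent versions use the `GGML_CUDA` CMake flag, while older releases used `LLAMA_CUBLAS`:

```shell
# Rebuild llama-cpp-python with CUDA support (assumes nvcc is on PATH)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --upgrade --force-reinstall --no-cache-dir
```

After reinstalling, passing `verbose=True` to `Llama(...)` prints the load log, which shows how many layers were actually offloaded to the GPU.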

Hello friends,
I have the same problem as above (arshiahemmat's comment): my code does not run on the GPU, so the response time is high.
Please reply to this comment.