How to transform our Llama object to GPU
#1
by
arshiahemmat
Hey guys!
Thanks for your amazing implementation!
I want to move my model to the GPU, but neither `.to("cuda")` nor `device = "cuda"` works (it runs fine on CPU):
```python
from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,
    n_ctx=2048,
)
```
So, could you please give me some tips on properly doing this task?
Thanks for your effort and time!
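
For what it's worth, `n_gpu_layers=-1` only takes effect if `llama-cpp-python` was built with GPU support; the default CPU-only wheel silently ignores it. A sketch of a reinstall with the CUDA backend enabled (the exact CMake flag depends on your version: newer releases use `-DGGML_CUDA=on`, older ones used `-DLLAMA_CUBLAS=on` -- check the llama-cpp-python README for your version):

```shell
# Rebuild llama-cpp-python with CUDA support.
# Assumes the CUDA toolkit (nvcc) is installed and on PATH.
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --force-reinstall --upgrade --no-cache-dir
```

After reinstalling, pass `verbose=True` to `Llama(...)` and check the load log: it should report layers being offloaded to the GPU if the build worked.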
Hello friends,
I have the same problem as above (arshiahemmat's comment): my code does not run on the GPU, so the response time is high.
Please reply to this comment.