How to transform our Llama object to GPU

#1
by arshiahemmat - opened

Hey Guys!

Thanks for your amazing implementation!
I want to move my model to the GPU, but neither ".to(cuda)" nor "device = cuda" works (it does run on CPU)!

from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,
    n_ctx=2048,
)

So, could you please give me some tips on properly doing this task?

Thanks for your effort and time!
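A likely cause, not confirmed in this thread: `n_gpu_layers=-1` only takes effect if `llama-cpp-python` itself was built with GPU support, and the default pip wheel is CPU-only. A minimal reinstall sketch, assuming an NVIDIA GPU with the CUDA toolkit (`nvcc`) installed; recent versions use the `GGML_CUDA` CMake flag, while older releases used `LLAMA_CUBLAS`:

```shell
# Rebuild llama-cpp-python with CUDA support (assumes nvcc is on PATH)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python \
    --upgrade --force-reinstall --no-cache-dir
```

After reinstalling, passing `verbose=True` to `Llama(...)` prints the load log, which shows how many layers were actually offloaded to the GPU.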

Hello friends,
I have the same problem as above (arshiahemmat's comment): my code does not run on the GPU, so the response time is high.
Please reply to this comment.