Can you help me with this? (Running a pod after I stopped it)

#30
by WolfDavid - opened

folks:

I’m an intern and a complete beginner at working with LLMs. My current project is to learn how to fine-tune an existing LLM, and I’m getting stuck just trying to use RunPod successfully.

I can’t seem to build and run this LLM reliably, and I’d love your insight.

I’ve tried to create a pod several times. It ran the first time and the last time but none of the times in between, and I don’t know what I am doing wrong.

[Screenshot: successful creation of a RunPod pod that runs the LLM]
When I stop the pod and try to restart it, no GPUs are assigned, so I’m lost about how to use RunPod for LLMs. Here is what happened:

[Screenshot: unsuccessful restart because no GPU was assigned]
Can you help me understand both how to build these pods reliably and how to stop and restart them with a GPU when I need one?

I am very confused by this page: “Why do I have zero GPUs assigned to my Pod?” It seems to suggest that a network volume would somehow address the zero-GPU problem, but I have no idea what I would do with such a volume or how it would fix the issue.

Once I have this working, what resources should I use for my first fine-tuning run? I created a CSV file full of fake product info and would love to see whether I can incorporate it. It is only 400 rows.
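To make the question concrete, here is a minimal sketch of how I imagine turning the CSV rows into training examples before any fine-tuning happens. The column names (`name`, `description`, `price`), the prompt wording, and the helper names are all made up for illustration and are not my real file. The prompt/completion JSONL layout is just one common convention, not something I know a particular tool requires.

```python
import csv
import io
import json


def rows_to_examples(csv_text):
    """Turn each CSV row into one prompt/completion pair.

    Assumes hypothetical columns: name, description, price.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    examples = []
    for row in reader:
        examples.append({
            "prompt": f"Describe the product '{row['name']}'.",
            "completion": f"{row['description']} Price: {row['price']}",
        })
    return examples


def to_jsonl(examples):
    """Serialize the examples as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(example) for example in examples)


# Two fake rows standing in for the 400-row file.
sample = (
    "name,description,price\n"
    "Widget,A small fake widget.,9.99\n"
    "Gadget,A fake gadget.,19.50\n"
)

examples = rows_to_examples(sample)
print(to_jsonl(examples))
```

If this is the wrong shape for the data, that alone would be useful to know.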

To do that, do I need to set up a pipeline like the one described here?

Pipelines

Is there a super simple example I can start with? The “hello world” of fine-tuning?

Thanks!

WD
