Using dolly-v2-3b with Build your Chat Bot with Dolly demo

#26
by alaamigdady - opened

I am trying to execute the Build your Chat Bot with Dolly demo on an Azure Databricks free trial,
with runtime version 13.1 ML (includes Apache Spark 3.4.0, Scala 2.12)
and a Standard_DS3_v2 node type.

I am trying to build the qa_chain using databricks/dolly-v2-3b - due to the limited compute capacity I have - using these lines of code:

model_name = "databricks/dolly-v2-3b"

(screenshot: ssh.png)

I always get: ValueError: Could not load model databricks/dolly-v2-3b with any of the following classes: (, , ).

Any suggestions for solving this issue?

Databricks org

Not sure. Did it download successfully? Did you make any other modifications? Note that you probably don't need to load in 8-bit with the 3b model.
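For reference, here is a minimal sketch of loading the model both with and without 8-bit quantization. The helper name `load_dolly` is mine, not from the demo, and it assumes `transformers`, `accelerate`, and `torch` are installed (plus `bitsandbytes` for the 8-bit path):

```python
import torch
from transformers import pipeline

def load_dolly(model_name: str = "databricks/dolly-v2-3b", load_in_8bit: bool = False):
    """Build a Dolly instruct pipeline. 8-bit loading is only needed for the
    larger checkpoints on GPUs with limited memory; the 3b model fits in
    bfloat16 on most GPU instances."""
    kwargs = dict(
        model=model_name,
        trust_remote_code=True,  # Dolly ships a custom instruct pipeline class
        device_map="auto",
    )
    if load_in_8bit:
        # load_in_8bit must be passed through model_kwargs,
        # not as a top-level pipeline argument
        kwargs["model_kwargs"] = {"load_in_8bit": True}
    else:
        kwargs["torch_dtype"] = torch.bfloat16
    return pipeline(**kwargs)
```

Calling `load_dolly()` will download the checkpoint from the Hub on first use, so it needs network access and several GB of disk.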

I am following this demo https://www.dbdemos.ai/demo-notebooks.html?demoName=llm-dolly-chatbot

All the steps in the data preparation section executed successfully.
In the prompt engineering section, I was not able to run qa_chain = build_qa_chain() due to the exception mentioned above.

I am loading it in 8-bit because the demo mentions it in a note

(screenshot: note.png)

Am I getting it wrong?

Databricks org

Try changing only the model name.
Or check whether you're having trouble downloading the model - delete your HF cache and try again.
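For the cache step, something like this clears the default Hugging Face download cache so the next load fetches fresh files (the path is the library default; adjust it if you have set HF_HOME or a custom cache directory):

```shell
# Remove cached model downloads; they will be re-fetched on the next load.
# HF_HOME overrides the default ~/.cache/huggingface location if set.
CACHE_DIR="${HF_HOME:-$HOME/.cache/huggingface}"
rm -rf "$CACHE_DIR/hub"
echo "Cleared $CACHE_DIR/hub"
```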

What do you mean by changing the model name? I've already changed it to dolly-v2-3b instead of the model used in the demo. Or should I use something else?

Databricks org

Right. I mean, make only that modification, not 8-bit or anything else. But I think you have a download problem.

Databricks org

@alaamigdady try this command to load the model in 8-bit. Note that load_in_8bit is passed in via model_kwargs, not as a pipeline parameter. This will be fixed in a future release.

# Note: if you use dolly 12B or a smaller model on a GPU with less than 24GB RAM, use 8-bit. This requires %pip install bitsandbytes
from transformers import pipeline

instruct_pipeline = pipeline(model=model_name, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})

Oh, you're not loading on a GPU. You should be. I think you're running out of OS memory here. Use a larger instance.
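A quick sanity check before loading can confirm whether a GPU is visible to the cluster and how much memory it has (standard PyTorch CUDA calls; assumes torch is installed):

```python
import torch

# Report the GPU PyTorch sees, if any; without one, the pipeline
# falls back to CPU and host RAM, which is where OOM kills happen.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB")
else:
    print("No GPU visible - loading will use CPU and host RAM")
```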

srowen changed discussion status to closed
