Using dolly-v2-3b with the "Build your Chat Bot with Dolly" demo
I am trying to execute the Build your Chat Bot with Dolly demo on an Azure Databricks free trial,
with runtime version 13.1 ML Standard (includes Apache Spark 3.4.0, Scala 2.12)
and node type Standard_DS3_v2.
I am trying to build the qa_chain using databricks/dolly-v2-3b - because of the limited compute capacity I have - using these lines of code:
model_name = "databricks/dolly-v2-3b"
I always get: ValueError: Could not load model databricks/dolly-v2-3b with any of the following classes: (, , ).
Any suggestions for solving this issue?
Not sure. Did the model download successfully? Did you make any other modifications? Note that you probably don't need to load in 8-bit with the 3b model.
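For reference, loading the 3b model without 8-bit would look roughly like this (it mirrors the demo's pipeline call; the torch_dtype argument is my assumption, not necessarily what the notebook uses):

import torch
from transformers import pipeline

model_name = "databricks/dolly-v2-3b"
# Load the Dolly instruction-following pipeline without bitsandbytes/8-bit quantization;
# device_map="auto" (requires accelerate) places it on a GPU if one is available, else the CPU.
instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")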
I am following this demo https://www.dbdemos.ai/demo-notebooks.html?demoName=llm-dolly-chatbot
All the steps in the data preparation section were executed successfully.
In the prompt engineering section, I was not able to run this command: qa_chain = build_qa_chain(), due to the exception mentioned above.
I am loading it in 8-bit because that is mentioned as a note in the demo.
Am I getting it wrong?
Try only changing the model name.
Or check whether you're having trouble downloading the model: delete your Hugging Face cache and try again.
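If a partial download is the problem, removing the cached files for that model forces a clean re-download. A rough sketch, assuming the default Hugging Face cache location and the current hub cache layout:

import os, shutil
# Default Hugging Face cache on the driver; adjust if HF_HOME or TRANSFORMERS_CACHE is set.
cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
# Remove only the dolly-v2-3b entry so any other cached models stay in place.
shutil.rmtree(os.path.join(cache_dir, "models--databricks--dolly-v2-3b"), ignore_errors=True)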
What do you mean by changing the model name? I've already changed it to dolly-v2-3b instead of the dolly-v2-7b used in the demo. Or should I use something else?
Right. I mean, only make that modification, not 8-bit or anything else. But I think you have a download problem.
@alaamigdady
Try this command to load the model in 8-bit. Note that load_in_8bit
is passed in via model_kwargs and not as a pipeline parameter; this is already fixed in a newer release of the demo.
# Note: if you use dolly 12B or a smaller model but a GPU with less than 24GB of RAM, use 8-bit. This requires %pip install bitsandbytes
from transformers import pipeline
instruct_pipeline = pipeline(model=model_name, trust_remote_code=True, device_map="auto", model_kwargs={'load_in_8bit': True})
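Once the pipeline loads, wiring it into the chain looks roughly like this (a sketch only; the demo's build_qa_chain has its own prompt template, the one below is just illustrative):

from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain

# Wrap the transformers pipeline so LangChain can drive it as an LLM.
hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
# Illustrative prompt; the demo ships its own instruction-style template.
template = """Use the following context to answer the question at the end.

{context}

Question: {question}

Answer:"""
prompt = PromptTemplate(input_variables=["context", "question"], template=template)
# "stuff" simply concatenates the retrieved documents into the prompt's context.
qa_chain = load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt)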
Oh, you're not loading this on a GPU. You should be. I think you're running out of system memory here; use a larger instance.
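To confirm whether the driver actually sees a GPU, a quick check (nothing demo-specific):

import torch
# On a CPU-only node type like Standard_DS3_v2 this prints False, and the model
# ends up in (limited) system RAM instead of GPU memory.
print(torch.cuda.is_available())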