Problem initializing LlamaCpp embeddings

#5
by Carlosky - opened

Hi, I am very new to LLMs. I'm having problems initializing LlamaCppEmbeddings with this model. I've used the 7B model and the 13B-chat model with the same quantization type without problems. I'm using "llama-2-70b.ggmlv3.q4_0.bin" and I'm getting this error:

Traceback (most recent call last):
  File "/home/carlosky/llama/llama.py", line 15, in <module>
    embeddings = LlamaCppEmbeddings(model_path='/home/carlosky/llama/models/llama-2-70b.ggmlv3.q4_0.bin')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LlamaCppEmbeddings
__root__
  Could not load Llama model from path: /home/carlosky/llama/models/llama-2-70b.ggmlv3.q4_0.bin. Received error (type=value_error)
Exception ignored in: <function Llama.__del__ at 0x7f142d7c4680>
Traceback (most recent call last):
  File "/home/carlosky/anaconda3/envs/llama/lib/python3.11/site-packages/llama_cpp/llama.py", line 1510, in __del__
    if self.ctx is not None:
       ^^^^^^^^
AttributeError: 'Llama' object has no attribute 'ctx'

I'm using Python version 3.11 and llama-cpp-python version 0.1.77 with langchain.

Carlosky changed discussion status to closed

I just read in the README that Python libraries are not supported yet, sorry for the confusion!

Actually, that's now slightly out of date: llama-cpp-python updated to version 0.1.77 yesterday, which should have Llama 70B support. So that should work now, I believe, if you update it. Note that a new parameter is required in llama.cpp: -gqa 8. I don't know how you set that with llama-cpp-python, but I assume it does need to be set, so check their docs or their code changes.

I don't know about LlamaCppEmbeddings, though. Does that use llama-cpp-python, or is it a different thing?

I will update my READMEs to mention the llama-cpp-python update later today.
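
For anyone calling llama-cpp-python directly rather than through LangChain, here is a minimal sketch of what setting the parameter might look like. It assumes the Llama constructor in 0.1.77 accepts an n_gqa keyword (the counterpart of llama.cpp's -gqa flag) and an embedding flag; please confirm both against the library's docs. The helper names llama_70b_kwargs and load_70b_for_embeddings are hypothetical, and the model path is left elided as elsewhere in this thread.

```python
from typing import Any, Dict


def llama_70b_kwargs(model_path: str, n_ctx: int = 2048) -> Dict[str, Any]:
    """Collect constructor arguments for a Llama 2 70B GGML model.

    Assumption: llama_cpp.Llama accepts n_gqa, mirroring llama.cpp's -gqa 8.
    """
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,
        "n_gqa": 8,         # grouped-query attention factor needed for 70B
        "embedding": True,  # assumed flag that enables embedding mode
    }


def load_70b_for_embeddings(model_path: str):
    """Load the model. llama_cpp is imported lazily so that the helper
    above can be inspected without the library installed."""
    from llama_cpp import Llama

    return Llama(**llama_70b_kwargs(model_path))
```

Keeping the arguments in a plain dictionary makes it easy to print and check what is being passed before attempting to load a model of this size.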

Okay, I will then check the new parameter you mentioned. LlamaCppEmbeddings is a Langchain class that integrates llama-cpp-python with the tool.
Thank you for the quick response and for all the work you do!
P.S. Just in case anyone wants to know, Langchain has not yet added support for the new llama-cpp-python parameter.

Regarding Langchain and llama-cpp-python - as a temporary fix to try out this model, I just added code in 2 places within my langchain/llms/llamacpp.py.

Insert just after the line starting with "n_gpu_layers: Optional":
n_gqa: Optional[int] = Field(None, alias="n_gqa")

Then insert just after the comment "# For backwards compatibility, only include if non-null.":
if values["n_gqa"] is not None:
    model_params["n_gqa"] = values["n_gqa"]

You then add the parameter n_gqa=8 when initialising this 70B model for use in langchain, e.g.:

llm = LlamaCpp(model_path='...', n_gqa=8, n_gpu_layers=20, n_threads=14, n_ctx=2048, ...)

To try out LlamaCppEmbeddings, you would need to apply the same edits to the similar file at langchain/embeddings/llamacpp.py.
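
The "only include if non-null" pattern used in the edit can be sketched in isolation. This is a standalone illustration and not LangChain's actual code: build_model_params and the values dictionaries are hypothetical stand-ins for the validated field values inside the LangChain class.

```python
from typing import Any, Dict


def build_model_params(values: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for the model, forwarding optional
    parameters only when the caller actually set them."""
    # Parameters that are always forwarded.
    model_params = {"n_ctx": values["n_ctx"], "n_threads": values["n_threads"]}

    # For backwards compatibility, only include if non-null, so that an
    # older llama-cpp-python never receives a keyword it does not know.
    for optional_name in ("n_gpu_layers", "n_gqa"):
        if values.get(optional_name) is not None:
            model_params[optional_name] = values[optional_name]
    return model_params


# n_gqa left unset: the key is omitted entirely.
small = build_model_params({"n_ctx": 2048, "n_threads": 14, "n_gqa": None})
# n_gqa=8, as needed for the 70B model: the key is forwarded.
large = build_model_params({"n_ctx": 2048, "n_threads": 14, "n_gqa": 8})
print(small)
print(large)
```

Because the key is omitted when unset, the 7B and 13B models keep loading exactly as before, and only the 70B call needs the extra argument.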

Hope Langchain gets this update soon.