Encountering an exception while trying to run the model using Manifest
Hi, I cloned this model onto my local machine and tried to run it with the following Manifest command:
python3 -m manifest.api.app \
--model_type huggingface \
--model_generation_type llama-text-generation \
--model_name_or_path nsql-llama-2-7B \
--device 0
but I'm getting this exception:
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/lib/python3.11/site-packages/manifest/api/app.py", line 301, in
main()
File "/lib/python3.11/site-packages/manifest/api/app.py", line 148, in main
model = MODEL_CONSTRUCTORS[model_type](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/manifest/api/models/huggingface.py", line 474, in __init__
tokenizer = LlamaTokenizer.from_pretrained(self.model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "/lib/python3.11/site-packages/sentencepiece/init.py", line 905, in Load
return self.LoadFromFile(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.11/site-packages/sentencepiece/init.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: /Users/runner/work/sentencepiece/sentencepiece/src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
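For what it's worth, the traceback points at the tokenizer load rather than Manifest itself, so I'd expect the failure to reproduce with transformers alone (an untested sketch, using the same local path as my command above):

from transformers import LlamaTokenizer

# The same call that fails inside manifest/api/models/huggingface.py, line 474:
tokenizer = LlamaTokenizer.from_pretrained("nsql-llama-2-7B")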
Would appreciate your help here
Hi @dudub ,
Can you try this command?
python3 -m manifest.api.app \
--model_type huggingface \
--model_generation_type text-generation \
--model_name_or_path NumbersStation/nsql-llama-2-7B \
--device 0
The current version of Manifest uses an old class to load LLaMA-based models; we will update this in a future Manifest release. Thanks!
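Concretely (paraphrasing the internals, not exact Manifest code): the llama-text-generation path constructs LlamaTokenizer directly, which needs a SentencePiece tokenizer.model file and fails with the ParseFromArray error above, while the text-generation path lets the Auto classes resolve the right tokenizer from the checkpoint:

from transformers import AutoTokenizer

# Resolves the tokenizer class from the checkpoint's config instead of
# hard-coding LlamaTokenizer:
tokenizer = AutoTokenizer.from_pretrained("NumbersStation/nsql-llama-2-7B")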
@senwu
Thanks for the quick answer!
Yes, I ran it and it looks like the server is up and running, but now I'm getting this error when I try to communicate with the LLM through LangChain:
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5002
* Running on http://192.168.1.108:5002
Press CTRL+C to quit
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /params HTTP/1.1" 200 -
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /params HTTP/1.1" 200 -
The following `model_kwargs` are not used by the model: ['token_type_ids'] (note: typos in the generate arguments will also show up in this list)
127.0.0.1 - - [02/Aug/2023 23:37:28] "POST /completions HTTP/1.1" 400 -
That's how I configured it on the LangChain side:
from manifest import Manifest
from langchain.llms.manifest import ManifestWrapper

manifest = Manifest(
    client_name="huggingface",
    client_connection="http://127.0.0.1:5002",
)
local_llm = ManifestWrapper(
    client=manifest,
    llm_kwargs={"temperature": 0.0, "max_tokens": 1024},
    verbose=True,
)
and used the LLM in a SQL agent, roughly as sketched below.
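For completeness, the agent wiring looks roughly like this (a sketch against the LangChain API of mid-2023; the SQLite URI is a placeholder, not my real database):

from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///example.db")  # placeholder database
toolkit = SQLDatabaseToolkit(db=db, llm=local_llm)
agent = create_sql_agent(llm=local_llm, toolkit=toolkit, verbose=True)
agent.run("How many rows does the users table have?")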
@senwu
I think it's not an issue with LangChain but with Manifest, because I'm getting the same error with a simple POST request from Postman:
curl --location 'http://127.0.0.1:5002/completions' \
--header 'Content-Type: application/json' \
--data '{
"prompt": "Hello World",
"max_tokens": 1024,
"temperature": 0.0,
"repetition_penalty": 1,
"top_k": 50,
"top_p": 10,
"do_sample": "True",
"n": 1,
"max_new_tokens": 1024
}'
I think it's something related to the Manifest server/transformers.
https://huggingface.co/OpenAssistant/falcon-40b-sft-mix-1226/discussions/2
Are you familiar with another way to run this model locally, besides Manifest?
Hi @dudub ,
Which transformers version are you using? We are using transformers 4.31.0 and it works.
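A quick way to check what you have installed:

import transformers

print(transformers.__version__)  # 4.31.0 in the working setup below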
FLASK_PORT=7000 python3 -m manifest.api.app --model_type huggingface --model_generation_type text-generation --model_name_or_path NumbersStation/nsql-llama-2-7B --device 0
Model Name: NumbersStation/nsql-llama-2-7B Model Path: NumbersStation/nsql-llama-2-7B
Loading checkpoint shards: 100%|██████████| 3/3 [00:20<00:00, 6.87s/it]
Loaded Model DType torch.float32
Usings max_length: 4096
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:7000
* Running on http://38.99.106.21:7000
Press CTRL+C to quit
100.110.106.20 - - [03/Aug/2023 10:10:29] "POST /completions HTTP/1.1" 200 -
curl --location 'http://127.0.0.1:7000/completions' --header 'Content-Type: application/json' --data '{
"prompt": "Hello World",
"max_tokens": 1024,
"temperature": 0.0,
"repetition_penalty": 1,
"top_k": 50,
"top_p": 10,
"do_sample": "True",
"n": 1,
"max_new_tokens": 1024
}'
{"id": "0b13e512-4d03-4d97-8f90-e97d5479ead2", "object": "text_completion", "created": 1691082629, "model": "flask_model", "choices": [{"text": "\n", "logprob": -2.203536033630371, "tokens": [13, 2], "token_logprobs": [-1.6638275384902954, -0.5397084951400757]}]}
@senwu
Thanks again for your help! It was indeed that issue and the error is gone, but now I'm facing a new one...
Can you tell me what machine you are running it on?
I'm trying to run it on my local MacBook Pro (M1 Pro) but I'm getting the following error:
"addmm_impl_cpu_" not implemented for 'Half'
Maybe I can't even run it locally and need to deploy it on a remote machine? (For now, it's just for testing and playing around, of course.)
That's my log when the server starts up:
[2023-08-03 23:19:50,322] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Model Name: nsql-llama-2-7B Model Path: nsql-llama-2-7B
Loading checkpoint shards: 100%|██████████| 3/3 [00:29<00:00, 9.86s/it]
Loaded Model DType torch.float16
Usings max_length: 4096
* Serving Flask app 'app'
* Debug mode: off
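One thing I notice comparing the two logs: your working server reports Loaded Model DType torch.float32, while mine loads torch.float16. As far as I understand, PyTorch does not implement float16 matrix multiplies on the CPU (at least in versions of this era), which is exactly what the addmm error says. A minimal reproduction (sketch; behavior depends on the PyTorch version):

import torch

# An fp16 ("Half") matmul on CPU raises:
# RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
a = torch.randn(4, 4, dtype=torch.float16)
b = torch.randn(4, 4, dtype=torch.float16)
bias = torch.randn(4, 4, dtype=torch.float16)
torch.addmm(bias, a, b)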
We've tested the model on Ubuntu 20.04.
Any luck, @dudub, on using a local database query language? I get the best responses with the OpenAI models, but I'm trying to replicate that with a private LLM setup.