Problem while making multiple requests at a time from separate chatbot instances

#18
by krishnapiya - opened

GGML_ASSERT: /tmp/pip-install-3q_fwex4/llama-cpp-python_520e3a5b95cc4b339cb4759635dc8a44/vendor/llama.cpp/ggml-cuda.cu:6741: ptr == (void *) (g_cuda_pool_addr[device] + g_cuda_pool_used[device])
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.

The above error occurs when I try to process multiple requests at a time. It comes from a chatbot I built locally using the Llama-2-7b chat GGUF file.

This probably isn't an issue with the GGUF file itself, as it just stores the model weights. If you are using llama.cpp or another framework like that, the issue most likely lies there.
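For what it's worth, the CUDA pool assertion above is the kind of failure you'd expect when several threads call into one non-thread-safe model object at once. A common workaround is to serialize all calls through a single lock. The sketch below illustrates that pattern; the `SerializedModel` wrapper and the stub generate function are hypothetical stand-ins (a real setup would pass a loaded `Llama(...)` instance from llama-cpp-python instead of the lambda), so treat this as a minimal illustration rather than a tested fix.

```python
import threading

class SerializedModel:
    """Wraps a non-thread-safe model call so only one request runs at a time."""

    def __init__(self, generate_fn):
        self._generate = generate_fn   # e.g. a loaded Llama(...) instance's call
        self._lock = threading.Lock()  # one request in flight at a time

    def chat(self, prompt):
        with self._lock:               # serialize access to the GPU-backed model
            return self._generate(prompt)

# Stub in place of a real Llama-2-7b GGUF model, to keep the sketch runnable.
model = SerializedModel(lambda p: f"echo: {p}")

# Simulate several chatbot instances sending requests concurrently.
results = []
threads = [
    threading.Thread(target=lambda i=i: results.append(model.chat(f"msg {i}")))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

Each thread still gets its answer, but the lock prevents two generations from touching the model (and its CUDA memory pool) simultaneously; for real throughput you'd run one model per process instead.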
