Can we run this model on CPU?

#3
by gustavomr - opened

I think a lot of users will have this question, since they want to run it locally on a laptop or a simple computer.

Could we have a simpler alternative version, maybe trained on the same corpus but with fewer parameters, so it can run on CPUs?

By the way, congrats on this amazing job! Keep rocking!

Databricks org

You can, but it would be very very slow. You really want a GPU.
The training code for "v2" will be on the repo soon, and you could use that to train from a smaller Pythia model.
Maybe the team will do just that. But models small enough to run on CPUs are <100M params, and models that small may not perform well for the kind of text-generation QA task people expect to use this for.
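If you do want to try CPU inference anyway, here is a minimal sketch with transformers. Model choice and generation settings are illustrative; the 3b checkpoint is the only one remotely practical on CPU, and this skips Dolly's instruction-prompt template for brevity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative: dolly-v2-3b is the smallest Dolly v2 checkpoint and the only one
# remotely practical on CPU. Expect generation to be very slow.
model_name = "databricks/dolly-v2-3b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,   # plain fp32 on CPU; fp16 kernels are GPU-oriented
    low_cpu_mem_usage=True,
)

inputs = tokenizer("Explain nuclear fission in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```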

Can we quantize the model, like people have done with llama.cpp? (Pardon my ignorance)

Databricks org

Sure, you can try. See https://github.com/databrickslabs/dolly for source code (2.0 training code coming soon)

People have already done it here: https://github.com/ggerganov/llama.cpp/discussions/569. Looks like it runs on CPU just fine :)
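If you would rather stay inside transformers than convert to a ggml file, 8-bit quantization via bitsandbytes is another option. This is only a sketch and still needs a CUDA GPU; it reduces memory use, it does not make the model CPU-friendly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load Dolly v2 12b with 8-bit weights via bitsandbytes
# (needs `pip install bitsandbytes accelerate` and a CUDA GPU).
model_name = "databricks/dolly-v2-12b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
```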

A direct link to the CPU-ready version is here: https://huggingface.co/geemili/dolly-v2-12b/tree/main

How do you run https://huggingface.co/geemili/dolly-v2-12b/tree/main (12b-quantized model, .ggml file) with AutoTokenizer and AutoModelForCausalLM?


You can try the following in the attachment.
ProcedureDolly.jpg
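Note that the .ggml file itself is a llama.cpp/ggml artifact, so AutoTokenizer and AutoModelForCausalLM cannot load it; those classes only read the original Hugging Face checkpoint. A rough sketch of the standard transformers route instead (following the model card, parameters illustrative):

```python
import torch
from transformers import pipeline

# Standard transformers route for the original checkpoint (not the .ggml file).
# trust_remote_code=True pulls in Dolly's custom instruct pipeline from the repo.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

print(generate_text("Explain to me the difference between nuclear fission and fusion."))
```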


I'm having trouble installing bitsandbytes on a Databricks E8as_v4 GPU cluster.

Databricks org

Hm, what issue? bitsandbytes has been working fine for me

Library installation attempted on the driver node of cluster 0413-233703-4jtufovq and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: org.apache.spark.SparkException: Process List(bash, /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip, install, bitsandbytes==0.38.1, --index-url, https://github.com/timdettmers/bitsandbytes, --disable-pip-version-check) exited with code 1. ERROR: Could not find a version that satisfies the requirement bitsandbytes==0.38.1 (from versions: none) ERROR: No matching distribution found for bitsandbytes==0.38.1

Databricks org

Works fine for me: %pip install bitsandbytes==0.38.1 using the 13.0 ML runtime. How are you installing, exactly?
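For what it's worth, the failing cluster-library install above passes --index-url https://github.com/timdettmers/bitsandbytes, which is a source repository rather than a package index, so pip finds no versions there. A notebook-scoped install straight from PyPI avoids that (sketch):

```python
# In one notebook cell: install from PyPI (no --index-url needed).
%pip install bitsandbytes==0.38.1

# In a separate cell: importing bitsandbytes triggers the CUDA setup banner
# shown further down this thread.
import bitsandbytes
```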

using %pip install bitsandbytes==0.38.1


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/

/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/databricks/jars/*')}
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get CUDA error: invalid device function errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-6fa3d848-9028-4c25-be76-e27f73042d8f/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)

Then when I run print(pipe("Explain to me the difference between nuclear fission and fusion."))

RuntimeError: probability tensor contains either inf, nan or element < 0

Databricks org

Yeah, that's all working. It's just that the model hits overflow on that input in 8-bit; this can happen. IIRC this seemed to happen on the V100 but not the A10, though that may just be a coincidence. Try an A10, or a smaller model.
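One way to act on that without changing hardware is to drop to a smaller Dolly v2 checkpoint, or to skip 8-bit on the V100 and load in fp16 instead. A sketch, with model choices and dtypes purely illustrative:

```python
import torch
from transformers import pipeline

# Option 1: a smaller checkpoint in 8-bit, per the suggestion above (int8
# overflow can still occur on pre-Ampere GPUs such as the V100).
pipe_8bit = pipeline(
    model="databricks/dolly-v2-3b",
    trust_remote_code=True,
    device_map="auto",
    model_kwargs={"load_in_8bit": True},
)

# Option 2: skip int8 entirely and run the 7b model in fp16, which avoids the
# int8 overflow path at the cost of more GPU memory.
pipe_fp16 = pipeline(
    model="databricks/dolly-v2-7b",
    trust_remote_code=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

print(pipe_fp16("Explain to me the difference between nuclear fission and fusion."))
```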

srowen changed discussion status to closed
