Add inference llama.cpp example (#3)

- Add inference llama.cpp example (46c9330ad34bf46bbf77d753441a6bbd9d9553aa)

Co-authored-by: Said Taghadouini <staghado@users.noreply.huggingface.co>

README.md CHANGED
@@ -74,6 +74,34 @@ out = model.generate(input_ids, max_new_tokens=10)
 print(tokenizer.batch_decode(out))
 ```
 
+### On-device Inference
+
+Since Mambaoutai is only 1.6B parameters, it can run on a CPU at a fast speed.
+
+Here is an example of how to run it on llama.cpp:
+
+```bash
+# Clone the llama.cpp repository and compile it from source
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make
+
+# Create a conda env and install dependencies
+conda create -n mamba-cpp python=3.10
+conda activate mamba-cpp
+pip install -r requirements/requirements-convert-hf-to-gguf.txt
+
+# Download the weights, tokenizer, config, tokenizer_config and special_tokens_map from this repo and
+# put them in a directory 'Mambaoutai/'
+mkdir Mambaoutai
+
+# Convert the weights to GGUF format
+python convert-hf-to-gguf.py Mambaoutai
+
+# Run inference with a prompt
+./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
+```
+
 ### Model hyperparameters
 
 More details about the model hyperparameters are given in the table below:
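
A side note on the manual download step in the example above: the same files can also be pulled in one command with the huggingface_hub CLI instead of fetching each file by hand. A minimal sketch, not part of the commit; `<repo-id>` is a placeholder for this model's Hub id:

```bash
# Sketch: fetch weights, tokenizer and config files into Mambaoutai/ in one step.
# <repo-id> is a placeholder for this model's Hub repository id.
pip install huggingface_hub
huggingface-cli download <repo-id> --local-dir Mambaoutai
```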
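
The committed example runs the f16 GGUF directly. For tighter CPU memory budgets, llama.cpp of this era also builds a `quantize` tool alongside `main`; a hedged sketch, assuming the converted Mamba tensors quantize cleanly to the `q4_0` format:

```bash
# Sketch: produce a 4-bit model from the f16 conversion, then run it the same way.
# Assumes ./quantize was built by the `make` step and supports this architecture.
./quantize Mambaoutai/ggml-model-f16.gguf Mambaoutai/ggml-model-q4_0.gguf q4_0
./main -m Mambaoutai/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
```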