Quantized Octo-planner: On-device Language Model for Planner-Action Agents Framework

This repo contains GGUF quantized versions of our Octo-planner model, NexaAIDev/octopus-planning.

GGUF Quantization

To run the models, download them to your local machine using either git clone or the Hugging Face Hub:

git clone https://huggingface.co/NexaAIDev/octo-planner-gguf
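Cloning the repo fetches every quantization listed in the table below. If you only need one file, a sketch along the following lines may help. It assumes the Hugging Face Hub CLI is installed (pip install -U "huggingface_hub[cli]"); the helper names gguf_file and download_quant are hypothetical and not part of this repo.

```shell
# Repo name is taken from this README; the helper names are hypothetical.
REPO="NexaAIDev/octo-planner-gguf"

# Map a quant method (for example Q4_K_M) to its file name in the repo.
gguf_file() {
  printf 'octopus-planning-%s.gguf\n' "$1"
}

# Download a single quantization into the current directory.
# Assumes the huggingface-cli command is available on PATH.
download_quant() {
  huggingface-cli download "$REPO" "$(gguf_file "$1")" --local-dir .
}
```

Calling download_quant Q4_K_M would then fetch only octopus-planning-Q4_K_M.gguf.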

Run with llama.cpp (Recommended)

Clone and compile:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
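# Compile the source code. Recent llama.cpp versions build with CMake instead:
#   cmake -B build && cmake --build build --config Release
# in which case the binaries are placed under build/bin/.
make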

Execute the model:

Run the following command in the terminal:

./llama-cli -m ./path/to/octopus-planning-Q4_K_M.gguf -p "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"
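The command above wraps the request in <|user|>, <|end|>, and <|assistant|> markers. To avoid retyping them for every request, a small wrapper can build the prompt. This is only a sketch: build_prompt, run_planner, and the MODEL variable are hypothetical names, and the model path is the same placeholder used above.

```shell
# Placeholder path, as in the command above; point it at your downloaded file.
MODEL="./path/to/octopus-planning-Q4_K_M.gguf"

# Wrap a plain request in the chat markers used in this README.
build_prompt() {
  printf '<|user|>%s<|end|><|assistant|>' "$1"
}

# Run llama-cli from the llama.cpp directory with the wrapped prompt.
run_planner() {
  ./llama-cli -m "$MODEL" -p "$(build_prompt "$1")"
}
```

For example, run_planner "Take a screenshot and email it to all participants" sends the wrapped request to the model.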

Run with Ollama

Since our models have not been uploaded to the Ollama server, please download the models and manually import them into Ollama by following these steps:

Install Ollama on your local machine. You can also follow the guide in the Ollama GitHub repository:

git clone https://github.com/ollama/ollama.git ollama

Change into the local Ollama directory:

cd ollama

Create a Modelfile in that directory:

touch Modelfile

In the Modelfile, include a FROM statement with the path to your local model, along with any default parameters you want to set:

FROM ./path/to/octopus-planning-Q4_K_M.gguf
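The FROM line alone is enough to import the model. If you also want to set parameters, the Modelfile can be written in one step as sketched here. The PARAMETER values are assumptions chosen for illustration (deterministic output, stopping at the <|end|> marker used in the prompts), not defaults published by the authors.

```shell
# Write a Modelfile into the current directory.
# The model path is the placeholder used above; adjust it to your download.
cat > Modelfile <<'EOF'
FROM ./path/to/octopus-planning-Q4_K_M.gguf
# Assumed values for illustration, not the authors' defaults:
PARAMETER temperature 0
PARAMETER stop "<|end|>"
EOF
```

Both temperature and stop are standard Ollama Modelfile parameters; remove the PARAMETER lines to fall back to Ollama's own defaults.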

Use the following command to add the model to Ollama:

ollama create octopus-planning-Q4_K_M -f Modelfile

Verify that the model has been successfully imported:

ollama ls

Run the model:

ollama run octopus-planning-Q4_K_M "<|user|>Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants<|end|><|assistant|>"

Quantized GGUF Models Benchmark

Name	Quant method	Bits	Size	Use Cases
octopus-planning-Q2_K.gguf	Q2_K	2	1.42 GB	fast but high loss, not recommended
octopus-planning-Q3_K.gguf	Q3_K	3	1.96 GB	strongly discouraged
octopus-planning-Q3_K_S.gguf	Q3_K_S	3	1.68 GB	strongly discouraged
octopus-planning-Q3_K_M.gguf	Q3_K_M	3	1.96 GB	moderate loss, not very recommended
octopus-planning-Q3_K_L.gguf	Q3_K_L	3	2.09 GB	not very recommended
octopus-planning-Q4_0.gguf	Q4_0	4	2.18 GB	moderate speed, recommended
octopus-planning-Q4_1.gguf	Q4_1	4	2.41 GB	moderate speed, recommended
octopus-planning-Q4_K.gguf	Q4_K	4	2.39 GB	moderate speed, recommended
octopus-planning-Q4_K_S.gguf	Q4_K_S	4	2.19 GB	fast and accurate, very recommended
octopus-planning-Q4_K_M.gguf	Q4_K_M	4	2.39 GB	fast, recommended
octopus-planning-Q5_0.gguf	Q5_0	5	2.64 GB	fast, recommended
octopus-planning-Q5_1.gguf	Q5_1	5	2.87 GB	very big, prefer Q4
octopus-planning-Q5_K.gguf	Q5_K	5	2.82 GB	big, recommended
octopus-planning-Q5_K_S.gguf	Q5_K_S	5	2.64 GB	big, recommended
octopus-planning-Q5_K_M.gguf	Q5_K_M	5	2.82 GB	big, recommended
octopus-planning-Q6_K.gguf	Q6_K	6	3.14 GB	very big, not very recommended
octopus-planning-Q8_0.gguf	Q8_0	8	4.06 GB	very big, not very recommended
octopus-planning-F16.gguf	F16	16	7.64 GB	extremely big

Quantized with llama.cpp