Triangle104/Arcee-Maestro-7B-Preview-Q5_K_M-GGUF

This model was converted to GGUF format from arcee-ai/Arcee-Maestro-7B-Preview using llama.cpp via the ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Arcee-Maestro-7B-Preview (7B) is Arcee's first reasoning model trained with reinforment learning. It is based on the Qwen2.5-7B DeepSeek-R1 distillation DeepSeek-R1-Distill-Qwen-7B with further GPRO training. Though this is just a preview of our upcoming work, it already shows promising improvements to mathematical and coding abilities across a range of tasks.

Intended Use Cases

Advanced reasoning

Mathematics

Coding

Training & Fine-Tuning

Initial Training: Began with DeepSeek-R1-Distill-Qwen-7B GRPO: Trained on 450,000 verified math problems Additional bootstrapped coding examples

Performance

Arcee-Maestro-7B-Preview shows strong performance in mathematics as well as coding, competing against even O1 preview, a model far surprassing its size.

Limitations

Context Length: 128k Tokens (may vary depending on the final tokenizer settings and system resources). Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.

Ethical Considerations

Content Generation Risks: Like any language model, Arcee-Maestro-7B-Preview can generate potentially harmful or biased content if prompted in certain ways.

License

Arcee-Maestro-7B-Preview (7B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/Arcee-Maestro-7B-Preview-Q5_K_M-GGUF --hf-file arcee-maestro-7b-preview-q5_k_m.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/Arcee-Maestro-7B-Preview-Q5_K_M-GGUF --hf-file arcee-maestro-7b-preview-q5_k_m.gguf -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/Arcee-Maestro-7B-Preview-Q5_K_M-GGUF --hf-file arcee-maestro-7b-preview-q5_k_m.gguf -p "The meaning to life and the universe is"

./llama-server --hf-repo Triangle104/Arcee-Maestro-7B-Preview-Q5_K_M-GGUF --hf-file arcee-maestro-7b-preview-q5_k_m.gguf -c 2048

Triangle104
/

Arcee-Maestro-7B-Preview-Q5_K_M-GGUF