# GGUF quantized and bug-fixed version of phi4
## review
- fixed the bug: "ResponseError: llama runner process has terminated: GGML_ASSERT(hparams.n_swa > 0) failed"
- defined the architecture (from none) to llama; everything works right away (see the verification sketch after this list)
- tq1_0 and tq2_0 are not usable; it seems you should start with at least q2_k
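To confirm the architecture fix took effect, you can read the metadata key back from the file with the `gguf` Python package. A minimal sketch, assuming `pip install gguf`; the filename is a placeholder for whichever quant you downloaded:

```python
# minimal sketch: verify the GGUF architecture metadata (the field the bug fix changes)
# assumes `pip install gguf`; "phi4-q4_k_m.gguf" is a placeholder filename
from gguf import GGUFReader

reader = GGUFReader("phi4-q4_k_m.gguf")
field = reader.get_field("general.architecture")
# string fields store their payload as raw bytes inside `parts`;
# `data` holds the index of the value part
print(bytes(field.parts[field.data[0]]).decode("utf-8"))  # expect: llama
```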
## run the model
use any GGUF connector to interact with the GGUF file(s), e.g., gguf-connector; a scripted alternative is sketched below
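If you prefer to script it directly, here is a minimal sketch using llama-cpp-python (one option among many, not the tool the card prescribes; install with `pip install llama-cpp-python`, and substitute the placeholder filename with the quant you downloaded):

```python
# minimal chat sketch with llama-cpp-python; the model filename is a placeholder
from llama_cpp import Llama

llm = Llama(model_path="phi4-q4_k_m.gguf", n_ctx=16384)  # phi-4 has a 16K context window
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```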
## reference
- base model: microsoft/phi-4
- bug fix follows the guide written by unsloth
- tool used for quantization: cutter (an illustrative block-quantization sketch follows this list)
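For readers unfamiliar with the quant names used above (tq1_0, q2_k, and so on): GGML-family formats quantize weights in small blocks, each carrying its own scale. The sketch below illustrates the idea with a Q8_0-style scheme (blocks of 32 values, one scale per block); the k-quants such as q2_k are more elaborate, but the principle is the same. This is an illustration, not the actual llama.cpp code:

```python
# illustrative sketch of GGML-style block quantization (Q8_0-like: blocks of 32
# values, one scale each); NOT the real q2_k algorithm, just the underlying idea
import numpy as np

def quantize_q8_0(weights: np.ndarray, block: int = 32):
    w = weights.reshape(-1, block)
    # one scale per block, chosen so the largest magnitude maps to 127
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    safe = np.where(scale == 0, 1.0, scale)          # avoid division by zero
    q = np.clip(np.round(w / safe), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale.astype(np.float32)).reshape(-1)

w = np.random.randn(64).astype(np.float32)
q, s = quantize_q8_0(w)
print("max abs error:", np.abs(dequantize_q8_0(q, s) - w).max())
```

Lower-bit quants shrink the file further but widen that reconstruction error, which is why this card recommends starting no lower than q2_k.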
## appendices: model summary and quality (written by microsoft)
### model summary
| | |
|---|---|
| Developers | Microsoft Research |
| Description | phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. |
| Architecture | 14B parameters, dense decoder-only Transformer model |
| Inputs | Text, best suited for prompts in the chat format |
| Context length | 16K tokens |
| GPUs | 1920 H100-80G |
| Training time | 21 days |
| Training data | 9.8T tokens |
| Outputs | Generated text in response to input |
| Dates | October 2024 – November 2024 |
| Status | Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data |
| Release date | December 12, 2024 |
| License | MIT |
### model quality
To understand the capabilities, we (here meaning the Microsoft side) compared phi-4 with a set of models over OpenAI's SimpleEval benchmark. The table below gives a high-level overview of model quality on representative benchmarks; higher numbers indicate better performance:
| Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o |
|---|---|---|---|---|---|---|---|---|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1 |
| Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4 |
| Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | 90.6 |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4 |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9 |
* these scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following.