GGUF quantized and bug-fixed version of phi4

review

  • bug fixed for: "ResponseError: llama runner process has terminated: GGML_ASSERT(hparams.n_swa > 0) failed"
  • the architecture was redefined (from none) to llama; everything works right away (see the inspection sketch after this list)
  • tq1_0 and tq2_0 are not usable; q2_k appears to be the smallest workable quantization
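
for reference, here is a minimal sketch of how to check the general.architecture key behind the assert, using the gguf package from llama.cpp's gguf-py; the filename is a placeholder, not one of this repo's actual file names:

```python
# pip install gguf  (the gguf-py package maintained in the llama.cpp repo)
from gguf import GGUFReader

# placeholder filename; point this at whichever quant you downloaded
reader = GGUFReader("phi4-q4_k_m.gguf")

# a GGUF string field keeps its payload in raw byte parts;
# field.data lists the index of the part holding the actual value
field = reader.fields["general.architecture"]
arch = bytes(field.parts[field.data[0]]).decode("utf-8")

print(arch)  # the bug-fixed files report "llama" here
```

rewriting the key is a job for llama.cpp's gguf-py tooling; the read-only check above is enough to confirm which files carry the fix.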

run the model

use any gguf connector to interact with the gguf file(s), e.g., gguf-connector; an alternative sketch follows below
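
if you prefer a plain python backend instead of a connector, a minimal sketch using llama-cpp-python is shown below; the backend choice, filename, and prompt are assumptions for illustration, not this repo's prescribed workflow:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="phi4-q4_k_m.gguf",  # placeholder; use the quant you downloaded (q2_k or above)
    n_ctx=16384,                    # phi-4 supports a 16K-token context
)

# phi-4 is best suited for chat-format prompts (see the model summary below)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```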

reference

  • base model: microsoft/phi-4
  • bug fix follows the guide written by unsloth
  • tool used for quantization: cutter

appendices: model summary and quality (written by microsoft)

model summary

Developers: Microsoft Research
Description: phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
Architecture: 14B parameters, dense decoder-only Transformer model
Inputs: Text, best suited for prompts in the chat format
Context length: 16K tokens
GPUs: 1920 H100-80G
Training time: 21 days
Training data: 9.8T tokens
Outputs: Generated text in response to input
Dates: October 2024 – November 2024
Status: Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data
Release date: December 12, 2024
License: MIT

model quality

to understand the capabilities, we (here referring to the microsoft side) compare phi-4 with a set of models over OpenAI's SimpleEval benchmark. the table below gives a high-level overview of model quality on representative benchmarks; higher numbers indicate better performance:

| Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o |
|---|---|---|---|---|---|---|---|---|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1 |
| Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4 |
| Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | 90.6 |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4 |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9 |

* these scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following.

GGUF

model size: 14.7B params
architecture: llama
available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit