GGUF quantized and bug-fixed version of phi4

review

  • bug fixed for: "ResponseError: llama runner process has terminated: GGML_ASSERT(hparams.n_swa > 0) failed"
  • the architecture was redefined (from none) to llama; everything works right away (see the inspection sketch after this list)
  • tq1_0 and tq2_0 are not usable; q2_k appears to be the smallest workable quantization
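
for reference, here is a minimal sketch of how to check the general.architecture key behind the assert, using the gguf package from llama.cpp's gguf-py; the filename is a placeholder, not one of this repo's actual file names:

```python
# pip install gguf  (the gguf-py package maintained in the llama.cpp repo)
from gguf import GGUFReader

# placeholder filename; point this at whichever quant you downloaded
reader = GGUFReader("phi4-q4_k_m.gguf")

# a GGUF string field keeps its payload in raw byte parts;
# field.data lists the index of the part holding the actual value
field = reader.fields["general.architecture"]
arch = bytes(field.parts[field.data[0]]).decode("utf-8")

print(arch)  # the bug-fixed files report "llama" here
```

rewriting the key is a job for llama.cpp's gguf-py tooling; the read-only check above is enough to confirm which files carry the fix.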

run the model

use any gguf connector to interact with the gguf file(s), e.g., gguf-connector; an alternative sketch follows below
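
if you prefer a plain python backend instead of a connector, a minimal sketch using llama-cpp-python is shown below; the backend choice, filename, and prompt are assumptions for illustration, not this repo's prescribed workflow:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="phi4-q4_k_m.gguf",  # placeholder; use the quant you downloaded (q2_k or above)
    n_ctx=16384,                    # phi-4 supports a 16K-token context
)

# phi-4 is best suited for chat-format prompts (see the model summary below)
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```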

reference

  • base model: microsoft/phi-4
  • bug fix follows the guide written by unsloth
  • tool used for quantization: cutter

appendices: model summary and quality (written by microsoft)

model summary

Developers: Microsoft Research
Description: phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
Architecture: 14B parameters, dense decoder-only Transformer model
Inputs: Text, best suited for prompts in the chat format
Context length: 16K tokens
GPUs: 1920 H100-80G
Training time: 21 days
Training data: 9.8T tokens
Outputs: Generated text in response to input
Dates: October 2024 – November 2024
Status: Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data
Release date: December 12, 2024
License: MIT

model quality

to understand the capabilities, we (here referring to the microsoft side) compare phi-4 with a set of models over OpenAI's SimpleEval benchmark. the table below gives a high-level overview of model quality on representative benchmarks; higher numbers indicate better performance:

| Category | Benchmark | phi-4 (14B) | phi-3 (14B) | Qwen 2.5 (14B instruct) | GPT-4o-mini | Llama-3.3 (70B instruct) | Qwen 2.5 (72B instruct) | GPT-4o |
|---|---|---|---|---|---|---|---|---|
| Popular Aggregated Benchmark | MMLU | 84.8 | 77.9 | 79.9 | 81.8 | 86.3 | 85.3 | 88.1 |
| Science | GPQA | 56.1 | 31.2 | 42.9 | 40.9 | 49.1 | 49.0 | 50.6 |
| Math | MGSM | 80.6 | 53.5 | 79.6 | 86.5 | 89.1 | 87.3 | 90.4 |
| Math | MATH | 80.4 | 44.6 | 75.6 | 73.0 | 66.3* | 80.0 | 74.6 |
| Code Generation | HumanEval | 82.6 | 67.8 | 72.1 | 86.2 | 78.9* | 80.4 | 90.6 |
| Factual Knowledge | SimpleQA | 3.0 | 7.6 | 5.4 | 9.9 | 20.9 | 10.2 | 39.4 |
| Reasoning | DROP | 75.5 | 68.3 | 85.5 | 79.3 | 90.2 | 76.7 | 80.9 |

* these scores are lower than those reported by Meta, perhaps because simple-evals has a strict formatting requirement that Llama models have particular trouble following.

GGUF

model size: 14.7B params
architecture: llama
available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit