Phi-3 Mini-4K-Instruct ONNX model for onnxruntime-web
This is the same model as the official Phi-3 ONNX model, with a few changes to make it work with onnxruntime-web (a minimal loading sketch follows the list):
- the model is fp16 with int4 block quantization for the weights
- the 'logits' output is fp32
- the model uses multi-head attention (MHA) instead of grouped-query attention (GQA)
- the .onnx file and the external data file each need to stay below 2GB to be cacheable in Chromium
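Because the weights live in a separate external data file, loading the model directly with onnxruntime-web means supplying both files to the session. The sketch below is a minimal illustration, assuming onnxruntime-web ≥ 1.17 with its WebGPU build; the URLs and file names are hypothetical placeholders, not the actual layout of this repo.

```js
// Minimal sketch: create an onnxruntime-web session on WebGPU with external weights.
// All file names and URLs here are hypothetical placeholders.
import * as ort from "onnxruntime-web/webgpu";

const session = await ort.InferenceSession.create(
  "https://example.com/model.onnx", // hypothetical URL of the ONNX graph (< 2GB)
  {
    executionProviders: ["webgpu"],
    // The int4-quantized weights sit in a separate external data file,
    // also kept below 2GB so Chromium can cache both files.
    externalData: [
      { path: "model.onnx.data", data: "https://example.com/model.onnx.data" }, // hypothetical
    ],
  },
);
```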
Usage (Transformers.js)
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
npm i @huggingface/transformers
You can then use the model to generate text like this:
import { pipeline, TextStreamer } from "@huggingface/transformers";
// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "Xenova/Phi-3-mini-4k-instruct_fp16",
);

// Define the list of messages
const messages = [
  { role: "user", content: "Solve the equation: x^2 - 3x + 2 = 0" },
];

// Create a text streamer
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  // callback_function: (text) => { }, // Optional callback function
});

// Generate a response
const output = await generator(messages, { max_new_tokens: 512, do_sample: false, streamer });
console.log(output[0].generated_text.at(-1).content);
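The streamer's optional `callback_function` receives each decoded chunk of text as it is generated, which is useful for updating the UI incrementally instead of waiting for the full response. A minimal sketch, assuming a page element with the hypothetical id "output":

```js
// Stream generated text into the page as it arrives.
// The "output" element id is a hypothetical placeholder.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => {
    document.getElementById("output").textContent += text;
  },
});
```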