ONNX Conversion script

#10
by ha1772007 - opened

Can you provide the script that was used to convert this model to q4?

Snowflake org
edited Oct 7

The ONNX files were contributed by HuggingFace staff member @Xenova, without a conversion script, so you may want to ping @Xenova directly.

I believe he uses quantize.py, and I think these lines in particular handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
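
For anyone unfamiliar with what a q4 step does in general, here is a rough sketch of block-wise 4-bit weight quantization in plain Python. It is not taken from quantize.py and is not necessarily the scheme that script applies; the function names, the asymmetric min/max scheme, and the block size of 32 are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# One quantized block: 4-bit codes, the scale, and the block minimum (offset).
Block = Tuple[List[int], float, float]


def quantize_block_q4(block: List[float]) -> Block:
    """Map one block of floats to integer codes in [0, 15] (hypothetical helper)."""
    lo, hi = min(block), max(block)
    # 4 bits give 16 levels, so the range is split into 15 steps.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((x - lo) / scale))) for x in block]
    return codes, scale, lo


def quantize_q4(weights: List[float], block_size: int = 32) -> List[Block]:
    """Quantize a flat weight list block by block (block_size is an assumed value)."""
    return [
        quantize_block_q4(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize_q4(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate floats from the 4-bit codes."""
    restored: List[float] = []
    for codes, scale, lo in blocks:
        restored.extend(lo + scale * c for c in codes)
    return restored


if __name__ == "__main__":
    weights = [i / 10 - 3 for i in range(64)]
    restored = dequantize_q4(quantize_q4(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max reconstruction error: {worst:.4f}")
```

Each block keeps its own scale and offset, so the rounding error within a block is bounded by half of that block's step size.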

P.S. Are you getting good results with that quantization?

Yes, quantization gives a good speedup, especially on CPU.

comparison between float32 and float16 -> 99% similarity
comparison between float32 and int8 -> 97% similarity

I calculated the similarity as cosine similarity over 80+ text pieces, each 2,000 characters long.
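
A comparison like this could be computed roughly as follows. This is a minimal sketch, not the exact script used for the numbers above: it assumes the per-text embeddings from the two model variants are already available as lists of floats, and the function names `cosine_similarity` and `mean_similarity` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity between paired embeddings.

    reference[i] and candidate[i] are assumed to be embeddings of the same
    text, e.g. from the float32 model and from a quantized variant.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    fp32 = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
    quant = [[0.9, 0.1, 2.1], [0.5, 0.4, 0.1]]
    print(f"mean cosine similarity: {mean_similarity(fp32, quant):.4f}")
```

Averaging over many texts gives a single percentage-style figure, though looking at the minimum similarity as well can reveal inputs where the quantized model drifts more.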

ha1772007 changed discussion status to closed
ha1772007 changed discussion status to open
spacemanidol changed discussion status to closed
