# Export

The original model was exported using the following process:

The following repos were used:

* https://github.com/pdufour/Native-LLM-for-Android
* https://github.com/pdufour/transformers.js/tree/add-block-list

If you clone this repo and the two repos above into the same directory, you can run the following commands:
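
For example, from the parent directory that contains this repo (a sketch based on the URLs above; the second clone checks out the add-block-list branch that the link points to):

```sh
# Place the two helper repos next to this one
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone -b add-block-list https://github.com/pdufour/transformers.js
```
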
**From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`, run:**

`make all-in-one`

This will create an export of the ONNX models.

**The following is a list of all available commands:**

**all-in-one**

Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
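
Based on the step list above, this should be roughly equivalent to running the individual targets in order (an assumption about how the stages compose, not a documented guarantee):

```sh
make export slim quantize clean-large-files fix-gpu-buffers
```
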
**export**

Combines export-abcd and export-e to generate ONNX models for all parts.
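
In other words, presumably:

```sh
make export
# equivalent to:
make export-abcd export-e
```
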
**export-abcd**

Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.

**export-e**

Exports model part E by running QwenVL_Export_E.py.
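
These two stages can also be run individually:

```sh
make export-abcd   # parts A, B, C, D
make export-e      # part E
```
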
**slim**

Reduces ONNX model size by removing unnecessary elements (e.g. dead nodes and unused initializers) for optimized deployment.
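
Slimming is typically done with a tool such as onnxslim; a hypothetical standalone invocation (the file names are placeholders, and the Makefile's actual flags may differ):

```sh
pip install onnxslim
onnxslim model_abcd.onnx model_abcd_slim.onnx
```
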
**quantize**

Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

**quantize-%**

Quantizes a specific model part (% can be A, B, C, D, or E) with configurations targeted at that part; see the usage example below.
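
For example, to quantize only part A:

```sh
make quantize-A
```
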
**clean-large-files**

Deletes ONNX files larger than 2GB from the destination directory, keeping only models that fit within ONNX's 2GB protobuf size limit and will therefore work in ONNX environments.
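
Conceptually this amounts to something like the following (the destination directory here is a placeholder; the Makefile defines the real path):

```sh
find ./onnx -name '*.onnx' -size +2G -delete
```
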
**fix-gpu-buffers**

Applies fixes to GPU buffers in the ONNX files for part E to ensure GPU memory compatibility.

**all**

Alias for all-in-one to run the full ONNX model preparation pipeline.