Export

The original model was exported using the following process:

The following repos were used:

If you clone this repo and the two repos above into the same directory, you can run the following commands:

From Qwen2-VL-2B-Instruct-ONNX-Q4-F16, run:

make all-in-one

This will create an export of the ONNX models.
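If a single stage needs to be re-run or debugged, the steps that all-in-one chains together can also be invoked one at a time. The following is a minimal sketch of that idea in Python, not the Makefile's actual recipe: the target names come from the command list in this README, while the step order is assumed from the all-in-one description and `run_pipeline` is a hypothetical helper.

```python
import subprocess
from pathlib import Path

# Target names as listed in this README. The order is an assumption taken
# from the all-in-one description, not read from the Makefile itself.
STEPS = ["export", "slim", "quantize", "clean-large-files", "fix-gpu-buffers"]


def run_pipeline(repo_dir, steps=STEPS, dry_run=True):
    """Run each make target in order.

    Returns the list of commands that were run (or would be run when
    dry_run is True).
    """
    commands = [["make", step] for step in steps]
    if not dry_run:
        for cmd in commands:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=Path(repo_dir), check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands instead of executing them.
    for cmd in run_pipeline("Qwen2-VL-2B-Instruct-ONNX-Q4-F16"):
        print(" ".join(cmd))
```

With `dry_run=False` the helper executes each target in the given directory and stops on the first failure, which makes it easier to see which stage broke.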

The following is a list of all commands available:

all-in-one

Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.

export

Combines export-abcd and export-e to generate ONNX models for all parts.

export-abcd

Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.

export-e

Exports model part E by running QwenVL_Export_E.py.

slim

Reduces ONNX model size by removing unnecessary elements for optimized deployment.

quantize

Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

quantize-%

Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.

clean-large-files

Deletes ONNX files larger than 2GB from the destination directory, keeping only models small enough to load in ONNX environments.

fix-gpu-buffers

Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.

all

Alias for all-in-one to run the full ONNX model preparation pipeline.
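To illustrate what the clean-large-files step does, here is a minimal Python sketch that removes oversized ONNX files from a directory. It is an illustration under stated assumptions, not the Makefile's implementation: the function names, the `*.onnx` glob, and the exact byte threshold (2 GiB) are all hypothetical choices.

```python
from pathlib import Path

# Assumed threshold: 2 GiB. The Makefile may use a different exact value.
TWO_GB = 2 * 1024**3


def find_large_onnx(dest_dir, limit=TWO_GB):
    """Return the .onnx files directly in dest_dir that exceed limit bytes."""
    return sorted(
        p for p in Path(dest_dir).glob("*.onnx") if p.stat().st_size > limit
    )


def clean_large_files(dest_dir, limit=TWO_GB, dry_run=True):
    """Delete oversized .onnx files, or only list them when dry_run is True."""
    large = find_large_onnx(dest_dir, limit)
    if not dry_run:
        for path in large:
            path.unlink()
    return large
```

The sketch defaults to a dry run so that the files which would be deleted can be reviewed first. The 2GB cutoff matches the size limit of a single protobuf message, which is the serialization format ONNX files use.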