# Export

The original model was exported using the following process:

The following repos were used:

* https://github.com/pdufour/Native-LLM-for-Android
* https://github.com/pdufour/transformers.js/tree/add-block-list

If you clone this repo and the two repos above into the same directory, you can run the following commands:
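
For example, from the parent directory that contains this repo (a sketch based on the URLs above; the second clone checks out the add-block-list branch that the link points to):

```sh
# Place the two helper repos next to this one
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone -b add-block-list https://github.com/pdufour/transformers.js
```
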
**From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`, run:**

`make all-in-one`

This will create an export of the ONNX models.

**The following is a list of all available commands:**

**all-in-one**

Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
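
Based on the step list above, this should be roughly equivalent to running the individual targets in order (an assumption about how the stages compose, not a documented guarantee):

```sh
make export slim quantize clean-large-files fix-gpu-buffers
```
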
**export**

Combines export-abcd and export-e to generate ONNX models for all parts.
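
In other words, presumably:

```sh
make export
# equivalent to:
make export-abcd export-e
```
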
**export-abcd**

Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.

**export-e**

Exports model part E by running QwenVL_Export_E.py.
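
These two stages can also be run individually:

```sh
make export-abcd   # parts A, B, C, D
make export-e      # part E
```
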
**slim**

Reduces ONNX model size by removing unnecessary elements (e.g. dead nodes and unused initializers) for optimized deployment.
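
Slimming is typically done with a tool such as onnxslim; a hypothetical standalone invocation (the file names are placeholders, and the Makefile's actual flags may differ):

```sh
pip install onnxslim
onnxslim model_abcd.onnx model_abcd_slim.onnx
```
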
**quantize**

Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

**quantize-%**

Quantizes a specific model part (% can be A, B, C, D, or E) with configurations targeted at that part; see the usage example below.
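
For example, to quantize only part A:

```sh
make quantize-A
```
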
**clean-large-files**

Deletes ONNX files larger than 2GB from the destination directory, keeping only models that fit within ONNX's 2GB protobuf size limit and will therefore work in ONNX environments.
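
Conceptually this amounts to something like the following (the destination directory here is a placeholder; the Makefile defines the real path):

```sh
find ./onnx -name '*.onnx' -size +2G -delete
```
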
**fix-gpu-buffers**

Applies fixes to GPU buffers in the ONNX files for part E to ensure GPU memory compatibility.

**all**

Alias for all-in-one to run the full ONNX model preparation pipeline.