# Export
The original model was exported using the process below.

The following repos are required:
* https://github.com/pdufour/Native-LLM-for-Android
* https://github.com/pdufour/transformers.js/tree/add-block-list
If you clone this repo and the two repos above into the same directory, you can run the following commands:
**From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`, run:**
`make all-in-one`
This will create an export of the ONNX models.
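A sketch of the setup described above (the parent directory name is arbitrary, and the model repo must be cloned alongside the other two under the directory name the Makefile expects):

```shell
# Clone all three repos side by side in one parent directory.
mkdir -p qwen2vl-export && cd qwen2vl-export
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone --branch add-block-list https://github.com/pdufour/transformers.js
# ...clone this model repo here as Qwen2-VL-2B-Instruct-ONNX-Q4-F16...

# Run the full pipeline from the model repo.
cd Qwen2-VL-2B-Instruct-ONNX-Q4-F16
make all-in-one
```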
**The following is a list of all commands available:**
**all-in-one**
Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
**export**
Combines export-abcd and export-e to generate ONNX models for all parts.
**export-abcd**
Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.
**export-e**
Exports model part E by running QwenVL_Export_E.py.
**slim**
Reduces ONNX model size by removing unnecessary elements for optimized deployment.
**quantize**
Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.
**quantize-%**
Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.
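`quantize-%` is a Make pattern rule: the stem (`%`) selects the part. A minimal reproduction of how such a rule dispatches on the target name, using a throwaway Makefile (the recipe body here is hypothetical, not the repo's actual rule):

```shell
# Write a tiny Makefile with a pattern rule analogous to quantize-%.
cd "$(mktemp -d)"
printf 'quantize-%%:\n\t@echo quantizing part $*\n' > Makefile

# The stem (%) becomes E, so the recipe sees $* = E.
make quantize-E
# prints: quantizing part E
```

Several parts can also be requested in one invocation, e.g. `make quantize-A quantize-B`.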
**clean-large-files**
Deletes ONNX files larger than 2 GB from the destination directory, retaining only models that work in ONNX environments with a 2 GB per-file limit.
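A minimal sketch of what such a cleanup step can look like (the `DEST` path is an assumption; the actual Makefile recipe may differ):

```shell
# Delete ONNX files larger than 2 GB from an assumed destination directory.
DEST=./onnx-output
# find's -size +2G matches files strictly larger than 2 GiB (apparent size).
find "$DEST" -name '*.onnx' -size +2G -print -delete
```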
**fix-gpu-buffers**
Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.
**all**
Alias for all-in-one to run the full ONNX model preparation pipeline.