Update EXPORT.md

EXPORT.md

The original model was exported using the following process:

The following repos were used:
- https://github.com/pdufour/Native-LLM-for-Android
- https://github.com/pdufour/transformers.js/tree/add-block-list

If you clone this repo and the above two repos into the same directory, you can run the following.

From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`:
`make all-in-one`

This will create an export of the ONNX models.
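
For reference, a minimal sketch of that setup, assuming the three repos sit side by side under one parent directory and keep their default clone names (the URL of this repo is omitted rather than guessed):

```bash
# Clone the two supporting repos next to a clone of this repo.
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone -b add-block-list https://github.com/pdufour/transformers.js
# (clone this repo here as well, alongside the two above)

# Run the full pipeline from this repo's directory.
cd Qwen2-VL-2B-Instruct-ONNX-Q4-F16
make all-in-one
```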

The following commands are all available:

**General Commands**

**all-in-one**
Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
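
As a rough mental model of what that entails, the steps listed above correspond to the individual targets documented below; the ordering here is inferred from the description, not copied from the Makefile:

```bash
# Approximately what `make all-in-one` runs, step by step.
make export             # export parts A, B, C, D and E to ONNX
make slim               # strip unnecessary graph elements
make quantize           # quantize all parts
make clean-large-files  # drop ONNX files larger than 2GB
make fix-gpu-buffers    # patch GPU buffers in the part E model
```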

**Export Commands**

**export**
Combines export-abcd and export-e to generate ONNX models for all parts.

**export-abcd**
Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.

**export-e**
Exports model part E by running QwenVL_Export_E.py.
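
These targets wrap the two export scripts named above; a hedged sketch of the equivalent manual invocation (the real recipes may pass arguments or set environment variables that are not shown here):

```bash
# Manual equivalent of `make export`: export-abcd followed by export-e.
python QwenVL_Export_ABCD.py   # parts A, B, C and D
python QwenVL_Export_E.py      # part E
```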

**Slimming Commands**

**slim**
Reduces ONNX model size by removing unnecessary elements for optimized deployment.
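
One possible shape of this step, assuming the slimming is done with a tool such as onnxslim; both that tool choice and the file names below are assumptions, not taken from this document:

```bash
# Hypothetical slimming pass over each exported part.
for part in A B C D E; do
  onnxslim "QwenVL_${part}.onnx" "QwenVL_${part}_slim.onnx"  # placeholder file names
done
```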

**Quantization Commands**

**quantize**
Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

**quantize-%**
Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.
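
For example, you can quantize everything or target a single part via the pattern rule:

```bash
make quantize     # quantize parts A, B, C, D and E
make quantize-A   # quantize only part A
make quantize-E   # quantize only part E
```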

**Cleanup Commands**

**clean-large-files**
Deletes ONNX files larger than 2GB from the destination directory to retain appropriately sized models.
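
A sketch of what this cleanup amounts to, shown with GNU find; the output path is a placeholder, and the real recipe may differ:

```bash
# Remove any exported ONNX file larger than 2GB (placeholder path).
find ./output -name '*.onnx' -size +2G -delete
```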

**GPU Buffer Fix Command**

**fix-gpu-buffers**
Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.

**Combined Target**

**all**
Alias for all-in-one to run the full ONNX model preparation pipeline.