Update EXPORT.md

EXPORT.md

The original model was exported using the following process:

The following repos were used:
- https://github.com/pdufour/Native-LLM-for-Android
- https://github.com/pdufour/transformers.js/tree/add-block-list

If you clone this repo and the above two repos into the same directory, you can run the following.

From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`:
`make all-in-one`

This will create an export of the ONNX models.
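
For reference, a minimal sketch of that setup, assuming the three repos sit side by side under one parent directory and keep their default clone names (the URL of this repo is omitted rather than guessed):

```bash
# Clone the two supporting repos next to a clone of this repo.
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone -b add-block-list https://github.com/pdufour/transformers.js
# (clone this repo here as well, alongside the two above)

# Run the full pipeline from this repo's directory.
cd Qwen2-VL-2B-Instruct-ONNX-Q4-F16
make all-in-one
```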

The following commands are all available:

**General Commands**

**all-in-one**
Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
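
As a rough mental model of what that entails, the steps listed above correspond to the individual targets documented below; the ordering here is inferred from the description, not copied from the Makefile:

```bash
# Approximately what `make all-in-one` runs, step by step.
make export             # export parts A, B, C, D and E to ONNX
make slim               # strip unnecessary graph elements
make quantize           # quantize all parts
make clean-large-files  # drop ONNX files larger than 2GB
make fix-gpu-buffers    # patch GPU buffers in the part E model
```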

**Export Commands**

**export**
Combines export-abcd and export-e to generate ONNX models for all parts.

**export-abcd**
Exports model parts A, B, C, and D by running QwenVL_Export_ABCD.py.

**export-e**
Exports model part E by running QwenVL_Export_E.py.
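
These targets wrap the two export scripts named above; a hedged sketch of the equivalent manual invocation (the real recipes may pass arguments or set environment variables that are not shown here):

```bash
# Manual equivalent of `make export`: export-abcd followed by export-e.
python QwenVL_Export_ABCD.py   # parts A, B, C and D
python QwenVL_Export_E.py      # part E
```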

**Slimming Commands**

**slim**
Reduces ONNX model size by removing unnecessary elements for optimized deployment.
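
One possible shape of this step, assuming the slimming is done with a tool such as onnxslim; both that tool choice and the file names below are assumptions, not taken from this document:

```bash
# Hypothetical slimming pass over each exported part.
for part in A B C D E; do
  onnxslim "QwenVL_${part}.onnx" "QwenVL_${part}_slim.onnx"  # placeholder file names
done
```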

**Quantization Commands**

**quantize**
Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

**quantize-%**
Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.
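
For example, you can quantize everything or target a single part via the pattern rule:

```bash
make quantize     # quantize parts A, B, C, D and E
make quantize-A   # quantize only part A
make quantize-E   # quantize only part E
```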

**Cleanup Commands**

**clean-large-files**
Deletes ONNX files larger than 2GB from the destination directory to retain appropriately sized models.
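
A sketch of what this cleanup amounts to, shown with GNU find; the output path is a placeholder, and the real recipe may differ:

```bash
# Remove any exported ONNX file larger than 2GB (placeholder path).
find ./output -name '*.onnx' -size +2G -delete
```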

**GPU Buffer Fix Command**

**fix-gpu-buffers**
Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.

**Combined Target**

**all**
Alias for all-in-one to run the full ONNX model preparation pipeline.