# Export

The original model was exported using the following process:

The following repos were used:
* https://github.com/pdufour/Native-LLM-for-Android
* https://github.com/pdufour/transformers.js/tree/add-block-list

If you clone this repo and the two repos above into the same parent directory, you can run the following commands:
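
For example, assuming a common parent directory (`<this-repo-url>` is a placeholder for this model repository's clone URL):

```bash
# Clone all three repos side by side in the same parent directory.
git clone <this-repo-url> Qwen2-VL-2B-Instruct-ONNX-Q4-F16
git clone https://github.com/pdufour/Native-LLM-for-Android
git clone -b add-block-list https://github.com/pdufour/transformers.js
```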

**From `Qwen2-VL-2B-Instruct-ONNX-Q4-F16`, run:**

`make all-in-one`

This will produce the exported ONNX models.

**The following is a list of all commands available:**

**all-in-one**

Runs all steps (exporting, slimming, quantizing, cleaning, fixing GPU buffers) to produce fully prepared ONNX models.
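
If you would rather run the pipeline step by step, the same stages can be invoked as individual targets. A sketch, assuming the targets run in the order listed above:

```bash
make export            # export all model parts (A-E)
make slim              # slim the exported ONNX graphs
make quantize          # quantize parts A-E
make clean-large-files # drop ONNX files larger than 2GB
make fix-gpu-buffers   # patch GPU buffers for part E
```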

**export**

Combines `export-abcd` and `export-e` to generate ONNX models for all parts.

**export-abcd**

Exports model parts A, B, C, and D by running `QwenVL_Export_ABCD.py`.

**export-e**

Exports model part E by running `QwenVL_Export_E.py`.
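
These two targets ultimately invoke the export scripts, so running them directly is roughly equivalent (any arguments the Makefile passes are omitted here):

```bash
python QwenVL_Export_ABCD.py   # parts A, B, C, and D
python QwenVL_Export_E.py      # part E
```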

**slim**

Reduces ONNX model size by removing unnecessary elements for optimized deployment.
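
This kind of slimming is commonly done with a tool such as onnxslim; a hypothetical example for a single part (the file names are placeholders, and the Makefile target defines the real paths):

```bash
onnxslim part_A.onnx part_A_slim.onnx
```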

**quantize**

Quantizes all model parts (A, B, C, D, and E) to optimize size and performance.

**quantize-%**

Quantizes a specific model part (% can be A, B, C, D, or E) with targeted configurations.
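
For example:

```bash
make quantize-A   # quantize only part A via the quantize-% pattern
```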

**clean-large-files**

Deletes ONNX files larger than 2 GB from the destination directory, keeping only models that fit within the size limits of typical ONNX runtime environments.
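
The effect is equivalent to something like the following sketch, where `./output` stands in for the actual destination directory configured in the Makefile:

```bash
# Delete any ONNX file over 2 GB from the (placeholder) destination directory.
find ./output -name '*.onnx' -size +2G -delete
```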

**fix-gpu-buffers**

Applies fixes to GPU buffers in ONNX files for part E to ensure GPU memory compatibility.

**all**

Alias for `all-in-one`; runs the full ONNX model preparation pipeline.