A series of models for testing the benefits of Core ML joint compression on iOS 18/macOS 15.
mlp-*.mlpackage
A simple Up/Gate/SiLU/Down MLP block, repeated four times, using the Llama 2 7B dimensions (hidden size 4096, intermediate size 11008).
All models use the 'CPU and Neural Engine' compute unit, and all latencies were measured in Xcode.
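
To make the block structure concrete, here is a minimal pure-Python sketch of a gated MLP of this shape. It assumes the usual Llama-style wiring, `down(silu(gate(x)) * up(x))`, with no biases and no residual connections; the exact graph inside the `.mlpackage` files may differ. The helper names (`gated_mlp`, `mlp_weight_count`, and so on) are illustrative, the forward pass uses tiny dimensions, and the same weights are reused for all four repeats purely for brevity.

```python
import math
import random

# Llama 2 7B MLP sizes; BLOCKS reflects the "repeated four times" description.
HIDDEN, INTERMEDIATE, BLOCKS = 4096, 11008, 4


def mlp_weight_count(hidden=HIDDEN, intermediate=INTERMEDIATE, blocks=BLOCKS):
    """Weights in `blocks` gated MLPs (up, gate, down), assuming no biases."""
    return blocks * 3 * hidden * intermediate


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def gated_mlp(x, up, gate, down):
    """Assumed Llama-style wiring: down(silu(gate(x)) * up(x))."""
    up_out = matvec(up, x)
    gate_out = matvec(gate, x)
    hidden = [silu(g) * u for g, u in zip(gate_out, up_out)]
    return matvec(down, hidden)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Tiny dimensions so the sketch runs instantly; the real models use 4096/11008.
rng = random.Random(0)
tiny_hidden, tiny_intermediate = 4, 8
up = random_matrix(tiny_intermediate, tiny_hidden, rng)
gate = random_matrix(tiny_intermediate, tiny_hidden, rng)
down = random_matrix(tiny_hidden, tiny_intermediate, rng)

x = [1.0, -0.5, 0.25, 2.0]
for _ in range(BLOCKS):  # weights are shared across repeats here only for brevity
    x = gated_mlp(x, up, gate, down)

print(len(x), mlp_weight_count())
```

Under these assumptions the four blocks hold 541,065,216 weights, which is about 1.08 GB at 2 bytes per weight (float16) and about 0.27 GB at 4 bits per weight, ignoring lookup-table overhead.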
Device | Model | Precision | Minimum (ms) | Median (ms) |
---|---|---|---|---|
M1 Max | mlp-float16 | float16 | 19.30 | 19.42 |
M1 Max | mlp-4bit | 4-bit LUT | 5.93 | 5.98 |
M1 Max | mlp-2bit | 2-bit LUT | 5.92 | 6.11 |
M1 Max | mlp-4bit-int8 | 4-bit int8 LUT + A8 | 6.02 | 6.31 |
M1 Max | mlp-2bit-int8 | 2-bit int8 LUT + A8 | 6.00 | 6.18 |
M1 Max | mlp-int8-int8 | W8A8 | 9.78 | 9.94 |
M4 | mlp-4bit | 4-bit LUT | - | 4.19 |
M4 | mlp-2bit | 2-bit LUT | - | 3.83 |
M4 | mlp-4bit-int8 | 4-bit int8 LUT + A8 | - | 4.14 |
M4 | mlp-2bit-int8 | 2-bit int8 LUT + A8 | - | 3.83 |
M4 | mlp-int8-int8 | W8A8 | - | 8.18 |
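
As a quick way to read the table, the following sketch computes the speedup of each compressed variant relative to the float16 baseline on the M1 Max, using the median latencies above. The `speedups` helper and the dictionary name are illustrative; the M4 rows are left out because the table has no float16 baseline for that device.

```python
# Median latencies (ms) copied from the M1 Max rows of the table.
m1_max_median_ms = {
    "mlp-float16": 19.42,
    "mlp-4bit": 5.98,
    "mlp-2bit": 6.11,
    "mlp-4bit-int8": 6.31,
    "mlp-2bit-int8": 6.18,
    "mlp-int8-int8": 9.94,
}


def speedups(latencies, baseline="mlp-float16"):
    """Return baseline latency divided by each other model's latency."""
    base = latencies[baseline]
    return {name: base / ms for name, ms in latencies.items() if name != baseline}


m1_max_speedups = speedups(m1_max_median_ms)
for name, factor in m1_max_speedups.items():
    print(f"{name}: {factor:.2f}x faster than float16")
```

On these numbers the LUT variants come out roughly 3.1x to 3.2x faster than float16, while W8A8 is a little under 2x faster.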
Download
huggingface-cli download \
--local-dir . \
--local-dir-use-symlinks False \
smpanaro/coreml-joint-compression-test \
--include "*.mlpackage/*"
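
The same download can also be scripted from Python. This is a sketch that assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function with an `allow_patterns` filter mirroring the `--include` flag; the wrapper names `download_mlpackages` and `is_selected` are illustrative, and the example path in the comment is hypothetical.

```python
from fnmatch import fnmatch

REPO_ID = "smpanaro/coreml-joint-compression-test"
ALLOW_PATTERN = "*.mlpackage/*"  # same glob as the --include flag


def download_mlpackages(local_dir="."):
    """Download only the .mlpackage contents into local_dir; returns the local path."""
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[ALLOW_PATTERN],
        local_dir=local_dir,
    )


def is_selected(path):
    """Check whether a repo-relative path matches the include glob."""
    return fnmatch(path, ALLOW_PATTERN)


# Hypothetical repo path, used only to illustrate the glob.
print(is_selected("mlp-4bit.mlpackage/Manifest.json"))
```

Calling `download_mlpackages()` performs the actual download, so it needs network access.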