A series of models for testing the benefits of Core ML joint compression on iOS 18/macOS 15.

mlp-*.mlpackage

A simple Up/Gate/SiLU/Down MLP, repeated four times, using the Llama 2 7B dimensions (hidden size 4096, intermediate size 11008).
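
For orientation, here is a rough pure-Python sketch of one such block and of the raw weight storage at each precision. It assumes the usual Llama-style gated form, down(silu(gate(x)) * up(x)), with no biases; the function names are illustrative and are not taken from the models in this repo. The size estimate ignores LUT and scale overhead.

```python
import math

# Llama 2 7B MLP dimensions: hidden size and intermediate size.
HIDDEN, INTERMEDIATE = 4096, 11008
NUM_BLOCKS = 4  # the MLP is repeated four times


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp_block(x, w_up, w_gate, w_down):
    """Assumed gated form: down(silu(gate(x)) * up(x)), no biases."""
    up = matvec(w_up, x)
    gate = matvec(w_gate, x)
    return matvec(w_down, [silu(g) * u for g, u in zip(gate, up)])


def weight_count(hidden=HIDDEN, intermediate=INTERMEDIATE, blocks=NUM_BLOCKS):
    """Up, gate and down each hold hidden * intermediate weights."""
    return blocks * 3 * hidden * intermediate


def weight_megabytes(bits_per_weight):
    """Raw weight storage in MB, ignoring LUT and scale overhead."""
    return weight_count() * bits_per_weight / 8 / 1e6


for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label}: {weight_megabytes(bits):.1f} MB")
```

Under these assumptions the four blocks hold about 541 million weights, or roughly 1.08 GB at float16.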

All models were run with the 'CPU and Neural Engine' compute unit, and timings were measured in Xcode.

Device	Model	Precision	Minimum (ms)	Median (ms)
M1 Max	mlp-float16	float16	19.30	19.42
M1 Max	mlp-4bit	4-bit LUT	5.93	5.98
M1 Max	mlp-2bit	2-bit LUT	5.92	6.11
M1 Max	mlp-4bit-int8	4-bit int8 LUT + A8	6.02	6.31
M1 Max	mlp-2bit-int8	2-bit int8 LUT + A8	6.00	6.18
M1 Max	mlp-int8-int8	W8A8	9.78	9.94
M4	mlp-4bit	4-bit LUT	-	4.19
M4	mlp-2bit	2-bit LUT	-	3.83
M4	mlp-4bit-int8	4-bit int8 LUT + A8	-	4.14
M4	mlp-2bit-int8	2-bit int8 LUT + A8	-	3.83
M4	mlp-int8-int8	W8A8	-	8.18
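
To make the comparison easier to read, the short script below computes each variant's speedup over float16 from the M1 Max median latencies in the table. The M4 rows are left out because the table has no M4 float16 baseline. The dictionary and function names are illustrative.

```python
# Median latencies (ms) on M1 Max, copied from the table above.
m1_max_median_ms = {
    "mlp-float16": 19.42,
    "mlp-4bit": 5.98,
    "mlp-2bit": 6.11,
    "mlp-4bit-int8": 6.31,
    "mlp-2bit-int8": 6.18,
    "mlp-int8-int8": 9.94,
}


def speedups(medians, baseline="mlp-float16"):
    """Return baseline latency divided by each other model's latency."""
    base = medians[baseline]
    return {name: base / ms for name, ms in medians.items() if name != baseline}


for name, ratio in speedups(m1_max_median_ms).items():
    print(f"{name}: {ratio:.2f}x faster than float16")
```

On these numbers the LUT variants come out at roughly 3.1x to 3.25x faster than float16, while W8A8 is a little under 2x.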

Download

huggingface-cli download \
  --local-dir . \
  --local-dir-use-symlinks False \
  smpanaro/coreml-joint-compression-test \
  --include "*.mlpackage/*"