arxiv:2303.08308

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

Published on Mar 15, 2023

Abstract

The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNN). However, directly applying NAS to design accurate QNN models that achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency is due to the quantization-unfriendly issue: the operator and configuration (e.g., channel width) choices in prior-art search spaces lead to diverse quantization efficiency and can slow down INT8 inference speed. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. The key idea of SpaceEvo is to automatically search hardware-preferred operators and configurations to construct the search space, guided by a metric called the Q-T score that quantifies how quantization-friendly a candidate search space is. We further train a quantized-for-all supernet over our discovered search space, enabling the searched models to be directly deployed without extra retraining or quantization. Our discovered models establish new SOTA INT8 quantized accuracy under various latency constraints, achieving up to 10.1% higher accuracy on ImageNet than prior-art CNNs under the same latency. Extensive experiments on diverse edge devices demonstrate that SpaceEvo consistently outperforms existing manually designed search spaces, achieving up to 2.5x faster speed at the same accuracy.
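
The abstract describes an evolutionary search over candidate search spaces, guided by the Q-T score. The following is a minimal, illustrative Python sketch of that high-level loop, not the authors' implementation: the operator and width pools, the mutation scheme, and especially the qt_score() placeholder (which in the paper would reflect the quantized accuracy and measured INT8 latency of top models sampled from a candidate space on the target device) are assumptions made purely for illustration.

```python
# Illustrative sketch of a SpaceEvo-style loop: evolve candidate search
# spaces and keep the ones with the best (hypothetical) Q-T score.
# Names, pools, and the scoring proxy below are assumptions, not the
# paper's exact algorithm.

import random
from typing import Dict, List

# Hypothetical building blocks a candidate search space can choose per stage.
OPERATORS = ["mbconv_k3", "mbconv_k5", "fused_mbconv_k3", "resblock_k3"]
WIDTHS = [16, 24, 32, 48, 64, 96, 128]


def sample_search_space(num_stages: int = 6) -> List[Dict]:
    """Randomly sample a candidate search space: one operator pool and
    one channel-width pool per stage."""
    return [
        {"ops": random.sample(OPERATORS, k=2),
         "widths": random.sample(WIDTHS, k=3)}
        for _ in range(num_stages)
    ]


def qt_score(space: List[Dict]) -> float:
    """Placeholder for the paper's Q-T score. In the real method this
    would quantify how accurate and how fast (INT8 latency on the target
    device) the top models sampled from `space` are; here we return a
    random proxy so the sketch runs end to end."""
    return random.random()


def mutate(space: List[Dict]) -> List[Dict]:
    """Mutate one stage's operator pool or width pool."""
    child = [dict(stage) for stage in space]
    stage = random.choice(child)
    if random.random() < 0.5:
        stage["ops"] = random.sample(OPERATORS, k=2)
    else:
        stage["widths"] = random.sample(WIDTHS, k=3)
    return child


def evolve_search_space(population_size: int = 8, generations: int = 20) -> List[Dict]:
    """Simple evolutionary loop: keep the best-scoring candidate spaces
    and refill the population with mutated copies of them."""
    population = [sample_search_space() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=qt_score, reverse=True)
        parents = ranked[: population_size // 2]
        children = [mutate(random.choice(parents))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=qt_score)


if __name__ == "__main__":
    best_space = evolve_search_space()
    print(best_space)
```

In the paper, the discovered search space is then used to train a quantized-for-all supernet, so architectures sampled from it can be deployed as INT8 models without extra retraining or quantization.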
