arxiv:2312.09411

OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

Published on Dec 15, 2023

Authors:

Abstract

Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm. This topic spans various techniques, from structured <PRE_TAG>pruning</POST_TAG> to neural architecture search, encompassing both pruning and erasing operators perspectives. Despite advancements, existing methods suffers from complex, multi-stage processes that demand substantial engineering and domain knowledge, limiting their broader applications. We introduce the third-generation Only-Train-Once (OTOv3), which first automatically trains and compresses a general DNN through pruning and erasing operations, creating a compact and competitive sub-network without the need of fine-tuning. OTOv3 simplifies and automates the training and compression process, minimizes the engineering efforts required from users. It offers key technological advancements: (i) automatic search space construction for general DNNs based on dependency graph analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced version with hierarchical search (H2SPG) to reliably solve (hierarchical) structured sparsity problems and ensure sub-network validity; and (iii) automated sub-network construction using solutions from DHSPG/H2SPG and dependency graphs. Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured <PRE_TAG>pruning</POST_TAG> and neural architecture search. OTOv3 produces sub-networks that match or exceed the state-of-the-arts. The source code will be available at https://github.com/tianyic/only_train_once.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2312.09411 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2312.09411 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2312.09411 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.