melhoushi committed on
Commit 7aabe27
1 Parent(s): 23a18d6

Create README.md

Files changed (1): README.md (+218, -3)
# SuperBlock

SuperBlock combines two techniques for efficient neural network training and inference: Supermask and Block Compressed Sparse Row (BSR).

### Supermask
[Supermask](https://arxiv.org/abs/2207.00670) is a technique for applying structured sparsity to neural networks using a learned mask. It learns a continuous mask (scores) alongside, but separately from, the weights of each layer. During training, the scores are thresholded at a target sparsity level to obtain a binary mask that determines which weights are kept and which are pruned.

During inference, the binary mask is applied element-wise to the weights, pruning every weight that corresponds to a 0 in the mask and yielding a sparse network that can be computed efficiently.
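
To make this concrete, below is a minimal, self-contained sketch of a supermasked linear layer. It is not the implementation used in this repo: the layer name and initialization are illustrative, the mask is applied per element rather than per tile for brevity, and the trick that lets gradients reach the scores during training is omitted.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupermaskedLinear(nn.Module):
    """Toy supermask layer: keeps only the top (1 - sparsity) fraction of weights."""
    def __init__(self, in_features, out_features, sparsity=0.8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # The mask scores are a separate set of learnable parameters, one per weight.
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.sparsity = sparsity

    def forward(self, x):
        # Threshold the scores so that a `sparsity` fraction of the weights is pruned.
        k = int((1 - self.sparsity) * self.scores.numel())
        threshold = torch.topk(self.scores.flatten(), k).values.min()
        mask = (self.scores >= threshold).to(self.weight.dtype)
        # The binary mask is applied element-wise; masked-out weights are pruned.
        return F.linear(x, self.weight * mask)

layer = SupermaskedLinear(768, 768, sparsity=0.8)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```
In SuperBlock the scores are defined per tile rather than per weight, so whole blocks of weights are kept or pruned together, which is what makes the BSR format below applicable.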

### Block Compressed Sparse Row (BSR) Format
[The BSR format](https://pytorch.org/docs/main/sparse.html#sparse-bsr-tensor) is a sparse matrix representation that stores dense sub-blocks of non-zero elements instead of individual non-zero elements. The matrix is divided into equal-sized blocks, and only the non-zero blocks are stored.

The BSR format is efficient for sparse matrices with a block structure, where non-zero elements tend to cluster in dense sub-blocks. It reduces storage requirements and enables efficient matrix operations on the non-zero blocks.

Currently, the BSR format is optimized for NVIDIA A100 GPU(s) only.
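
As a small illustration (not code from this repo, and assuming a recent PyTorch build such as the nightly recommended below), the snippet constructs a weight whose zeros fall into whole 64x64 blocks and converts it to the BSR layout, which then stores only the two non-zero blocks:

```
import torch

# Dense 128x128 weight in which two of the four 64x64 blocks are entirely zero.
weight = torch.randn(128, 128)
weight[:64, :64] = 0
weight[64:, 64:] = 0

# Convert to BSR: only the non-zero 64x64 blocks are stored.
weight_bsr = weight.to_sparse_bsr(blocksize=(64, 64))
print(weight_bsr.values().shape)   # torch.Size([2, 64, 64]) -> two stored blocks
print(weight_bsr.crow_indices())   # block-row pointers
print(weight_bsr.col_indices())    # block-column indices

# The conversion is lossless for block-sparse weights.
print(torch.equal(weight_bsr.to_dense(), weight))  # True
```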

## Setup
To use SuperBlock, you will need:
* [PyTorch](https://pytorch.org/get-started/locally/)

To train the model or evaluate accuracy, you will need:
* ImageNet2012-blurred dataset

At least one GPU:
* A100 or H100

## Installation
* Clone this repo
```
git clone https://github.com/pytorch-labs/superblock.git
cd superblock
```
* Create a new conda environment
```
conda create -n superblock
conda activate superblock
```
* Install PyTorch. For best performance, we recommend the `2.3.0.dev20240305+cu121` nightly:
```
pip install --pre torch==2.3.0.dev20240305+cu121 --index-url https://download.pytorch.org/whl/nightly/cu121
pip install --pre torchvision==0.18.0 --no-deps
```


## Benchmarking
Baseline:
```
python benchmark.py \
--model vit_b_16 \
--batch-size 256 \
> /dev/null
```
Result:
```
532.1160546875 ms
```


80% sparsity, block size 64 (random weights):
```
python benchmark.py --model vit_b_16 \
--batch-size 256 \
--sparsity-linear 0.8 \
--sp-linear-tile-size 64 \
--sparsify-weights \
--bsr 64 \
> /dev/null
```
Result:
```
393.864453125 ms
```


## Training
Please refer to [TRAINING.md](TRAINING.md) for training from scratch. We use [Torchvision](https://github.com/pytorch/vision/tree/main/references/classification) as our framework for training. Supermask can be applied during training.

To apply Supermask, the following arguments are available:

* Apply Supermask to linear layers:
```
--sparsity-linear
--sp-linear-tile-size
```
* Apply Supermask to conv1x1 layers:
```
--sparsity-conv1x1
--sp-conv1x1-tile-size
```
* Apply Supermask to all other convolutional layers:
```
--sparsity-conv
--sp-conv-tile-size
```
* Skip the first transformer layer and/or last linear layer (ViT only):
```
--skip-last-layer-sparsity
--skip-first-transformer-sparsity
```

For example, if you would like to train a `vit_b_16` from scratch using Supermask, you can use the respective torchvision command found in [TRAINING.md](TRAINING.md) and append the Supermask arguments:
```
torchrun --nproc_per_node=8 train.py \
    --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3 \
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30 \
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra \
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema \
    --sparsity-linear 0.9 --sp-linear-tile-size 32
```
This command trains a `vit_b_16` with 90% sparsity applied to its linear layers using 32x32 tiles.

Please run `python train.py --help` for a full list of available arguments.

## Evaluation

To run an evaluation of a Supermask-trained model, you can use [evaluate.py](evaluate.py). Our current version achieves significant speedup with float32 only (not float16); hence, to illustrate the speedup, we do not pass `--amp` in the example commands below.

```
MODEL_PATH=<put the path of the trained checkpoint here>
IMAGENET_PATH=<put the path of ImageNet dataset here>
NGPUS=1 # put number of available GPUs here
```

* Offline sparsification with BSR:
```
torchrun --nproc_per_node=${NGPUS} evaluate.py --model vit_b_16 --batch-size 256 --sparsity-linear 0.9 --sp-linear-tile-size 32 --weights-path ${MODEL_PATH} --data-path ${IMAGENET_PATH} --sparsify-weights --bsr 32
```
This command applies 90% sparsity to linear layers using 32x32 tiles, loads the model weights from ${MODEL_PATH}, loads the ImageNet validation set from the specified path, applies offline sparsification to the weights, and converts the sparse weights to BSR format with a block size of 32. It is recommended to set `--bsr` to the same value as the tile size (see the sketch after this list for what this conversion amounts to).

* Online sparsification without BSR:
```
torchrun --nproc_per_node=${NGPUS} evaluate.py --model vit_b_16 --batch-size 256 --sparsity-linear 0.9 --sp-linear-tile-size 32 --weights-path ${MODEL_PATH} --data-path ${IMAGENET_PATH}
```
This is similar to the previous command, but it does not apply offline sparsification or BSR conversion. Instead, the sparsity is applied on the fly during evaluation.
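
To make the offline/online distinction concrete, here is a rough sketch. The helper names are ours rather than the repo's API, the binary block mask is assumed to have already been derived from the learned Supermask scores, and the matmul stays on the dense path so the snippet runs anywhere:

```
import torch
import torch.nn.functional as F

def offline_sparsify(weight, mask, blocksize=32):
    # Offline: bake the binary mask into the weight once, then store it in BSR
    # form (conceptually what `--sparsify-weights --bsr 32` produces).
    return (weight * mask).to_sparse_bsr(blocksize=(blocksize, blocksize))

def online_masked_linear(x, weight, mask):
    # Online: keep the dense weight and re-apply the mask in every forward pass.
    return F.linear(x, weight * mask)

# Toy 64x64 weight with a block mask made of 32x32 tiles.
weight = torch.randn(64, 64)
block_mask = (torch.rand(2, 2) > 0.5).float()
mask = block_mask.repeat_interleave(32, dim=0).repeat_interleave(32, dim=1)

w_bsr = offline_sparsify(weight, mask)                       # pruned once, stored as BSR
y = online_masked_linear(torch.randn(8, 64), weight, mask)   # masked on the fly
```
The BSR speedups in the results below come from the offline path with `--bsr` set; the online path keeps the weights dense.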

Please run `python evaluate.py --help` for a full list of available arguments.

Results (1 x A100):
* Baseline
```
Test: Total time: 0:02:11
Test: Acc@1 78.392 Acc@5 93.592
```

* Sparsity = 0.9, Tile Size = 32, Online Sparsification, BSR = None
```
Test: Total time: 0:01:52
Test: Acc@1 76.092 Acc@5 92.656
```

* Sparsity = 0.9, Tile Size = 32, Offline Sparsification, BSR = None
```
Test: Total time: 0:01:54
Test: Acc@1 76.092 Acc@5 92.656
```

* Sparsity = 0.9, Tile Size = 32, Offline Sparsification, BSR = 32
```
Test: Total time: 0:01:25
Test: Acc@1 76.092 Acc@5 92.656
```

## Pretrained Weights

### Download
Instead of training from scratch, if you'd like to use the Supermask weights of `vit_b_16` trained on privacy-mitigated ImageNet-blurred, you can download them here:
```
SPARSITY=0.80 # Checkpoints available for: 0.70, 0.80, 0.82, 0.84, 0.86, 0.88, 0.90
BLOCK_SIZE=32 # Checkpoints available for: 16, 32, 64
```

```
mkdir checkpoints
# For the baseline:
wget https://huggingface.co/facebook/superblock-vit-b-16/resolve/main/checkpoints/baseline.pth -P checkpoints/
# For the sparsified checkpoints:
wget https://huggingface.co/facebook/superblock-vit-b-16/resolve/main/checkpoints/sp${SPARSITY}-ts${BLOCK_SIZE}.pth -P checkpoints/
```

### Benchmark
```
python benchmark.py --model vit_b_16 \
--batch-size 256 \
--sparsity-linear ${SPARSITY} \
--sp-linear-tile-size ${BLOCK_SIZE} \
--sparsify-weights \
--bsr ${BLOCK_SIZE} \
--weights-path ./checkpoints/sp${SPARSITY}-ts${BLOCK_SIZE}.pth \
> /dev/null
```
Result:
```
530.342578125 ms
```

### Evaluate
8 x A100 GPUs:
```
torchrun --nproc_per_node=8 evaluate.py --model vit_b_16 --batch-size 256 --sparsity-linear ${SPARSITY} --sp-linear-tile-size ${BLOCK_SIZE} --bsr ${BLOCK_SIZE} --sparsify-weights --weights-path checkpoints/sp${SPARSITY}-ts${BLOCK_SIZE}.pth --data-path ${IMAGENET_PATH}
```
Result:
```
Test: Total time: 0:01:01
Test: Acc@1 77.644 Acc@5 93.554
```

1 x A100 GPU:
```
torchrun --nproc_per_node=1 evaluate.py --model vit_b_16 --batch-size 256 --sparsity-linear ${SPARSITY} --sp-linear-tile-size ${BLOCK_SIZE} --bsr ${BLOCK_SIZE} --sparsify-weights --weights-path checkpoints/sp${SPARSITY}-ts${BLOCK_SIZE}.pth --data-path ${IMAGENET_PATH}
```
Result:
```
Test: Total time: 0:01:51
Test: Acc@1 77.644 Acc@5 93.554
```

## License
SuperBlock is released under the [MIT license](https://github.com/pytorch-labs/superblock?tab=MIT-1-ov-file#readme).