Part 1:
Normal model
Memory usage of model alone = 510.342192
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 838.783488
100%|██████████████████████████████████████| 489/491 [00:25<00:00, 18.97it/s]
Loss = 26.38488006591797
Time taken: 25.795103549957275
Full model quant
Memory usage of model alone = 294.250369
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 1465.776128
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 22.39it/s]
Loss = 26.954803466796875
Time taken: 21.855380058288574
Full model without lm_head
Memory usage of model alone = 255.602736
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 1269.30176
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 22.68it/s]
Loss = 26.41402816772461
Time taken: 21.578929662704468
Only LM head
Memory usage of model alone = 548.989825
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 1036.319744
100%|██████████████████████████████████████| 489/491 [00:20<00:00, 23.39it/s]
Loss = 26.924053192138672
Time taken: 20.919220209121704
Last 4 attention layers
Memory usage of model alone = 425.42904
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 983.949824
100%|██████████████████████████████████████| 489/491 [00:20<00:00, 23.40it/s]
Loss = 26.39584732055664
Time taken: 20.912957668304443
Only q,k,v
Memory usage of model alone = 425.425968
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 989.827584
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 23.11it/s]
Loss = 26.396583557128906
Time taken: 21.17274236679077
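The Part 1 runs are easier to compare when each one is expressed relative to the "Normal model" baseline. The following is a minimal sketch of that arithmetic: the numbers are copied from the log above, while the dictionary layout, the function name `relative_to_baseline`, and the column names are assumptions made for illustration. The log does not state the memory units, so only ratios are computed.

```python
# Figures copied from the Part 1 log. The table layout is an assumption.
# Tuple order: (model memory, forward-pass memory, loss, time in seconds)
PART1 = {
    "Normal model": (510.342192, 838.783488, 26.38488006591797, 25.795103549957275),
    "Full model quant": (294.250369, 1465.776128, 26.954803466796875, 21.855380058288574),
    "Full model without lm_head": (255.602736, 1269.30176, 26.41402816772461, 21.578929662704468),
    "Only LM head": (548.989825, 1036.319744, 26.924053192138672, 20.919220209121704),
    "Last 4 attention layers": (425.42904, 983.949824, 26.39584732055664, 20.912957668304443),
    "Only q,k,v": (425.425968, 989.827584, 26.396583557128906, 21.17274236679077),
}


def relative_to_baseline(runs, baseline="Normal model"):
    """Express every run as ratios/deltas against the baseline run."""
    base_model, base_fwd, base_loss, base_secs = runs[baseline]
    rows = {}
    for name, (model_mem, fwd_mem, loss, secs) in runs.items():
        rows[name] = {
            "model_mem_ratio": model_mem / base_model,   # < 1 means smaller model
            "forward_mem_ratio": fwd_mem / base_fwd,     # > 1 means more memory in forward pass
            "loss_delta": loss - base_loss,              # > 0 means worse loss
            "speedup": base_secs / secs,                 # > 1 means faster than baseline
        }
    return rows


summary = relative_to_baseline(PART1)
for name, row in summary.items():
    print(
        f"{name:28s} model x{row['model_mem_ratio']:.3f}  "
        f"forward x{row['forward_mem_ratio']:.3f}  "
        f"loss {row['loss_delta']:+.4f}  speed x{row['speedup']:.3f}"
    )
```

Read this way, the logged figures show every quantized variant running faster than the baseline while using more memory during the forward pass; the summary only restates the log and does not explain why.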
Part 2:
4 bit model
Memory usage of model alone = 134.060568
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 308.803072
100%|██████████████████████████████████████| 489/491 [00:16<00:00, 29.78it/s]
Loss = 31.296875
Time taken: 16.42749333381653
`low_cpu_mem_usage` was None, now set to True since model is quantized.
8 bit model
Memory usage of model alone = 176.527896
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 494.142976
100%|██████████████████████████████████████| 489/491 [00:29<00:00, 16.70it/s]
Loss = 26.5625
Time taken: 29.27569341659546
`low_cpu_mem_usage` was None, now set to True since model is quantized.
4 bit nf4 model
Memory usage of model alone = 134.060568
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 494.85824
100%|██████████████████████████████████████| 489/491 [00:15<00:00, 30.64it/s]
Loss = 28.375
Time taken: 15.961309671401978
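Because every run prints the same four labelled values, the raw log can be turned into numbers with a few regular expressions. This is a minimal sketch using only the standard library: the patterns match the labels exactly as they appear in the log above, while the function name `parse_run` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# One pattern per labelled value printed by each run in the log.
FIELDS = {
    "model_mem": r"Memory usage of model alone = ([0-9.]+)",
    "forward_mem": r"Memory usage at forward pass = ([0-9.]+)",
    "loss": r"Loss = ([0-9.]+)",
    "seconds": r"Time taken: ([0-9.]+)",
}


def parse_run(block):
    """Extract the four reported values from one run's log text.

    re.search is used so the tqdm prefix that shares a line with the
    forward-pass figure does not get in the way.
    """
    result = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, block)
        if match:
            result[key] = float(match.group(1))
    return result


# The "4 bit nf4 model" run, copied from the log (progress bar line omitted).
SAMPLE = """4 bit nf4 model
Memory usage of model alone = 134.060568
0%| | 0/491 [00:00<?, ?it/s]Memory usage at forward pass = 494.85824
Loss = 28.375
Time taken: 15.961309671401978
"""

run = parse_run(SAMPLE)
print(run)
```

The same function works on the single-line form of the log, since it searches for each label independently rather than relying on line breaks.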