|
## Part 1 |
|
|
|
Normal model |
|
Memory usage of model alone = 510.342192 |
|
Memory usage at forward pass = 838.783488 |
|
100%|██████████████████████████████████████| 489/491 [00:25<00:00, 18.97it/s] |
|
Loss = 26.38488006591797 |
|
Time taken: 25.795103549957275 |
|
|
|
Full model quant |
|
Memory usage of model alone = 294.250369 |
|
Memory usage at forward pass = 1465.776128 |
|
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 22.39it/s] |
|
Loss = 26.954803466796875 |
|
Time taken: 21.855380058288574 |
|
|
|
Full model without lm_head |
|
Memory usage of model alone = 255.602736 |
|
Memory usage at forward pass = 1269.30176 |
|
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 22.68it/s] |
|
Loss = 26.41402816772461 |
|
Time taken: 21.578929662704468 |
|
|
|
Only LM head |
|
Memory usage of model alone = 548.989825 |
|
Memory usage at forward pass = 1036.319744 |
|
100%|██████████████████████████████████████| 489/491 [00:20<00:00, 23.39it/s] |
|
Loss = 26.924053192138672 |
|
Time taken: 20.919220209121704 |
|
|
|
Last 4 attention layers |
|
Memory usage of model alone = 425.42904 |
|
Memory usage at forward pass = 983.949824 |
|
100%|██████████████████████████████████████| 489/491 [00:20<00:00, 23.40it/s] |
|
Loss = 26.39584732055664 |
|
Time taken: 20.912957668304443 |
|
|
|
Only q,k,v |
|
Memory usage of model alone = 425.425968 |
|
Memory usage at forward pass = 989.827584 |
|
100%|██████████████████████████████████████| 489/491 [00:21<00:00, 23.11it/s] |
|
Loss = 26.396583557128906 |
|
Time taken: 21.17274236679077 |
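
The Part 1 figures are easier to compare as ratios against the normal model. The sketch below is only an illustration: the dictionary keys reuse the labels from the log, the field names (`model_mem`, `forward_mem`, `loss`, `time_s`) and the helper `compare_to_baseline` are hypothetical, and because the log does not state memory units the code works with ratios only.

```python
# Figures copied verbatim from the Part 1 log. The log does not state units,
# so the memory values are treated as opaque numbers and only ratios are used.
PART1 = {
    "Normal model": {"model_mem": 510.342192, "forward_mem": 838.783488,
                     "loss": 26.38488006591797, "time_s": 25.795103549957275},
    "Full model quant": {"model_mem": 294.250369, "forward_mem": 1465.776128,
                         "loss": 26.954803466796875, "time_s": 21.855380058288574},
    "Full model without lm_head": {"model_mem": 255.602736, "forward_mem": 1269.30176,
                                   "loss": 26.41402816772461, "time_s": 21.578929662704468},
    "Only LM head": {"model_mem": 548.989825, "forward_mem": 1036.319744,
                     "loss": 26.924053192138672, "time_s": 20.919220209121704},
    "Last 4 attention layers": {"model_mem": 425.42904, "forward_mem": 983.949824,
                                "loss": 26.39584732055664, "time_s": 20.912957668304443},
    "Only q,k,v": {"model_mem": 425.425968, "forward_mem": 989.827584,
                   "loss": 26.396583557128906, "time_s": 21.17274236679077},
}


def compare_to_baseline(results, baseline="Normal model"):
    """Return one row per run with memory ratios, loss delta and speedup
    relative to the baseline run (hypothetical helper, not from the log)."""
    base = results[baseline]
    rows = []
    for name, run in results.items():
        rows.append({
            "name": name,
            "model_mem_ratio": run["model_mem"] / base["model_mem"],
            "forward_mem_ratio": run["forward_mem"] / base["forward_mem"],
            "loss_delta": run["loss"] - base["loss"],
            "speedup": base["time_s"] / run["time_s"],
        })
    return rows


if __name__ == "__main__":
    for row in compare_to_baseline(PART1):
        print(f"{row['name']:<28}"
              f" model x{row['model_mem_ratio']:.2f}"
              f" forward x{row['forward_mem_ratio']:.2f}"
              f" loss {row['loss_delta']:+.3f}"
              f" speed x{row['speedup']:.2f}")
```

On the logged numbers, every variant other than the normal model finishes faster than it, while its forward-pass memory figure is higher than the normal model's.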
|
|
|
|
|
## Part 2 |
|
4 bit model |
|
Memory usage of model alone = 134.060568 |
|
Memory usage at forward pass = 308.803072 |
|
100%|██████████████████████████████████████| 489/491 [00:16<00:00, 29.78it/s] |
|
Loss = 31.296875 |
|
Time taken: 16.42749333381653 |
|
|
|
`low_cpu_mem_usage` was None, now set to True since model is quantized. |
|
8 bit model |
|
Memory usage of model alone = 176.527896 |
|
Memory usage at forward pass = 494.142976 |
|
100%|██████████████████████████████████████| 489/491 [00:29<00:00, 16.70it/s] |
|
Loss = 26.5625 |
|
Time taken: 29.27569341659546 |
|
|
|
`low_cpu_mem_usage` was None, now set to True since model is quantized. |
|
4 bit nf4 model |
|
Memory usage of model alone = 134.060568 |
|
Memory usage at forward pass = 494.85824 |
|
100%|██████████████████████████████████████| 489/491 [00:15<00:00, 30.64it/s] |
|
Loss = 28.375 |
|
Time taken: 15.961309671401978 |
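
One way to read the Part 2 runs side by side is to rank them by loss and by run time, and to look at the gap between the forward-pass figure and the model-alone figure. The sketch below is illustrative: the `Run` tuple and the helpers `forward_overhead` and `rank_by` are hypothetical names, and treating the forward-pass figure as including the model's own footprint is an assumption that the log does not confirm.

```python
from typing import NamedTuple


class Run(NamedTuple):
    model_mem: float    # "Memory usage of model alone" (units as logged)
    forward_mem: float  # "Memory usage at forward pass" (units as logged)
    loss: float
    time_s: float       # "Time taken"


# Figures copied verbatim from the Part 2 log.
PART2 = {
    "4 bit model": Run(134.060568, 308.803072, 31.296875, 16.42749333381653),
    "8 bit model": Run(176.527896, 494.142976, 26.5625, 29.27569341659546),
    "4 bit nf4 model": Run(134.060568, 494.85824, 28.375, 15.961309671401978),
}


def forward_overhead(results):
    """Forward-pass figure minus model-alone figure for each run.

    ASSUMPTION: the forward-pass figure includes the model's footprint;
    the log does not say how either number was measured.
    """
    return {name: run.forward_mem - run.model_mem for name, run in results.items()}


def rank_by(results, field):
    """Return run names sorted ascending by the given Run field."""
    return sorted(results, key=lambda name: getattr(results[name], field))


if __name__ == "__main__":
    print("by loss:", rank_by(PART2, "loss"))
    print("by time:", rank_by(PART2, "time_s"))
    for name, extra in forward_overhead(PART2).items():
        print(f"{name:<16} forward minus model = {extra:.3f}")
```

On the logged numbers, the 8 bit model has the lowest loss and the longest run time of the three, and the two 4 bit variants report the same model-alone footprint.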
|
|
|
|