gpt2-quantzed-gguf / README.md
## Part 1
Normal model
Memory usage of model alone = 510.342192
Memory usage at forward pass = 838.783488
100%|██████████| 489/491 [00:25<00:00, 18.97it/s]
Loss = 26.38488006591797
Time taken: 25.795103549957275
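
The baseline figure of 510.342192 can be reproduced by simple counting. The sketch below assumes the value is in megabytes (bytes / 10^6), that the model is the 124M-parameter GPT-2 with float32 weights, and that the total includes a per-block causal-mask buffer. None of this is stated in the log, and the function names are hypothetical.

```python
# Assumed GPT-2 small dimensions; the log itself does not list them.
VOCAB, N_CTX, D_MODEL, N_LAYER = 50257, 1024, 768, 12


def gpt2_param_count(vocab=VOCAB, n_ctx=N_CTX, d=D_MODEL, n_layer=N_LAYER):
    """Count parameters, with the LM head tied to the token embedding."""
    embeddings = vocab * d + n_ctx * d             # token + position embeddings
    attn = (d * 3 * d + 3 * d) + (d * d + d)       # fused q,k,v projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)    # up projection + down projection
    layer_norms = 2 * (2 * d)                      # two LayerNorms per block (weight + bias)
    per_block = attn + mlp + layer_norms
    final_ln = 2 * d
    return embeddings + n_layer * per_block + final_ln


def gpt2_buffer_bytes(n_ctx=N_CTX, n_layer=N_LAYER):
    # Assumption: each block holds an n_ctx x n_ctx mask at 1 byte per entry
    # plus one float32 scalar.
    return n_layer * (n_ctx * n_ctx + 4)


params = gpt2_param_count()
total_bytes = params * 4 + gpt2_buffer_bytes()     # 4 bytes per float32 parameter
print(f"{params:,} parameters, {total_bytes / 1e6:.6f} MB")
```

Under these assumptions the total comes to 510,342,192 bytes, which matches the logged value digit for digit.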
Full model quant
Memory usage of model alone = 294.250369
Memory usage at forward pass = 1465.776128
100%|██████████| 489/491 [00:21<00:00, 22.39it/s]
Loss = 26.954803466796875
Time taken: 21.855380058288574
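
The log does not say which quantization scheme was used here. As a rough illustration only, the sketch below shows symmetric absmax int8 quantization with one scale per row, a common choice. Storing weights as int8 and expanding them back to floats for each matrix multiply would be consistent with the smaller stored model and the larger forward-pass memory reported above, but that reading is an assumption. All names are hypothetical.

```python
def quantize_absmax_int8(row):
    """Map a list of floats to integers in [-127, 127] using one scale per row."""
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp after rounding so floating-point noise cannot exceed the int8 range.
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]


row = [0.12, -0.50, 0.03, 0.44]
q, scale = quantize_absmax_int8(row)
restored = dequantize(q, scale)

# Rounding to the nearest integer bounds the error by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(q, scale, max_err)
```

Each weight then needs one byte instead of four, at the cost of a rounding error of at most half a step per value.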
Full model without lm_head
Memory usage of model alone = 255.602736
Memory usage at forward pass = 1269.30176
100%|██████████| 489/491 [00:21<00:00, 22.68it/s]
Loss = 26.41402816772461
Time taken: 21.578929662704468
Only LM head
Memory usage of model alone = 548.989825
Memory usage at forward pass = 1036.319744
100%|██████████| 489/491 [00:20<00:00, 23.39it/s]
Loss = 26.924053192138672
Time taken: 20.919220209121704
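
Quantizing only the LM head raises the stored size from 510.342192 to 548.989825 instead of lowering it. One accounting that matches the difference exactly is sketched below. It assumes the figures are megabytes, that the head has GPT-2 small's 50257 x 768 shape, and that a new int8 copy of the head is stored next to the float32 embedding it is normally tied to. This is inferred from the numbers, not stated in the log.

```python
VOCAB, D_MODEL = 50257, 768  # assumed GPT-2 small dimensions

# Logged footprints, read as megabytes and written out in whole bytes.
baseline_bytes = 510_342_192      # "Normal model"
lm_head_only_bytes = 548_989_825  # "Only LM head"

extra = lm_head_only_bytes - baseline_bytes

int8_weight_bytes = VOCAB * D_MODEL  # one byte per LM head weight
per_row_bytes = VOCAB                # one further byte per vocabulary row (purpose not stated in the log)

print(extra, int8_weight_bytes + per_row_bytes)
```

Both numbers come to 38,647,633 bytes. If the head shares its weights with the token embedding, quantizing it alone cannot free the float32 copy, which would explain the increase.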
Last 4 attention layers
Memory usage of model alone = 425.42904
Memory usage at forward pass = 983.949824
100%|██████████| 489/491 [00:20<00:00, 23.40it/s]
Loss = 26.39584732055664
Time taken: 20.912957668304443
Only q,k,v
Memory usage of model alone = 425.425968
Memory usage at forward pass = 989.827584
100%|██████████| 489/491 [00:21<00:00, 23.11it/s]
Loss = 26.396583557128906
Time taken: 21.17274236679077
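
The Part 1 runs are easier to compare as ratios against the unquantized baseline. The sketch below copies the logged values and derives those ratios. The memory unit is assumed to be MB, and the helper name is hypothetical.

```python
# Values copied from the Part 1 log: (model memory, forward-pass memory, loss, seconds).
# Memory is assumed to be in MB; the log prints no unit.
RUNS = {
    "Normal model": (510.342192, 838.783488, 26.38488006591797, 25.795103549957275),
    "Full model quant": (294.250369, 1465.776128, 26.954803466796875, 21.855380058288574),
    "Full model without lm_head": (255.602736, 1269.30176, 26.41402816772461, 21.578929662704468),
    "Only LM head": (548.989825, 1036.319744, 26.924053192138672, 20.919220209121704),
    "Last 4 attention layers": (425.42904, 983.949824, 26.39584732055664, 20.912957668304443),
    "Only q,k,v": (425.425968, 989.827584, 26.396583557128906, 21.17274236679077),
}


def compare_to_baseline(runs, baseline="Normal model"):
    """Express each run relative to the baseline run."""
    base_model, base_fwd, base_loss, base_time = runs[baseline]
    rows = []
    for name, (model, fwd, loss, secs) in runs.items():
        rows.append({
            "name": name,
            "model_ratio": model / base_model,
            "forward_ratio": fwd / base_fwd,
            "loss_delta": loss - base_loss,
            "time_ratio": secs / base_time,
        })
    return rows


rows = compare_to_baseline(RUNS)
for r in rows:
    print(f"{r['name']:<28} model x{r['model_ratio']:.2f}  "
          f"forward x{r['forward_ratio']:.2f}  "
          f"loss {r['loss_delta']:+.3f}  time x{r['time_ratio']:.2f}")
```

Read this way, every quantized variant uses more forward-pass memory than the baseline. The two runs that quantize the LM head add roughly 0.54 to 0.57 to the loss, while the others stay within about 0.03.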
## Part 2
4 bit model
Memory usage of model alone = 134.060568
Memory usage at forward pass = 308.803072
100%|██████████| 489/491 [00:16<00:00, 29.78it/s]
Loss = 31.296875
Time taken: 16.42749333381653
`low_cpu_mem_usage` was None, now set to True since model is quantized.
8 bit model
Memory usage of model alone = 176.527896
Memory usage at forward pass = 494.142976
100%|██████████| 489/491 [00:29<00:00, 16.70it/s]
Loss = 26.5625
Time taken: 29.27569341659546
`low_cpu_mem_usage` was None, now set to True since model is quantized.
4 bit nf4 model
Memory usage of model alone = 134.060568
Memory usage at forward pass = 494.85824
100%|██████████| 489/491 [00:15<00:00, 30.64it/s]
Loss = 28.375
Time taken: 15.961309671401978
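
The Part 2 model sizes can also be reproduced by counting. The sketch below assumes that the figures are megabytes, that the model is GPT-2 small, that only the projection matrices inside the transformer blocks are stored at 4 or 8 bits, and that everything else is kept in 16-bit floats. The `low_cpu_mem_usage` message looks like the one Hugging Face `transformers` prints when a model is loaded with a quantization config, but the log does not show the loading code, so these are assumptions. The function name is hypothetical.

```python
N_CTX, D_MODEL, N_LAYER = 1024, 768, 12  # assumed GPT-2 small dimensions
TOTAL_PARAMS = 124_439_808               # assumed GPT-2 small parameter count (tied LM head)

# Weight matrices inside each block: fused q,k,v, attention output, MLP up, MLP down.
block_matrix_weights = N_LAYER * (
    D_MODEL * 3 * D_MODEL + D_MODEL * D_MODEL + 2 * (D_MODEL * 4 * D_MODEL)
)
other_params = TOTAL_PARAMS - block_matrix_weights  # embeddings, biases, LayerNorms


def footprint_bytes(bits_per_quantized_weight):
    """Estimate stored size when block matrices are quantized and the rest is 16-bit."""
    quantized = block_matrix_weights * bits_per_quantized_weight // 8
    half_precision = other_params * 2
    # Assumption: per block, a 1-byte-per-entry causal mask plus one 16-bit scalar.
    buffers = N_LAYER * (N_CTX * N_CTX + 2)
    return quantized + half_precision + buffers


for bits in (4, 8):
    print(f"{bits}-bit: {footprint_bytes(bits) / 1e6:.6f} MB")
```

Under these assumptions the 4-bit estimate is 134,060,568 bytes and the 8-bit estimate is 176,527,896 bytes, matching the logged 134.060568 and 176.527896. The plain 4-bit and NF4 runs report the same size, which fits a count that depends only on bit width.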