
Part 1

Normal model
Memory usage of model alone = 510.342192
Memory usage at forward pass = 838.783488
489/491 [00:25<00:00, 18.97it/s]
Loss = 26.38488006591797
Time taken: 25.795103549957275

Full model quant
Memory usage of model alone = 294.250369
Memory usage at forward pass = 1465.776128
489/491 [00:21<00:00, 22.39it/s]
Loss = 26.954803466796875
Time taken: 21.855380058288574

Full model without lm_head
Memory usage of model alone = 255.602736
Memory usage at forward pass = 1269.30176
489/491 [00:21<00:00, 22.68it/s]
Loss = 26.41402816772461
Time taken: 21.578929662704468

Only LM head
Memory usage of model alone = 548.989825
Memory usage at forward pass = 1036.319744
489/491 [00:20<00:00, 23.39it/s]
Loss = 26.924053192138672
Time taken: 20.919220209121704

Last 4 attention layers
Memory usage of model alone = 425.42904
Memory usage at forward pass = 983.949824
489/491 [00:20<00:00, 23.40it/s]
Loss = 26.39584732055664
Time taken: 20.912957668304443

Only q,k,v
Memory usage of model alone = 425.425968
Memory usage at forward pass = 989.827584
489/491 [00:21<00:00, 23.11it/s]
Loss = 26.396583557128906
Time taken: 21.17274236679077
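The Part 1 figures are easier to compare once they are expressed relative to the unquantized "Normal model" run. The following is a minimal sketch of that arithmetic: the numbers are copied from the results above, while the dictionary layout and the helper names `relative_change` and `compare_to_baseline` are illustrative assumptions, not part of the original experiment.

```python
# Figures copied from the Part 1 results above, in the order:
# (memory of model alone, memory at forward pass, loss, time taken).
RESULTS = {
    "Normal model": (510.342192, 838.783488, 26.38488006591797, 25.795103549957275),
    "Full model quant": (294.250369, 1465.776128, 26.954803466796875, 21.855380058288574),
    "Full model without lm_head": (255.602736, 1269.30176, 26.41402816772461, 21.578929662704468),
    "Only LM head": (548.989825, 1036.319744, 26.924053192138672, 20.919220209121704),
    "Last 4 attention layers": (425.42904, 983.949824, 26.39584732055664, 20.912957668304443),
    "Only q,k,v": (425.425968, 989.827584, 26.396583557128906, 21.17274236679077),
}


def relative_change(value, baseline):
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def compare_to_baseline(results, baseline_name="Normal model"):
    """Build one comparison row per non-baseline configuration."""
    base_mem, base_fwd, base_loss, _ = results[baseline_name]
    rows = []
    for name, (mem, fwd, loss, _time) in results.items():
        if name == baseline_name:
            continue
        rows.append({
            "name": name,
            "model_mem_pct": relative_change(mem, base_mem),
            "forward_mem_pct": relative_change(fwd, base_fwd),
            "loss_delta": loss - base_loss,
        })
    return rows


if __name__ == "__main__":
    for row in compare_to_baseline(RESULTS):
        print(f"{row['name']:<28} model mem {row['model_mem_pct']:+7.1f}%  "
              f"forward mem {row['forward_mem_pct']:+7.1f}%  "
              f"loss {row['loss_delta']:+.4f}")
```

In these numbers, every quantized variant reports a higher forward-pass memory figure than the normal model, even where the memory of the model alone is lower.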

Part 2

4 bit model
Memory usage of model alone = 134.060568
Memory usage at forward pass = 308.803072
489/491 [00:16<00:00, 29.78it/s]
Loss = 31.296875
Time taken: 16.42749333381653

8 bit model
low_cpu_mem_usage was None, now set to True since model is quantized.
Memory usage of model alone = 176.527896
Memory usage at forward pass = 494.142976
489/491 [00:29<00:00, 16.70it/s]
Loss = 26.5625
Time taken: 29.27569341659546

4 bit nf4 model
low_cpu_mem_usage was None, now set to True since model is quantized.
Memory usage of model alone = 134.060568
Memory usage at forward pass = 494.85824
489/491 [00:15<00:00, 30.64it/s]
Loss = 28.375
Time taken: 15.961309671401978
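Every run in both parts prints the same four metrics, so the logs can be collected into structured records with a few regular expressions. The sketch below assumes only the field labels that appear in the output above; the function name `parse_run` and the dictionary keys are hypothetical choices made for illustration.

```python
import re

# One pattern per metric label that appears in the run output above.
NUMBER = r"([0-9]+(?:\.[0-9]+)?)"
FIELD_PATTERNS = {
    "model_memory": r"Memory usage of model alone = " + NUMBER,
    "forward_memory": r"Memory usage at forward pass = " + NUMBER,
    "loss": r"Loss = " + NUMBER,
    "time_taken": r"Time taken: " + NUMBER,
}


def parse_run(text):
    """Extract the metrics from one run's log text.

    Fields that are absent from the text map to None. Progress-bar
    output and warnings between the fields are ignored.
    """
    metrics = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        metrics[key] = float(match.group(1)) if match else None
    return metrics


if __name__ == "__main__":
    sample = (
        "4 bit nf4 model Memory usage of model alone = 134.060568 "
        "Memory usage at forward pass = 494.85824 "
        "489/491 [00:15<00:00, 30.64it/s] "
        "Loss = 28.375 Time taken: 15.961309671401978"
    )
    print(parse_run(sample))
```

Because each field is searched for independently, the same function works whether a run is stored on one line or split across several.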