Performance comparison of the new checkpoints

#8 opened by Davidliudev

Unlike what the author's notes suggest, for me the checkpoint with a group size looks worse than the one without, given the same parameter setup. Not sure if I got something wrong.
Does anyone know the reason?

I'm also curious whether the new model's quality is better or worse than the previous .pt model.

In the evaluations, lower = better, so the group size 128 checkpoint is marginally better than the un-grouped version.
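For context, "evaluations" here presumably means a perplexity-style score on held-out text, where lower is better. A minimal sketch of such a check, assuming a transformers-style causal LM (this is not the exact eval harness behind the posted numbers):

```python
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    # For causal LMs, passing labels=input_ids yields the mean
    # cross-entropy loss; perplexity is just its exponential.
    out = model(input_ids=input_ids, labels=input_ids)
    return torch.exp(out.loss).item()
```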

Or are you referring to actual inference results on your end?

Also, both the grouped and un-grouped checkpoints should be better than the original model due to the implementation of true sequential quantization. Unfortunately, I did not save the original evals, so I can't provide a comparison.
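For intuition, here is a minimal sketch of what "true sequential" quantization means, with a toy placeholder standing in for real GPTQ calibration (`toy_quantize_layer` below just rounds weights and is purely illustrative):

```python
import torch

def toy_quantize_layer(layer: torch.nn.Linear, calib_x: torch.Tensor) -> None:
    # Placeholder for real GPTQ calibration, which solves for quantized
    # weights using the calibration activations; here we just round coarsely.
    layer.weight.data = torch.round(layer.weight.data * 8) / 8

def true_sequential_quantize(layers, calib_x: torch.Tensor):
    # "True sequential": quantize layers in order, feeding each layer the
    # activations produced by the already-quantized layers before it, so
    # later layers get calibrated against the accumulated quantization error.
    x = calib_x
    for layer in layers:
        toy_quantize_layer(layer, x)
        x = layer(x)
    return layers

layers = [torch.nn.Linear(16, 16) for _ in range(3)]
true_sequential_quantize(layers, torch.randn(4, 16))
```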

Thanks. It turns out max new tokens affected the quality: I accidentally set it to 2000 and the results were garbled.
I changed it to 200 for both models and inference seems fine now.
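For reference, a generic transformers-style generation call with the cap applied; the model path is a placeholder and the actual loading path for these GPTQ checkpoints may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/model" is a placeholder, not the real checkpoint id.
tok = AutoTokenizer.from_pretrained("path/to/model")
model = AutoModelForCausalLM.from_pretrained("path/to/model", device_map="auto")

inputs = tok("The quick brown fox", return_tensors="pt").to(model.device)
# Capping max_new_tokens (e.g. 200 instead of 2000) bounds both the
# generation length and the KV-cache growth that consumes VRAM.
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```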

Another thing I noticed: with the un-grouped and the original model I never get an OOM error (I use a 4090 with 24 GB of VRAM), but with the grouped one I sometimes do.
Maybe it eats more VRAM...
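One way to check is to measure peak allocation around a generation call; a minimal sketch, assuming a CUDA build of PyTorch:

```python
import torch

# Compare the grouped and un-grouped checkpoints by peak VRAM usage.
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

# ... run model.generate(...) here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM allocated: {peak_gib:.2f} GiB")
```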

Anyway, thanks for your clarification, elinas.

Yes. As mentioned in the README (and reflected in the file size itself), it will use 1 GB more VRAM by default, so 18 GB without any context.

elinas changed discussion status to closed
