3.75bpw quant.

by Nexesenex - opened Jan 1

Hey!
Could you please make a 3.75bpw exl2-2 quant of this Llama 2 70B?
I run a 3090 + 3060 setup, and I'd like to test the best quant of base Llama 2 70B, made with your new optimized quantization method, that I can run at a decent context (Alpha 2, 6912 ctx).
Happy new year 2024, and thanks for providing fast inference at the best quantization quality available for the greatest number of people!
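
For anyone wondering why 3.75bpw in particular, here is a rough back-of-the-envelope sketch of the memory arithmetic. It is an estimate, not a measurement. It assumes about 69e9 parameters all stored at the average bits per weight, an FP16 KV cache, the published Llama 2 70B shape (80 layers, 8 KV heads, head dimension 128), and a 24 GB 3090 plus a 12 GB 3060. The function names are made up for illustration, and activations, CUDA context, and desktop usage are not counted.

```python
# Rough VRAM estimate for a quantized Llama 2 70B (a sketch under stated assumptions).
# Not counted: activations, CUDA context, framework overhead, desktop usage.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bpw: float) -> float:
    """Bytes needed for the weights at an average of `bpw` bits per weight."""
    return n_params * bpw / 8


def kv_cache_bytes(n_tokens: int, n_layers: int = 80, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Bytes for keys and values (factor 2) across all layers, FP16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens


def estimate_gib(bpw: float, n_tokens: int, n_params: float = 69e9) -> float:
    """Weights plus KV cache, in GiB."""
    return (weight_bytes(n_params, bpw) + kv_cache_bytes(n_tokens)) / GIB


if __name__ == "__main__":
    total_vram_gib = 24 + 12  # assumed: 3090 (24 GB) + 3060 (12 GB)
    for bpw in (3.5, 3.75, 4.0):
        need = estimate_gib(bpw, n_tokens=6912)
        print(f"{bpw:.2f} bpw @ 6912 ctx: ~{need:.1f} GiB of {total_vram_gib} GiB")
```

Under these assumptions the weights dominate and the 6912-token cache adds only a couple of GiB, so the bitrate is what decides whether the model fits with headroom left for overhead.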
