---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
This is [Llama2-22b](https://huggingface.co/chargoddard/llama2-22b) by [chargoddard](https://huggingface.co/chargoddard) in a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or doesn't work at all, that's likely on me rather than the models themselves.
A second model merge has been [released](https://huggingface.co/chargoddard/llama2-22b-blocktriangular), and the GGML conversions for it can be found [here](https://huggingface.co/IHaveNoClueAndIMustPost/llama2-22b-blocktriangular-GGML). While I haven't had any issues so far, do note that the original repo states: "Not intended for use as-is - this model is meant to serve as a base for further tuning".

Approximate VRAM requirements at 4K context:
| Model  | Size   | VRAM   |
|--------|--------|--------|
| q5_1   | 16.4GB | 21.5GB |
| q4_K_M | 13.2GB | 18.3GB |
| q3_K_M | 10.6GB | 16.1GB |
| q2_K   | 9.2GB  | 14.5GB |
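If you need an estimate for a quant not listed above, the table suggests VRAM is roughly the model file size plus a fixed overhead of about 5.1-5.5GB (KV cache at 4K context plus compute buffers). A minimal sketch of that rule of thumb, assuming a ~5.2GB overhead derived from the rows above (not a measured constant):

```python
# Rough VRAM estimate for a quantized model at 4K context.
# overhead_gb (~5.2GB) is an assumption inferred from the table above:
# it covers the KV cache and inference buffers, not just the weights.
def estimate_vram_gb(model_size_gb: float, overhead_gb: float = 5.2) -> float:
    """Return approximate total VRAM in GB for a given model file size."""
    return round(model_size_gb + overhead_gb, 1)

# Example: the q5_1 file is 16.4GB, so expect roughly 21.6GB of VRAM,
# close to the 21.5GB listed in the table.
print(estimate_vram_gb(16.4))
```

Treat this as a ballpark only; actual usage varies with context length, batch size, and how many layers you offload to the GPU.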