---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
This is [Llama2-22b](https://huggingface.co/chargoddard/llama2-22b) by [chargoddard](https://huggingface.co/chargoddard), converted to a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or not at all, that's likely on me rather than the models themselves.
A second model merge has been [released](https://huggingface.co/chargoddard/llama2-22b-blocktriangular), and the GGML conversions for it can be found [here](https://huggingface.co/IHaveNoClueAndIMustPost/llama2-22b-blocktriangular-GGML).
While I haven't had any issues so far, do note that the original repo states "Not intended for use as-is - this model is meant to serve as a base for further tuning".
Approximate VRAM requirements at 4K context:
| MODEL | SIZE | VRAM |
|--------|--------|--------|
| q5_1 | 16.4GB | 21.5GB |
| q4_K_M | 13.2GB | 18.3GB |
| q3_K_M | 10.6GB | 16.1GB |
| q2_K | 9.2GB | 14.5GB |
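
For reference, a minimal sketch of loading one of these quantized files with llama-cpp-python, assuming a version that still reads the GGML format and that the file has already been downloaded (the filename below is illustrative, not the actual repo filename):

```python
from llama_cpp import Llama

# Load the q4_K_M quant at 4K context (see the VRAM table above).
llm = Llama(
    model_path="llama2-22b.ggmlv3.q4_K_M.bin",  # hypothetical local filename
    n_ctx=4096,       # 4K context window
    n_gpu_layers=-1,  # offload all layers to the GPU if VRAM allows
)

# Simple completion to sanity-check the load.
output = llm("Write a short poem about llamas.", max_tokens=64)
print(output["choices"][0]["text"])
```

Any GGML-compatible backend (e.g. llama.cpp or koboldcpp) should work the same way; the quant you pick just trades quality for the VRAM listed above.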