---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
This is [Llama2-22b](https://huggingface.co/chargoddard/llama2-22b) by [chargoddard](https://huggingface.co/chargoddard), converted to a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or not at all, that's likely on me, not the models themselves.<br>
A second model merge has been [released](https://huggingface.co/chargoddard/llama2-22b-blocktriangular), and the GGML conversions for it can be found [here](https://huggingface.co/IHaveNoClueAndIMustPost/llama2-22b-blocktriangular-GGML).
While I haven't had any issues so far, do note that the original repo states <i>"Not intended for use as-is - this model is meant to serve as a base for further tuning"</i>.
Approximate VRAM requirements at 4K context:
| Model | Size | VRAM |
|:------:|:------:|:------:|
| q5_1 | 16.4GB | 21.5GB |
| q4_K_M | 13.2GB | 18.3GB |
| q3_K_M | 10.6GB | 16.1GB |
| q2_K | 9.2GB | 14.5GB |
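
For reference, here is a minimal sketch of loading one of these files with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). Note the assumptions: GGML (as opposed to GGUF) files require an older release of that library (0.1.78 or earlier), the filename below is a placeholder rather than the exact name in this repo, and the layer offload count should be adjusted to your GPU.

```python
# Minimal sketch: running a GGML quant with llama-cpp-python.
# GGML files need a pre-GGUF release of the library:
#   pip install llama-cpp-python==0.1.78
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-22b.ggmlv3.q4_K_M.bin",  # placeholder filename
    n_ctx=4096,      # 4K context, matching the VRAM table above
    n_gpu_layers=40, # layers to offload to the GPU; lower this if VRAM runs out
)

out = llm("Once upon a time", max_tokens=64)
print(out["choices"][0]["text"])
```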