---
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
---
|
This is [Llama2-22b](https://huggingface.co/chargoddard/llama2-22b) by [chargoddard](https://huggingface.co/chargoddard) in a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or not at all, that's likely on me rather than on the model itself.<br>
|
A second model merge has been [released](https://huggingface.co/chargoddard/llama2-22b-blocktriangular) and the GGML conversions for that can be found [here](https://huggingface.co/IHaveNoClueAndIMustPost/llama2-22b-blocktriangular-GGML). |
|
|
|
While I haven't had any issues so far, do note that the original repo states: <i>"Not intended for use as-is - this model is meant to serve as a base for further tuning"</i>.
|
|
|
Approximate VRAM requirements at 4K context: |
|
<table style='border: 2px #000000 solid; width: 50%' align='left' border='2'>
  <thead>
    <tr>
      <th style='text-align: center'>MODEL</th>
      <th style='text-align: center'>SIZE</th>
      <th style='text-align: center'>VRAM</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style='text-align: center'>q5_1</td>
      <td style='text-align: center'>16.4GB</td>
      <td style='text-align: center'>21.5GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q4_K_M</td>
      <td style='text-align: center'>13.2GB</td>
      <td style='text-align: center'>18.3GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q3_K_M</td>
      <td style='text-align: center'>10.6GB</td>
      <td style='text-align: center'>16.1GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q2_K</td>
      <td style='text-align: center'>9.2GB</td>
      <td style='text-align: center'>14.5GB</td>
    </tr>
  </tbody>
</table>
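
If you're unsure how to run these files, below is a minimal sketch using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); this is just one of several GGML-compatible runners (koboldcpp and text-generation-webui also work), not an official recommendation. The file name and the `n_gpu_layers` value are assumptions, so substitute whichever quantisation you downloaded and lower the offload count if you run short of VRAM. Note these are GGML files, not GGUF, so newer llama.cpp-based builds that only read GGUF may need an older release to load them.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# File name and layer count are assumptions; adjust to the file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-22b.ggmlv3.q4_K_M.bin",  # hypothetical local file name
    n_ctx=4096,       # 4K context, matching the VRAM table above
    n_gpu_layers=40,  # GPU offload; reduce this if you run out of VRAM
)

output = llm("The following is a short story about", max_tokens=64)
print(output["choices"][0]["text"])
```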
|
|