---
license: llama2
language:
- en
---
## Information

This is an exl2 quantized version of Psyfighter-2-13B.
Please refer to the original creator for more information.
Calibration dataset: wikitext
## Branches
- main: Measurement files
- 4bpw: 4 bits per weight
- 5bpw: 5 bits per weight
- 6bpw: 6 bits per weight
## Notes

- 6bpw is recommended for the best quality-to-VRAM usage ratio (assuming you have enough VRAM).
- Please ask for more bpws in the community tab if necessary.
- This model was quantized with permission from the model creator (Jeb Carter and the KoboldAI team).
- The FP16 weights were not public at the time of uploading, but the merge recipe is, so I used the recipe to create my own FP16 for this set of exl2 quants. Nothing should differ from Kobold's FP16 version.
## Run in TabbyAPI
TabbyAPI is a pure exllamav2 FastAPI server developed by us. You can find TabbyAPI's source code here: https://github.com/theroyallab/TabbyAPI
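If you don't have TabbyAPI yet, clone it and follow the setup instructions in its README:

```sh
git clone https://github.com/theroyallab/TabbyAPI
cd TabbyAPI
```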
If you don't have huggingface-cli, please run `pip install huggingface_hub`.
To run this model, follow these steps:

1. Make a directory inside your models folder called `Psyfighter-2-13B-exl2`
2. Open a terminal inside your models folder
3. Run
   ```sh
   huggingface-cli download royallab/Psyfighter-2-13B-exl2 --revision 4bpw --local-dir Psyfighter-2-13B-exl2 --local-dir-use-symlinks False
   ```
   - The `--revision` flag corresponds to the branch name on the model repo. Please select the appropriate bpw branch for your system (see the example after these steps).
4. Inside TabbyAPI's config.yml, set `model_name` to `Psyfighter-2-13B-exl2`, or use the `/model/load` endpoint after launching (a config sketch and a sample request follow these steps).
5. Launch TabbyAPI inside your Python env by running
   ```sh
   python main.py
   ```
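For example, to download the 5bpw branch instead, change only the `--revision` value; everything else stays the same:

```sh
huggingface-cli download royallab/Psyfighter-2-13B-exl2 --revision 5bpw --local-dir Psyfighter-2-13B-exl2 --local-dir-use-symlinks False
```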
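If you go the config.yml route, the relevant section looks roughly like the sketch below. The key names follow TabbyAPI's sample config and may differ between versions, so treat this as an illustration rather than a drop-in file:

```yaml
# Hypothetical excerpt of TabbyAPI's config.yml; key names may vary by version
model:
  model_dir: models                   # folder containing the directory from step 1
  model_name: Psyfighter-2-13B-exl2   # directory name created in step 1
```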
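Alternatively, once `python main.py` is running, you can load the model over the API. The request below is a sketch: it assumes TabbyAPI's default bind of 127.0.0.1:5000, the versioned `/v1/model/load` route, and an admin key from your api_tokens.yml; check your TabbyAPI version's /docs page for the exact route and fields:

```sh
# Assumed route, port, and header name based on TabbyAPI defaults
curl http://127.0.0.1:5000/v1/model/load \
  -H "Content-Type: application/json" \
  -H "x-admin-key: YOUR_ADMIN_KEY" \
  -d '{"name": "Psyfighter-2-13B-exl2"}'
```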
## Donate?
All my infrastructure and cloud expenses are paid out of pocket. If you'd like to donate, you can do so here: https://ko-fi.com/kingbri