---
language:
- en
tags:
- ggml
- causal-lm
- pythia
license: apache-2.0
datasets:
- EleutherAI/the_pile_deduplicated
---

### This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints.

[Click here if you're looking for ggmlv1 and ggmlv2 models.](https://huggingface.co/Merry/ggml-pythia-deduped/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd)

# RAM USAGE
Model | RAM usage
:--:|:--:
Unloaded | 41.3 MiB
|
ggmlv3-pythia-70m-deduped-q4_0.bin | 95.5 MiB
ggmlv3-pythia-160m-deduped-q4_0.bin | 201.1 MiB
ggmlv3-pythia-410m-deduped-q4_0.bin | 415.1 MiB
ggmlv3-pythia-1b-deduped-q4_0.bin | 762.2 MiB
ggmlv3-pythia-1.4b-deduped-q4_0.bin | 1.0 GiB
ggmlv3-pythia-2.8b-deduped-q4_0.bin | 1.9 GiB
|
ggmlv3-pythia-70m-deduped-q5_1.bin | 108.7 MiB
ggmlv3-pythia-160m-deduped-q5_1.bin | 226.9 MiB
ggmlv3-pythia-410m-deduped-q5_1.bin | 494.0 MiB
ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB
ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB
ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB

*Tested on KoboldCpp with OpenBLAS enabled.*

**Notes:**
- The models have been converted with ggerganov/ggml's gpt-neox conversion script, and tested only on KoboldCpp. Other frontends that support GGML-based conversions of GPT-NeoX *should* work, but I can't promise anything.
- They're sorted by conversion date, which makes it easier to track breaking changes. If you're just starting off, I highly recommend the latest conversion, which is currently 2023-05-25. Combined with KoboldCpp v1.25.1+, this conversion improves tokenization, which in my testing reduces occurrences of broken words like "Alicae" or "Reimu Hai-ku-rei".
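
If you want to pick a file from the RAM usage table programmatically, something like the following rough sketch might help. The figures are copied from the table above and converted to MiB; the GiB rows are only approximate because the table rounds them to one decimal place. The helper name `largest_model_that_fits` and the idea of using RAM usage as a stand-in for model size are my own assumptions for illustration, and the numbers were measured on KoboldCpp, so other frontends and longer contexts may need more headroom.

```python
# RAM usage in MiB, copied from the table above (1 GiB = 1024 MiB).
# GiB rows are approximate: the table rounds them to one decimal place.
RAM_MIB = {
    "ggmlv3-pythia-70m-deduped-q4_0.bin": 95.5,
    "ggmlv3-pythia-160m-deduped-q4_0.bin": 201.1,
    "ggmlv3-pythia-410m-deduped-q4_0.bin": 415.1,
    "ggmlv3-pythia-1b-deduped-q4_0.bin": 762.2,
    "ggmlv3-pythia-1.4b-deduped-q4_0.bin": 1.0 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q4_0.bin": 1.9 * 1024,
    "ggmlv3-pythia-70m-deduped-q5_1.bin": 108.7,
    "ggmlv3-pythia-160m-deduped-q5_1.bin": 226.9,
    "ggmlv3-pythia-410m-deduped-q5_1.bin": 494.0,
    "ggmlv3-pythia-1b-deduped-q5_1.bin": 943.9,
    "ggmlv3-pythia-1.4b-deduped-q5_1.bin": 1.3 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q5_1.bin": 2.3 * 1024,
}


def largest_model_that_fits(budget_mib, quant=None):
    """Return the file with the highest listed RAM usage within the budget.

    budget_mib: RAM you are willing to spend, in MiB.
    quant: optional quantization suffix to filter on, e.g. "q4_0" or "q5_1".
    Returns None when nothing in the table fits.
    """
    candidates = [
        (ram, name)
        for name, ram in RAM_MIB.items()
        if ram <= budget_mib
        and (quant is None or name.endswith(f"-{quant}.bin"))
    ]
    # max() compares the RAM figure first, so the heaviest fitting file wins.
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(largest_model_that_fits(512))
    print(largest_model_that_fits(512, quant="q4_0"))
```

With a 512 MiB budget this selects the 410m q5_1 file, or the 410m q4_0 file if you restrict it to `q4_0`.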

# ALTERNATIVES
If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available:
- [**RedPajama-INCITE**](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) (3B, 7B), using the GPT-NeoX architecture
- [**OpenLLaMA**](https://huggingface.co/openlm-research/open_llama_3b_600bt_preview) (3B, 7B), using the LLaMA architecture
- [**MPT-1b-RedPajama-200b**](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) (1B), using the MPT architecture
- [**RWKV-4 PilePlus**](https://huggingface.co/BlinkDL/rwkv-4-pileplus) (169M, 430M, 1.5B, 3B), using the RWKV architecture
- [**GPT-2**](https://huggingface.co/gpt2-xl) (124M, 355M, 774M, 1.5B), using the GPT-2 architecture