metadata

language:
  - en
tags:
  - ggml
  - causal-lm
  - pythia
license: apache-2.0
datasets:
  - EleutherAI/the_pile_deduplicated

This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints.

Click here if you're looking for ggmlv1 and ggmlv2 models.

RAM USAGE

Model	RAM usage
Unloaded	41.3 MiB

ggmlv3-pythia-70m-deduped-q4_0.bin	95.5 MiB
ggmlv3-pythia-160m-deduped-q4_0.bin	201.1 MiB
ggmlv3-pythia-410m-deduped-q4_0.bin	415.1 MiB
ggmlv3-pythia-1b-deduped-q4_0.bin	762.2 MiB
ggmlv3-pythia-1.4b-deduped-q4_0.bin	1.0 GiB
ggmlv3-pythia-2.8b-deduped-q4_0.bin	1.9 GiB

ggmlv3-pythia-70m-deduped-q5_1.bin	108.7 MiB
ggmlv3-pythia-160m-deduped-q5_1.bin	226.9 MiB
ggmlv3-pythia-410m-deduped-q5_1.bin	494.0 MiB
ggmlv3-pythia-1b-deduped-q5_1.bin	943.9 MiB
ggmlv3-pythia-1.4b-deduped-q5_1.bin	1.3 GiB
ggmlv3-pythia-2.8b-deduped-q5_1.bin	2.3 GiB

Tested on KoboldCpp with OpenBLAS enabled.

Notes:

The models have been converted with ggerganov/ggml's gpt-neox conversion script, and tested only on KoboldCpp. Other frontends that support GGML-based conversions of GPT-NeoX should work, but I can't promise anything.
They're sorted by conversion date, which made it easier to track breaking changes. If you're just starting off, I highly recommend the latest, which is currently 2023-05-25. Combined with KoboldCpp v1.25.1+, this conversion improved the tokenizer, which in my testing reduces occurrences of broken words like "Alicae" or "Reimu Hai-ku-rei".
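If you want to pick a file for a given memory budget, the RAM usage table above can be turned into a small lookup. This is only a rough sketch: the figures are copied from the table, the GiB rows are converted assuming 1 GiB = 1024 MiB (and are rounded in the table, so they are approximate), and the function name `largest_model_within` is made up for illustration. Actual usage will likely vary with the frontend, context size and settings.

```python
# RAM usage in MiB, copied from the table above (KoboldCpp with OpenBLAS).
# GiB rows are converted with 1 GiB = 1024 MiB and are approximate,
# because the table rounds them to one decimal place.
RAM_MIB = {
    "ggmlv3-pythia-70m-deduped-q4_0.bin": 95.5,
    "ggmlv3-pythia-160m-deduped-q4_0.bin": 201.1,
    "ggmlv3-pythia-410m-deduped-q4_0.bin": 415.1,
    "ggmlv3-pythia-1b-deduped-q4_0.bin": 762.2,
    "ggmlv3-pythia-1.4b-deduped-q4_0.bin": 1.0 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q4_0.bin": 1.9 * 1024,
    "ggmlv3-pythia-70m-deduped-q5_1.bin": 108.7,
    "ggmlv3-pythia-160m-deduped-q5_1.bin": 226.9,
    "ggmlv3-pythia-410m-deduped-q5_1.bin": 494.0,
    "ggmlv3-pythia-1b-deduped-q5_1.bin": 943.9,
    "ggmlv3-pythia-1.4b-deduped-q5_1.bin": 1.3 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q5_1.bin": 2.3 * 1024,
}


def largest_model_within(budget_mib, table=RAM_MIB):
    """Return the file with the highest listed RAM usage that still fits
    within budget_mib, or None if nothing fits.

    Listed RAM usage is used as a stand-in for "largest model"; this is a
    simplification, not a quality ranking.
    """
    fitting = {name: mib for name, mib in table.items() if mib <= budget_mib}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    # Example: a device with roughly 512 MiB to spare for the model.
    print(largest_model_within(512))
```

With a 512 MiB budget this prints `ggmlv3-pythia-410m-deduped-q5_1.bin`; with a budget below 95.5 MiB it returns `None`. Leave some headroom on top of the listed figure, since the numbers come from a single test setup.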

ALTERNATIVES

If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available:

RedPajama-INCITE (3B, 7B), using the GPT-NeoX architecture
OpenLLaMA (3B, 7B), using the LLaMA architecture
MPT-1b-RedPajama-200b (1B), using the MPT architecture
RWKV-4 PilePlus (169M, 430M, 1.5B, 3B), using the RWKV architecture
GPT-2 (124M, 355M, 774M, 1.5B), using the GPT-2 architecture