---
language:
  - en
tags:
  - ggml
  - causal-lm
  - pythia
license: apache-2.0
datasets:
  - EleutherAI/the_pile_deduplicated
---

This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints.

If you're starting off, I highly recommend getting models from the newest directory (2023-05-25).

Looking for ggmlv1 and ggmlv2 models? They're still available in the older dated directories.
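
If you prefer scripting the download, here is a minimal sketch using huggingface_hub; the `repo_id` below is a placeholder, so substitute this repository's actual ID:

```python
# Minimal download sketch using huggingface_hub.
# NOTE: repo_id is a placeholder, not necessarily this repository's real ID.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Merry/pythia-deduped-ggml",            # placeholder repo ID
    filename="ggmlv3-pythia-70m-deduped-q4_0.bin",  # a file from the table below
    subfolder="2023-05-25",                         # the recommended directory
)
print(path)  # local cache path of the downloaded model
```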

## RAM USAGE

| Model | RAM usage |
| --- | ---: |
| Unloaded | 41.3 MiB |
| ggmlv3-pythia-70m-deduped-q4_0.bin | 95.5 MiB |
| ggmlv3-pythia-160m-deduped-q4_0.bin | 201.1 MiB |
| ggmlv3-pythia-410m-deduped-q4_0.bin | 415.1 MiB |
| ggmlv3-pythia-1b-deduped-q4_0.bin | 762.2 MiB |
| ggmlv3-pythia-1.4b-deduped-q4_0.bin | 1.0 GiB |
| ggmlv3-pythia-2.8b-deduped-q4_0.bin | 1.9 GiB |
| ggmlv3-pythia-70m-deduped-q5_1.bin | 108.7 MiB |
| ggmlv3-pythia-160m-deduped-q5_1.bin | 226.9 MiB |
| ggmlv3-pythia-410m-deduped-q5_1.bin | 494.0 MiB |
| ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB |
| ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB |
| ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB |

Tested on KoboldCpp with OpenBLAS enabled.
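
The figures above were taken with KoboldCpp. As a rough sketch of how this kind of measurement could be reproduced in-process, one could compare resident memory before and after loading a model; psutil and ctransformers here are my assumptions, not the method used for the table:

```python
# Sketch: compare resident memory before/after loading a GGML model.
# psutil and ctransformers are assumptions; the table was measured with KoboldCpp.
import psutil

proc = psutil.Process()  # current process
print(f"Unloaded: {proc.memory_info().rss / 2**20:.1f} MiB")

from ctransformers import AutoModelForCausalLM  # hypothetical frontend choice

llm = AutoModelForCausalLM.from_pretrained(
    "ggmlv3-pythia-70m-deduped-q4_0.bin",  # local path to a downloaded file
    model_type="gpt_neox",                 # Pythia uses the GPT-NeoX architecture
)
print(f"Loaded:   {proc.memory_info().rss / 2**20:.1f} MiB")
```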

## Versions

- **2023-04-20:** q4_3. Used commit 05f3079.
- **2023-04-30:** q5_0, q5_1, and q8_0, up to 2.8B. I can't upload all conversions of the 6.9B and 12B models due to my internet connection. Used commit 5dd92f4.
- **2023-05-06:** q4_0 and q4_2, up to 2.8B. Used commit ff6e03c.
- **2023-05-15:** New quantization format (ggmlv2). q4_0 and q5_1, up to 2.8B. Used commit 010203f.
- **2023-05-25:** New quantization format (ggmlv3). q4_0 and q5_1, up to 2.8B. Used commit 73ad593.

## Notes

- The models have been converted with ggerganov/ggml's gpt-neox conversion script and tested only on KoboldCpp. Other frontends that support GGML-based conversions of GPT-NeoX should work, but I can't promise anything (see the sketch after this list).
- They're sorted by date of conversion so it's easier to track breaking changes. If you're just starting off, I highly recommend the latest directory, currently 2023-05-25. Combined with KoboldCpp v1.25.1+, the ggmlv3 format improves the tokenizer, which in my testing reduces occurrences of broken words like "Alicae" or "Reimu Hai-ku-rei".
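
As one hedged example of an alternative frontend, here is a sketch of loading one of these files with ctransformers, which can read GGML GPT-NeoX models; the path and generation parameters are illustrative, and I haven't verified this exact combination:

```python
# Sketch: run a quantized Pythia GGML file outside KoboldCpp.
# ctransformers is one frontend that reads GGML GPT-NeoX models; untested here.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "2023-05-25/ggmlv3-pythia-1b-deduped-q5_1.bin",  # local path to a download
    model_type="gpt_neox",
)
print(llm("The Pile is a large, diverse", max_new_tokens=32))
```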

## ALTERNATIVES

If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available: