	---
	language:
	- en
	tags:
	- ggml
	- causal-lm
	- pythia
	license: apache-2.0
	datasets:
	- EleutherAI/the_pile_deduplicated
	---

	### This repository contains quantized conversions of EleutherAI's Pythia Deduped checkpoints.

	If you're starting off, I highly recommend getting models from the newest directory [(2023-05-25)](https://huggingface.co/Merry/ggml-pythia-deduped/tree/main/2023-05-25).

	[Click here if you're looking for ggmlv1 and ggmlv2 models.](https://huggingface.co/Merry/ggml-pythia-deduped/tree/a695a4c30c01ed9a41200c01f85d47c819fc93dd)

	# RAM USAGE
	Model | RAM usage
	:--:|:--:
	Unloaded | 41.3 MiB
	|
	ggmlv3-pythia-70m-deduped-q4_0.bin | 95.5 MiB
	ggmlv3-pythia-160m-deduped-q4_0.bin | 201.1 MiB
	ggmlv3-pythia-410m-deduped-q4_0.bin | 415.1 MiB
	ggmlv3-pythia-1b-deduped-q4_0.bin | 762.2 MiB
	ggmlv3-pythia-1.4b-deduped-q4_0.bin | 1.0 GiB
	ggmlv3-pythia-2.8b-deduped-q4_0.bin | 1.9 GiB
	|
	ggmlv3-pythia-70m-deduped-q5_1.bin | 108.7 MiB
	ggmlv3-pythia-160m-deduped-q5_1.bin | 226.9 MiB
	ggmlv3-pythia-410m-deduped-q5_1.bin | 494.0 MiB
	ggmlv3-pythia-1b-deduped-q5_1.bin | 943.9 MiB
	ggmlv3-pythia-1.4b-deduped-q5_1.bin | 1.3 GiB
	ggmlv3-pythia-2.8b-deduped-q5_1.bin | 2.3 GiB

	Tested on KoboldCpp with OpenBLAS enabled.
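
If you want to pick a model for a fixed memory budget, the table above can be turned into a small lookup. This is only a rough sketch: the function name `largest_model_that_fits` and the `headroom_mib` parameter are made up for illustration, the GiB rows are converted at 1 GiB = 1024 MiB from already rounded values, and the figures are taken as listed (they were measured on KoboldCpp with OpenBLAS, so other frontends may differ).

```python
# RAM figures copied from the table above, in MiB.
# GiB rows are converted at 1 GiB = 1024 MiB from the rounded table values.
RAM_MIB = {
    "ggmlv3-pythia-70m-deduped-q4_0.bin": 95.5,
    "ggmlv3-pythia-160m-deduped-q4_0.bin": 201.1,
    "ggmlv3-pythia-410m-deduped-q4_0.bin": 415.1,
    "ggmlv3-pythia-1b-deduped-q4_0.bin": 762.2,
    "ggmlv3-pythia-1.4b-deduped-q4_0.bin": 1.0 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q4_0.bin": 1.9 * 1024,
    "ggmlv3-pythia-70m-deduped-q5_1.bin": 108.7,
    "ggmlv3-pythia-160m-deduped-q5_1.bin": 226.9,
    "ggmlv3-pythia-410m-deduped-q5_1.bin": 494.0,
    "ggmlv3-pythia-1b-deduped-q5_1.bin": 943.9,
    "ggmlv3-pythia-1.4b-deduped-q5_1.bin": 1.3 * 1024,
    "ggmlv3-pythia-2.8b-deduped-q5_1.bin": 2.3 * 1024,
}


def largest_model_that_fits(budget_mib, quant="q5_1", headroom_mib=0.0):
    """Return the file with the highest listed RAM usage that still fits, or None.

    headroom_mib is a hypothetical safety margin, not something the table measures.
    """
    candidates = [
        (ram, name)
        for name, ram in RAM_MIB.items()
        if name.endswith(f"-{quant}.bin") and ram + headroom_mib <= budget_mib
    ]
    # Tuples compare by RAM first, so max() picks the largest model that fits.
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(largest_model_that_fits(512, quant="q5_1"))
    print(largest_model_that_fits(512, quant="q4_0"))
```

With a 512 MiB budget, both quantizations land on the 410m model; the q5_1 file needs roughly 80 MiB more than q4_0 at that size.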

	Versions:

	2023-04-20: q4_3. Used [commit 05f3079](https://github.com/ggerganov/ggml/tree/05f307971862b83df12fada0c42ee027ba5a82b5/examples/stablelm)

	2023-04-30: q5_0, q5_1, and q8_0, up to 2.8B. I can't upload all conversions of 6.9B and 12B due to my internet connection. Used [commit 5dd92f4](https://github.com/ggerganov/ggml/tree/5dd92f421ee44f18b8fde0afbf5ca8fc7bf93841/examples/stablelm)

	2023-05-06: q4_0 and q4_2, up to 2.8B. Used [commit ff6e03c](https://github.com/ggerganov/ggml/tree/ff6e03cbcd9bf6e9fa41d49f2495c042efae4dc6/examples/stablelm)

	2023-05-15: New quantization format (ggmlv2). q4_0 and q5_1, up to 2.8B. Used [commit 010203f](https://github.com/ggerganov/ggml/tree/010203f94a85df5c86b773dc5acb698c8e7b1e7b/examples/gpt-neox)

	2023-05-25: New quantization format (ggmlv3). q4_0 and q5_1, up to 2.8B. Used [commit 73ad593](https://github.com/ggerganov/ggml/tree/73ad593cf84f864f0fcfd3a196253575c70d66a2/examples/gpt-neox)

	Notes:
	- The models have been converted with ggerganov/ggml's gpt-neox conversion script, and tested only on KoboldCpp. Other frontends that support GGML-based conversions of GPT-NeoX should work, but I can't promise anything.
	- The models are sorted into directories by conversion date, which makes breaking changes easier to track. If you're just starting off, I highly recommend the latest directory, which is currently 2023-05-25. Combined with KoboldCpp v1.25.1 or later, these conversions use an improved tokenizer, which in my testing reduces occurrences of broken words like "Alicae" or "Reimu Hai-ku-rei".
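
For scripted downloads, the dated directory and the file naming pattern from the RAM table can be combined into a direct link. The sketch below is a guess at the layout, not a guarantee: it assumes each `.bin` file sits directly inside its dated directory, and it relies on Hugging Face's usual `resolve/<revision>/<path>` URL form. The helper names `model_filename` and `download_url` are invented for this example.

```python
REPO_ID = "Merry/ggml-pythia-deduped"
LATEST_DIR = "2023-05-25"  # newest directory at the time of writing


def model_filename(size, quant, fmt="ggmlv3"):
    """Build a file name following the pattern used in the RAM table."""
    return f"{fmt}-pythia-{size}-deduped-{quant}.bin"


def download_url(size, quant, directory=LATEST_DIR, revision="main"):
    """Build a direct download URL.

    Assumes the file lives directly in the dated directory (not verified here).
    """
    filename = model_filename(size, quant)
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{directory}/{filename}"


if __name__ == "__main__":
    print(download_url("70m", "q4_0"))
```

If the link does not resolve, browse the directory in the repository's file view and adjust the path by hand.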

	# ALTERNATIVES
	If you're here because you want a smaller model to run on a device with constrained memory, consider the following, most (if not all) of which have GGML conversions available:
	- [RedPajama-INCITE](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1) (3B, 7B), using the GPT-NeoX architecture
	- [OpenLLaMA](https://huggingface.co/openlm-research/open_llama_3b_600bt_preview) (3B, 7B), using the LLaMA architecture
	- [MPT-1b-RedPajama-200b](https://huggingface.co/mosaicml/mpt-1b-redpajama-200b) (1B), using the MPT architecture
	- [RWKV-4 PilePlus](https://huggingface.co/BlinkDL/rwkv-4-pileplus) (169M, 430M, 1.5B, 3B), using the RWKV architecture
	- [GPT-2](https://huggingface.co/gpt2-xl) (124M, 355M, 774M, 1.5B), using the GPT-2 architecture
