---
license: gpl
datasets:
- nomic-ai/gpt4all-j-prompt-generations
language:
- en
inference: false
---
# GPT4All-13B-snoozy-GGML
These files are GGML format model files for Nomic.AI's GPT4All-13B-snoozy.

GGML files are for CPU inference using llama.cpp.
## Repositories available
- 4bit GPTQ models for GPU inference.
- 4bit and 5bit GGML models for CPU inference.
- Nomic.AI's original model in float32 HF format for GPU inference.
## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
llama.cpp recently made a breaking change to its quantisation methods.

I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit b9fd7ee or later) to use them.
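As a rough guide, updating and rebuilding an existing llama.cpp checkout looks like this (a sketch; build options vary by platform, so see the llama.cpp README for specifics):

```
# Update an existing llama.cpp clone and rebuild from the latest source.
# (If you don't have a clone yet: git clone https://github.com/ggerganov/llama.cpp)
cd llama.cpp
git pull
make clean
make
# Check that commit b9fd7ee (or later) is in your history:
git log --oneline -n 5
```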
The previous files, which will still work in older versions of llama.cpp, can be found in branch `previous_llama`.
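If you want those older files, one way to fetch them (a sketch; the URL assumes this model's Hugging Face repo id, so adjust if yours differs) is to clone that branch directly:

```
# Clone only the previous_llama branch; git-lfs is needed for the large .bin files.
git lfs install
git clone --branch previous_llama --single-branch https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML
```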
## Provided files
| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ------------ | ---- | ---- | ------------ | -------- |
| GPT4All-13B-snoozy.q4_0.bin | q4_0 | 4bit | 8.14GB | 10GB | 4-bit; smallest size and fastest inference of the provided files. |
| GPT4All-13B-snoozy.q5_0.bin | q5_0 | 5bit | 8.95GB | 11GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
| GPT4All-13B-snoozy.q5_1.bin | q5_1 | 5bit | 9.76GB | 12GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. |
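To fetch a single file rather than the whole repo, a direct download works (a sketch; the URL assumes this model's Hugging Face repo id):

```
# Download just the 4-bit file via the Hugging Face resolve endpoint.
wget https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML/resolve/main/GPT4All-13B-snoozy.q4_0.bin
```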
## How to run in llama.cpp
I use the following command line; adjust for your tastes and needs:

```
./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
```
Change `-t 12` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
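For example, a chat-style invocation based on the command above (a sketch; all flags shown are standard llama.cpp options) might look like:

```
# Interactive instruction mode: -i -ins replaces the one-shot -p "<PROMPT>".
./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins
```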
## How to run in text-generation-webui
Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

Note: at this time text-generation-webui does not support the newly updated llama.cpp quantisation methods.

Thireus has written a great guide on how to update text-generation-webui to the latest llama.cpp code, which may help get the new quantisation methods working there sooner.
# Original Model Card for GPT4All-13b-snoozy
An Apache-2 licensed chatbot trained over a massive curated corpus of assistant interactions including word problems, multi-turn dialogue, code, poems, songs, and stories.
## Model Details

### Model Description
This model has been finetuned from LLama 13B.

- **Developed by:** Nomic AI
- **Model Type:** A finetuned LLama 13B model on assistant style interaction data
- **Language(s) (NLP):** English
- **License:** Apache-2
- **Finetuned from model:** LLama 13B
This model was trained on [nomic-ai/gpt4all-j-prompt-generations](https://huggingface.co/datasets/nomic-ai/gpt4all-j-prompt-generations) using `revision=v1.3-groovy`.
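To inspect that exact dataset revision, one option (a sketch; it assumes `v1.3-groovy` exists as a git branch or tag on the dataset repo, and needs git-lfs) is:

```
# Clone the training dataset at the revision the model was trained on.
git lfs install
git clone --branch v1.3-groovy https://huggingface.co/datasets/nomic-ai/gpt4all-j-prompt-generations
```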
### Model Sources
- **Repository:** https://github.com/nomic-ai/gpt4all
- **Base Model Repository:** https://github.com/facebookresearch/llama
- **Demo:** https://gpt4all.io/
## Results

Results on common sense reasoning benchmarks:
| Model | BoolQ | PIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA |
| ----- | ----- | ---- | --------- | ---------- | ----- | ----- | ---- |
| GPT4All-J 6B v1.0 | 73.4 | 74.8 | 63.4 | 64.7 | 54.9 | 36.0 | 40.2 |
| GPT4All-J v1.1-breezy | 74.0 | 75.1 | 63.2 | 63.6 | 55.4 | 34.9 | 38.4 |
| GPT4All-J v1.2-jazzy | 74.8 | 74.9 | 63.6 | 63.8 | 56.6 | 35.3 | 41.0 |
| GPT4All-J v1.3-groovy | 73.6 | 74.3 | 63.8 | 63.5 | 57.7 | 35.0 | 38.8 |
| GPT4All-J Lora 6B | 68.6 | 75.8 | 66.2 | 63.5 | 56.4 | 35.7 | 40.2 |
| GPT4All LLaMa Lora 7B | 73.1 | 77.6 | 72.1 | 67.8 | 51.1 | 40.4 | 40.2 |
| GPT4All 13B snoozy | **83.3** | 79.2 | 75.0 | **71.3** | 60.9 | 44.2 | 43.4 |
| Dolly 6B | 68.8 | 77.3 | 67.6 | 63.9 | 62.9 | 38.7 | 41.2 |
| Dolly 12B | 56.7 | 75.4 | 71.0 | 62.2 | **64.6** | 38.5 | 40.4 |
| Alpaca 7B | 73.9 | 77.2 | 73.9 | 66.1 | 59.8 | 43.3 | 43.4 |
| Alpaca Lora 7B | 74.3 | **79.3** | 74.0 | 68.8 | 56.6 | 43.9 | 42.6 |
| GPT-J 6B | 65.4 | 76.2 | 66.2 | 64.1 | 62.2 | 36.6 | 38.2 |
| LLama 7B | 73.1 | 77.4 | 73.0 | 66.9 | 52.5 | 41.4 | 42.4 |
| LLama 13B | 68.5 | 79.1 | **76.2** | 70.1 | 60.0 | **44.6** | 42.2 |
| Pythia 6.9B | 63.5 | 76.3 | 64.0 | 61.1 | 61.3 | 35.2 | 37.2 |
| Pythia 12B | 67.7 | 76.6 | 67.3 | 63.8 | 63.9 | 34.8 | 38.0 |
| Vicuña T5 | 81.5 | 64.6 | 46.3 | 61.8 | 49.3 | 33.3 | 39.4 |
| Vicuña 13B | 81.5 | 76.8 | 73.3 | 66.7 | 57.4 | 42.7 | 43.6 |
| Stable Vicuña RLHF | 82.3 | 78.6 | 74.1 | 70.9 | 61.0 | 43.5 | **44.4** |
| StableLM Tuned | 62.5 | 71.2 | 53.6 | 54.8 | 52.4 | 31.1 | 33.4 |
| StableLM Base | 60.1 | 67.4 | 41.2 | 50.1 | 44.9 | 27.0 | 32.0 |

The best score in each column is shown in bold.