---
license: gpl
datasets:
- nomic-ai/gpt4all-j-prompt-generations
language:
- en
inference: false
---

# GPT4All-13B-snoozy-GGML

These files are GGML format model files for [Nomic.AI's GPT4All-13B-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).

GGML files are for CPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Repositories available

* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GPTQ).
* [4-bit and 5-bit GGML models for CPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GGML).
* [Nomic.AI's original model in float32 HF format for GPU inference](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).

## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods, so I have re-quantised the GGML files in this repo. To use them you will need llama.cpp compiled on May 12th 2023 or later (commit `b9fd7ee` or later).

The previous files, which still work with older versions of llama.cpp, can be found in the `previous_llama` branch.

## Provided files

| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| `GPT4All-13B-snoozy.q4_0.bin` | q4_0 | 4bit | 8.14GB | 10GB | 4-bit. Lowest resource usage and fastest inference of the three files, but lowest accuracy. |
| `GPT4All-13B-snoozy.q5_0.bin` | q5_0 | 5bit | 8.95GB | 11GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
| `GPT4All-13B-snoozy.q5_1.bin` | q5_1 | 5bit | 9.76GB | 12GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. |

## How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

```
./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Write a story about llamas ### Response:"
```

Change `-t 12` to the number of physical CPU cores you have.
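On Linux, one way to find that number is to count the distinct (core, socket) pairs reported by util-linux's `lscpu`, since hyperthread siblings share a pair. The sketch below assumes `lscpu` is installed; the function name `count_physical_cores` is a hypothetical helper, not part of llama.cpp.

```sh
#!/bin/sh
# Sketch: suggest a llama.cpp -t value on Linux.
# Assumes util-linux `lscpu` is available; count_physical_cores is a hypothetical helper.

# Reads `lscpu --parse=CORE,SOCKET` output on stdin and prints the number of
# distinct (core, socket) pairs, i.e. physical cores. lscpu's header lines
# start with '#'; every other line describes one logical CPU, so SMT siblings
# collapse into a single line under `sort -u`.
count_physical_cores() {
  grep -v '^#' | sort -u | wc -l | tr -d ' '
}

# Only run the probe when lscpu exists, so the helper can be sourced anywhere.
if command -v lscpu >/dev/null 2>&1; then
  threads=$(lscpu --parse=CORE,SOCKET | count_physical_cores)
  echo "Suggested llama.cpp flag: -t $threads"
fi
```

On macOS, `sysctl -n hw.physicalcpu` reports the physical core count directly.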
For example, if your system has 8 cores/16 threads, use `-t 8`. If you want a chat-style conversation, replace the `-p` argument and its prompt with `-i -ins`.

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

Note: at the time of writing, text-generation-webui does not yet support the updated llama.cpp quantisation methods. **Thireus** has written a [great guide on how to update it to the latest llama.cpp code](https://huggingface.co/TheBloke/wizardLM-7B-GGML/discussions/5), which may help you get the new quantisation methods working in text-generation-webui sooner.

# Original Model Card for GPT4All-13b-snoozy

An Apache-2 licensed chatbot trained on a massive curated corpus of assistant interactions, including word problems, multi-turn dialogue, code, poems, songs, and stories.
## Model Details

### Model Description

This model has been finetuned from LLaMA 13B.

- **Developed by:** [Nomic AI](https://home.nomic.ai)
- **Model Type:** A LLaMA 13B model finetuned on assistant-style interaction data
- **Language(s) (NLP):** English
- **License:** Apache-2
- **Finetuned from model:** LLaMA 13B

This model was trained on `nomic-ai/gpt4all-j-prompt-generations` using `revision=v1.3-groovy`.

### Model Sources

- **Repository:** [https://github.com/nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- **Base Model Repository:** [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)
- **Demo:** [https://gpt4all.io/](https://gpt4all.io/)

### Results

Results on common sense reasoning benchmarks. Asterisks mark the best score in each column.

```
Model                   BoolQ      PIQA       HellaSwag   WinoGrande   ARC-e      ARC-c      OBQA
----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
GPT4All-J 6B v1.0       73.4       74.8       63.4        64.7         54.9       36.0       40.2
GPT4All-J v1.1-breezy   74.0       75.1       63.2        63.6         55.4       34.9       38.4
GPT4All-J v1.2-jazzy    74.8       74.9       63.6        63.8         56.6       35.3       41.0
GPT4All-J v1.3-groovy   73.6       74.3       63.8        63.5         57.7       35.0       38.8
GPT4All-J Lora 6B       68.6       75.8       66.2        63.5         56.4       35.7       40.2
GPT4All LLaMa Lora 7B   73.1       77.6       72.1        67.8         51.1       40.4       40.2
GPT4All 13B snoozy      *83.3*     79.2       75.0        *71.3*       60.9       44.2       43.4
Dolly 6B                68.8       77.3       67.6        63.9         62.9       38.7       41.2
Dolly 12B               56.7       75.4       71.0        62.2         *64.6*     38.5       40.4
Alpaca 7B               73.9       77.2       73.9        66.1         59.8       43.3       43.4
Alpaca Lora 7B          74.3       *79.3*     74.0        68.8         56.6       43.9       42.6
GPT-J 6B                65.4       76.2       66.2        64.1         62.2       36.6       38.2
LLama 7B                73.1       77.4       73.0        66.9         52.5       41.4       42.4
LLama 13B               68.5       79.1       *76.2*      70.1         60.0       *44.6*     42.2
Pythia 6.9B             63.5       76.3       64.0        61.1         61.3       35.2       37.2
Pythia 12B              67.7       76.6       67.3        63.8         63.9       34.8       38.0
Vicuña T5               81.5       64.6       46.3        61.8         49.3       33.3       39.4
Vicuña 13B              81.5       76.8       73.3        66.7         57.4       42.7       43.6
Stable Vicuña RLHF      82.3       78.6       74.1        70.9         61.0       43.5       *44.4*
StableLM Tuned          62.5       71.2       53.6        54.8         52.4       31.1       33.4
StableLM Base           60.1       67.4       41.2        50.1         44.9       27.0       32.0
```