license: gpl
datasets:
  - nomic-ai/gpt4all-j-prompt-generations
language:
  - en
inference: false

GPT4All-13B-snoozy-GGML

This repository contains GGML-format model files for Nomic AI's GPT4All-13B-snoozy.

GGML files are for CPU inference using llama.cpp.

Repositories available

REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods.

I have re-quantised the GGML files in this repo, so you need llama.cpp built from commit b9fd7ee (May 12th 2023) or later to use them.

The previous files, which will still work in older versions of llama.cpp, can be found in branch previous_llama.

Provided files

  Name                          Quant method   Bits   Size     RAM required   Use case
  ----------------------------- -------------- ------ -------- -------------- -------------------------------------------------------------------------
  GPT4All-13B-snoozy.q4_0.bin   q4_0           4bit   8.14GB   10GB           4-bit.
  GPT4All-13B-snoozy.q5_0.bin   q5_0           5bit   8.95GB   11GB           5-bit. Higher accuracy, higher resource usage and slower inference.
  GPT4All-13B-snoozy.q5_1.bin   q5_1           5bit   9.76GB   12GB           5-bit. Even higher accuracy, higher resource usage and slower inference.
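
As a rough illustration of how the "RAM required" column can guide the choice of file, here is a minimal Python sketch. The file names and RAM figures are copied from the table; the helper name `pick_file` and the "largest file that fits" rule are assumptions made for this example, not part of llama.cpp or this repo.

```python
# RAM figures (in GB) copied from the "Provided files" table.
# Ordered from lowest to highest accuracy, following the "Use case" column.
PROVIDED_FILES = [
    ("GPT4All-13B-snoozy.q4_0.bin", 10),
    ("GPT4All-13B-snoozy.q5_0.bin", 11),
    ("GPT4All-13B-snoozy.q5_1.bin", 12),
]


def pick_file(available_ram_gb):
    """Hypothetical helper: return the highest-accuracy file that fits, or None."""
    best = None
    for name, ram_required_gb in PROVIDED_FILES:
        if ram_required_gb <= available_ram_gb:
            best = name  # later entries are more accurate, so keep overwriting
    return best


if __name__ == "__main__":
    print(pick_file(11.5))
```

The RAM figures are approximate, so it may be worth leaving some headroom for the operating system and other processes.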

How to run in llama.cpp

I use the following command line; adjust for your tastes and needs:

./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
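
The prompt in the command above follows a fixed instruction/response layout. If you generate prompts from a script, a small helper along these lines may be convenient. This is only a sketch: the template text is copied from the command above, while `PROMPT_TEMPLATE` and `build_prompt` are names invented for this example.

```python
# Template text copied from the -p argument in the command above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:"
)


def build_prompt(instruction):
    """Hypothetical helper: wrap an instruction in the prompt layout shown above."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Write a story about llamas"))
```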

Change -t 12 to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use -t 8.

For a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.
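
The two adjustments described above (thread count, and prompt versus chat mode) can be sketched as a small Python helper that assembles the argument list. The flags and default values are taken from the command line in this section; the function names, and the assumption of two hardware threads per physical core, are illustrative only and should be checked against your own CPU.

```python
def physical_cores(logical_cpus, threads_per_core=2):
    """Estimate physical cores from the logical CPU count.

    ASSUMPTION: every core exposes `threads_per_core` hardware threads
    (e.g. 8 cores/16 threads). Pass 1 if your CPU has no SMT.
    """
    return max(1, logical_cpus // threads_per_core)


def build_args(model_path, threads, prompt=None):
    """Hypothetical helper: build the llama.cpp ./main argument list.

    With a prompt, the result mirrors the one-shot command shown above;
    without one, -p is replaced by -i -ins for a chat-style session.
    """
    args = [
        "./main", "-t", str(threads), "-m", model_path,
        "--color", "-c", "2048", "--temp", "0.7",
        "--repeat_penalty", "1.1", "-n", "-1",
    ]
    if prompt is None:
        args += ["-i", "-ins"]
    else:
        args += ["-p", prompt]
    return args


if __name__ == "__main__":
    threads = physical_cores(16)  # 8 cores/16 threads -> 8
    print(build_args("GPT4All-13B-snoozy.q4_0.bin", threads))
```

The resulting list can be passed to `subprocess.run` from the directory containing the `main` binary.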

How to run in text-generation-webui

Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

Note: at this time, text-generation-webui does not support the newly updated llama.cpp quantisation methods.

Thireus has written a great guide on how to update it to the latest llama.cpp code, which may help get the new quantisation methods working in text-generation-webui sooner.

Original Model Card for GPT4All-13b-snoozy

An Apache-2 licensed chatbot trained over a massive curated corpus of assistant interactions including word problems, multi-turn dialogue, code, poems, songs, and stories.

Model Details

Model Description

This model has been finetuned from LLama 13B.

  • Developed by: Nomic AI
  • Model Type: A finetuned LLama 13B model on assistant style interaction data
  • Language(s) (NLP): English
  • License: Apache-2
  • Finetuned from model: LLama 13B

This model was trained on nomic-ai/gpt4all-j-prompt-generations using revision=v1.3-groovy.

Model Sources

Results

Results on common sense reasoning benchmarks (asterisks mark the best score in each column):

  Model                     BoolQ       PIQA     HellaSwag   WinoGrande    ARC-e      ARC-c       OBQA
  ----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
  GPT4All-J 6B v1.0          73.4       74.8       63.4         64.7        54.9       36.0       40.2
  GPT4All-J v1.1-breezy      74.0       75.1       63.2         63.6        55.4       34.9       38.4
  GPT4All-J v1.2-jazzy       74.8       74.9       63.6         63.8        56.6       35.3       41.0
  GPT4All-J v1.3-groovy      73.6       74.3       63.8         63.5        57.7       35.0       38.8
  GPT4All-J Lora 6B          68.6       75.8       66.2         63.5        56.4       35.7       40.2
  GPT4All LLaMa Lora 7B      73.1       77.6       72.1         67.8        51.1       40.4       40.2
  GPT4All 13B snoozy        *83.3*      79.2       75.0        *71.3*       60.9       44.2       43.4
  Dolly 6B                   68.8       77.3       67.6         63.9        62.9       38.7       41.2
  Dolly 12B                  56.7       75.4       71.0         62.2       *64.6*      38.5       40.4
  Alpaca 7B                  73.9       77.2       73.9         66.1        59.8       43.3       43.4
  Alpaca Lora 7B             74.3      *79.3*      74.0         68.8        56.6       43.9       42.6
  GPT-J 6B                   65.4       76.2       66.2         64.1        62.2       36.6       38.2
  LLama 7B                   73.1       77.4       73.0         66.9        52.5       41.4       42.4
  LLama 13B                  68.5       79.1      *76.2*        70.1        60.0      *44.6*      42.2
  Pythia 6.9B                63.5       76.3       64.0         61.1        61.3       35.2       37.2
  Pythia 12B                 67.7       76.6       67.3         63.8        63.9       34.8       38.0
  Vicuña T5                  81.5       64.6       46.3         61.8        49.3       33.3       39.4
  Vicuña 13B                 81.5       76.8       73.3         66.7        57.4       42.7       43.6
  Stable Vicuña RLHF         82.3       78.6       74.1         70.9        61.0       43.5      *44.4*
  StableLM Tuned             62.5       71.2       53.6         54.8        52.4       31.1       33.4
  StableLM Base              60.1       67.4       41.2         50.1        44.9       27.0       32.0