---
license: gpl
datasets:
- nomic-ai/gpt4all-j-prompt-generations
language:
- en
inference: false
---
# GPT4All-13B-snoozy-GGML

These files are GGML format model files of [Nomic.AI's GPT4all-13B-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).

GGML files are for CPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Repositories available

* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GPTQ).
* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GGML).
* [Nomic.AI's original model in float32 HF for GPU inference](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).

## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods.

I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit `b9fd7ee` or later) to use them.

The previous files, which will still work in older versions of llama.cpp, can be found in branch `previous_llama`.
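
If you script your downloads, the branch name can go straight into the file URL. The sketch below assumes the Hub's usual `resolve/<branch>/<file>` link layout, that the default branch is `main`, and that the older file keeps the same name on `previous_llama`; the helper name `resolve_url` is illustrative only.

```python
from urllib.parse import quote

REPO_ID = "TheBloke/GPT4ALL-13B-snoozy-GGML"  # this repo, as linked above


def resolve_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file on a given branch."""
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# Assumption: the file name on `previous_llama` matches the current one.
print(resolve_url("GPT4All-13B-snoozy.q4_0.bin", revision="previous_llama"))
```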
 
## Provided files
| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| `GPT4All-13B-snoozy.q4_0.bin` | q4_0 | 4bit | 8.14GB | 10GB | 4-bit. |
| `GPT4All-13B-snoozy.q5_0.bin` | q5_0 | 5bit | 8.95GB | 11GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
| `GPT4All-13B-snoozy.q5_1.bin` | q5_1 | 5bit | 9.76GB | 12GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. |
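
As a rough way to choose between these files, the sketch below picks the most accurate one whose "RAM required" figure fits a given budget. The figures are copied from the table; the helper name `pick_file` and the assumption that the listed RAM is the only constraint are mine.

```python
from typing import Optional

# (file name, RAM required in GB), copied from the table above
FILES = [
    ("GPT4All-13B-snoozy.q4_0.bin", 10),
    ("GPT4All-13B-snoozy.q5_0.bin", 11),
    ("GPT4All-13B-snoozy.q5_1.bin", 12),
]


def pick_file(available_ram_gb: float) -> Optional[str]:
    """Return the file with the highest RAM requirement that still fits.

    In the table, accuracy rises with RAM required, so this is also the
    most accurate file that fits. Returns None if nothing fits.
    """
    fitting = [(ram, name) for name, ram in FILES if ram <= available_ram_gb]
    if not fitting:
        return None
    return max(fitting)[1]


print(pick_file(11))
```

With less than 10GB available, none of the files fit and the function returns `None`.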

## How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

```
./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
```
Change `-t 12` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
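
If you would rather assemble that command from a script, here is a minimal Python sketch. It uses only the flags and prompt template shown above. The helper names are hypothetical, and `physical_cores` simply assumes two hardware threads per core, which will not hold on every CPU.

```python
import shlex

# Prompt template copied from the command above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:"
)


def physical_cores(logical_cpus: int, threads_per_core: int = 2) -> int:
    """Estimate physical cores; assumes 2 threads per core by default."""
    return max(1, logical_cpus // threads_per_core)


def build_command(model: str, instruction: str, threads: int) -> list:
    """Return the llama.cpp argument list with the same flags as above."""
    prompt = PROMPT_TEMPLATE.format(instruction=instruction)
    return [
        "./main", "-t", str(threads), "-m", model, "--color",
        "-c", "2048", "--temp", "0.7", "--repeat_penalty", "1.1",
        "-n", "-1", "-p", prompt,
    ]


# 16 logical CPUs -> 8 threads, matching the 8 cores/16 threads example.
cmd = build_command(
    "GPT4All-13B-snoozy.q4_0.bin",
    "Write a story about llamas",
    physical_cores(16),
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting problems with the multi-line prompt.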

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

Note: at this time text-generation-webui does not yet support the newly updated llama.cpp quantisation methods.

**Thireus** has written a [great guide on how to update it to the latest llama.cpp code](https://huggingface.co/TheBloke/wizardLM-7B-GGML/discussions/5) which may help get the newly updated llama.cpp quantisation methods working in text-gen-ui sooner.


# Original Model Card for GPT4All-13b-snoozy

An Apache-2 licensed chatbot trained over a massive curated corpus of assistant interactions including word problems, multi-turn dialogue, code, poems, songs, and stories.

## Model Details

### Model Description

This model has been finetuned from LLaMA 13B.

- **Developed by:** [Nomic AI](https://home.nomic.ai)
- **Model Type:** A LLaMA 13B model finetuned on assistant-style interaction data
- **Language(s) (NLP):** English
- **License:** Apache-2
- **Finetuned from model:** LLaMA 13B

This model was trained on `nomic-ai/gpt4all-j-prompt-generations` using `revision=v1.3-groovy`.

### Model Sources

- **Repository:** [https://github.com/nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- **Base Model Repository:** [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)
- **Demo:** [https://gpt4all.io/](https://gpt4all.io/)


### Results

Results on common-sense reasoning benchmarks (asterisks mark the best score in each column):

```
  Model                     BoolQ       PIQA     HellaSwag   WinoGrande    ARC-e      ARC-c       OBQA
  ----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
  GPT4All-J 6B v1.0          73.4       74.8       63.4         64.7        54.9       36.0       40.2
  GPT4All-J v1.1-breezy      74.0       75.1       63.2         63.6        55.4       34.9       38.4
  GPT4All-J v1.2-jazzy       74.8       74.9       63.6         63.8        56.6       35.3       41.0
  GPT4All-J v1.3-groovy      73.6       74.3       63.8         63.5        57.7       35.0       38.8
  GPT4All-J Lora 6B          68.6       75.8       66.2         63.5        56.4       35.7       40.2
  GPT4All LLaMa Lora 7B      73.1       77.6       72.1         67.8        51.1       40.4       40.2
  GPT4All 13B snoozy        *83.3*      79.2       75.0        *71.3*       60.9       44.2       43.4
  Dolly 6B                   68.8       77.3       67.6         63.9        62.9       38.7       41.2
  Dolly 12B                  56.7       75.4       71.0         62.2       *64.6*      38.5       40.4
  Alpaca 7B                  73.9       77.2       73.9         66.1        59.8       43.3       43.4
  Alpaca Lora 7B             74.3      *79.3*      74.0         68.8        56.6       43.9       42.6
  GPT-J 6B                   65.4       76.2       66.2         64.1        62.2       36.6       38.2
  LLama 7B                   73.1       77.4       73.0         66.9        52.5       41.4       42.4
  LLama 13B                  68.5       79.1      *76.2*        70.1        60.0      *44.6*      42.2
  Pythia 6.9B                63.5       76.3       64.0         61.1        61.3       35.2       37.2
  Pythia 12B                 67.7       76.6       67.3         63.8        63.9       34.8       38.0
  Vicuña T5                  81.5       64.6       46.3         61.8        49.3       33.3       39.4
  Vicuña 13B                 81.5       76.8       73.3         66.7        57.4       42.7       43.6
  Stable Vicuña RLHF         82.3       78.6       74.1         70.9        61.0       43.5      *44.4*
  StableLM Tuned             62.5       71.2       53.6         54.8        52.4       31.1       33.4
  StableLM Base              60.1       67.4       41.2         50.1        44.9       27.0       32.0
```
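
For a quick single-number comparison between rows, one option is an unweighted mean over the seven benchmarks. This is an illustrative choice of summary, not a metric reported by the original authors. The sketch below uses two rows copied from the table.

```python
from statistics import mean

# Scores copied from the table above, in column order:
# BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA
SCORES = {
    "GPT4All 13B snoozy": [83.3, 79.2, 75.0, 71.3, 60.9, 44.2, 43.4],
    "LLama 13B": [68.5, 79.1, 76.2, 70.1, 60.0, 44.6, 42.2],
}


def average_scores(scores: dict) -> dict:
    """Unweighted mean per model, rounded to one decimal place."""
    return {model: round(mean(vals), 1) for model, vals in scores.items()}


averages = average_scores(SCORES)
for model, avg in averages.items():
    print(f"{model}: {avg}")
```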