---
language:
- en
model_creator: SpectraSuite
quantized_by: jartine
pipeline_tag: text-generation
license: apache-2.0
license_link: LICENSE
tags:
- llamafile
---

# TriLM - llamafile

This is a 1.58-bit ternary LLM whose weights take values in {-1, 0, +1}.
It's highly optimized for CPU performance, thanks to the [`Q2_K_S`
quantization
format](https://github.com/Mozilla-Ocho/llamafile/pull/552).
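
The "1.58 bit" figure is the information content of one three-valued weight, log2(3). The sketch below computes it and shows one hypothetical way to store five ternary weights in a single byte (3^5 = 243 fits in 256). The `pack_trits` and `unpack_trits` helpers are illustrative names only and are not meant to describe the layout of `Q2_K_S`.

```python
import math

# One ternary weight carries log2(3) bits of information.
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits per weight")

def pack_trits(trits):
    """Pack five values from {-1, 0, +1} into one byte as a base-3 number."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # map -1, 0, +1 to digits 0, 1, 2
    return value  # always in range 0..242

def unpack_trits(byte):
    """Inverse of pack_trits: recover the five ternary values."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits

weights = [-1, 0, 1, 1, -1]
assert unpack_trits(pack_trits(weights)) == weights
```

Packing five weights per byte costs 8 / 5 = 1.6 bits per weight, which is close to the log2(3) lower bound.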

- Model creator: [SpectraSuite](https://huggingface.co/SpectraSuite)
- Original model: [TriLMs-Unpacked](https://huggingface.co/collections/SpectraSuite/trilms-unpacked-668d5f62afe0f4036925b1d2)

This repository packages and distributes TriLM as executable weights,
which we call [llamafiles](https://github.com/Mozilla-Ocho/llamafile).
The files you download here will run on Linux, macOS, Windows, FreeBSD,
OpenBSD, and NetBSD for AMD64 and ARM64.

## Quickstart

Running the following on a desktop OS will launch a tab in your web
browser with a completions interface.

```
wget https://huggingface.co/Mozilla/TriLM-llamafile/resolve/main/TriLM_3.9B.llamafile
chmod +x TriLM_3.9B.llamafile
./TriLM_3.9B.llamafile
```

You can also use the command line interface:

```
./TriLM_3.9B.llamafile -p "this is my prompt"
```
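
If the llamafile is running in server mode, it can also be queried from a script. The following is a minimal sketch and not a definitive client: it assumes the server listens on `http://127.0.0.1:8080` and exposes the llama.cpp-style `/completion` endpoint, which accepts `prompt`, `n_predict`, and `repeat_penalty` and returns the generated text in a `content` field. Check the llamafile README for the endpoints your version provides. The function names are hypothetical.

```python
import json
import urllib.request

def build_completion_request(prompt, n_predict=64, repeat_penalty=1.1,
                             base_url="http://127.0.0.1:8080"):
    """Build a POST request for the (assumed) llama.cpp-style /completion endpoint."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        "repeat_penalty": repeat_penalty,
    }
    return urllib.request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        return json.load(resp)["content"]

# Example, with the llamafile already running in another terminal:
# print(complete("Once upon a time", n_predict=32))
```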

For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

Having **trouble?** See the ["Gotchas"
section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
of the README.

## Prompting

This is a base model that hasn't been fine-tuned for chat, so the
completions interface is the recommended way to use it.

With the smaller TriLM models (e.g. 99M), set a high repeat penalty,
e.g. `--repeat-penalty 10`. In CLI mode, this flag is already specified
by default in the `.args` file embedded within the llamafiles from this
repo.
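
To give a rough idea of what a repeat penalty does, here is a sketch of the common formulation in which the logit of every recently generated token is pushed away from selection. It assumes llamafile's sampler behaves like this; the function name and the list-based logits are illustrative only.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Return a copy of `logits` with recently seen token ids made less likely.

    Positive logits are divided by the penalty and non-positive logits are
    multiplied by it, so both move toward "less probable".
    """
    penalized = list(logits)
    for token in set(recent_tokens):
        if penalized[token] > 0:
            penalized[token] /= penalty
        else:
            penalized[token] *= penalty
    return penalized

# Tokens 0 and 1 were generated recently; token 2 was not and is left alone.
print(apply_repeat_penalty([2.0, -1.0, 0.5], recent_tokens=[0, 1], penalty=10))
```

With a penalty of 10, a small model that would otherwise loop on the same few tokens is strongly discouraged from picking them again.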

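## Benchmarks

In the table below, `pp512` rows measure prompt processing over 512
tokens, `tg16` rows measure generation of 16 tokens, and `t/s` is tokens
per second.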

| cpu\_info                                      | model\_filename                          | size       | test          | t/s             |
| :-----------------------------------------     | :--------------------------------------- | ---------: | ------------: | --------------: |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_3.9B.llamafile                    | 1.31 GiB   | pp512         | 1069.54         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_3.9B.llamafile                    | 1.31 GiB   | tg16          | 88.47           |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_2.4B.llamafile                    | 837.02 MiB | pp512         | 1441.04         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_2.4B.llamafile                    | 837.02 MiB | tg16          | 110.80          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_1.5B.llamafile                    | 531.44 MiB | pp512         | 2185.94         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_1.5B.llamafile                    | 531.44 MiB | tg16          | 154.59          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_1.1B.llamafile                    | 408.66 MiB | pp512         | 2692.87         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_1.1B.llamafile                    | 408.66 MiB | tg16          | 173.08          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_830M.llamafile                    | 301.76 MiB | pp512         | 3353.51         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_830M.llamafile                    | 301.76 MiB | tg16          | 191.98          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_560M.llamafile                    | 211.21 MiB | pp512         | 4297.08         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_560M.llamafile                    | 211.21 MiB | tg16          | 209.57          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_390M.llamafile                    | 148.93 MiB | pp512         | 5130.90         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_390M.llamafile                    | 148.93 MiB | tg16          | 221.88          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_99M.llamafile                     | 148.93 MiB | pp512         | 5127.00         |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_99M.llamafile                     | 148.93 MiB | tg16          | 218.93          |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_190M.llamafile                    | 78.55 MiB  | pp512         | 10874.11        |
| AMD Ryzen Threadripper PRO 7995WX (znver4)     | TriLM\_190M.llamafile                    | 78.55 MiB  | tg16          | 334.45          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_3.9B.llamafile                    | 1.31 GiB   | pp512         | 227.95          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_3.9B.llamafile                    | 1.31 GiB   | tg16          | 65.17           |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_2.4B.llamafile                    | 837.02 MiB | pp512         | 347.93          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_2.4B.llamafile                    | 837.02 MiB | tg16          | 48.26           |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_1.5B.llamafile                    | 531.44 MiB | pp512         | 588.86          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_1.5B.llamafile                    | 531.44 MiB | tg16          | 140.22          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_1.1B.llamafile                    | 408.66 MiB | pp512         | 767.47          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_1.1B.llamafile                    | 408.66 MiB | tg16          | 167.80          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_830M.llamafile                    | 301.76 MiB | pp512         | 1031.20         |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_830M.llamafile                    | 301.76 MiB | tg16          | 204.46          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_560M.llamafile                    | 211.21 MiB | pp512         | 1487.29         |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_560M.llamafile                    | 211.21 MiB | tg16          | 245.53          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_390M.llamafile                    | 148.93 MiB | pp512         | 2049.02         |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_390M.llamafile                    | 148.93 MiB | tg16          | 332.24          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_99M.llamafile                     | 148.93 MiB | pp512         | 2103.34         |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_99M.llamafile                     | 148.93 MiB | tg16          | 301.31          |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_190M.llamafile                    | 78.55 MiB  | pp512         | 4762.49         |
| Apple M2 Ultra (+fp16+dotprod)                 | TriLM\_190M.llamafile                    | 78.55 MiB  | tg16          | 553.83          |
| Intel Core i9-14900K (alderlake)               | TriLM\_3.9B.llamafile                    | 1.31 GiB   | pp512         | 167.15          |
| Intel Core i9-14900K (alderlake)               | TriLM\_3.9B.llamafile                    | 1.31 GiB   | tg16          | 53.22           |
| Intel Core i9-14900K (alderlake)               | TriLM\_2.4B.llamafile                    | 837.02 MiB | pp512         | 261.73          |
| Intel Core i9-14900K (alderlake)               | TriLM\_2.4B.llamafile                    | 837.02 MiB | tg16          | 78.39           |
| Intel Core i9-14900K (alderlake)               | TriLM\_1.5B.llamafile                    | 531.44 MiB | pp512         | 426.17          |
| Intel Core i9-14900K (alderlake)               | TriLM\_1.5B.llamafile                    | 531.44 MiB | tg16          | 123.91          |
| Intel Core i9-14900K (alderlake)               | TriLM\_1.1B.llamafile                    | 408.66 MiB | pp512         | 563.58          |
| Intel Core i9-14900K (alderlake)               | TriLM\_1.1B.llamafile                    | 408.66 MiB | tg16          | 159.13          |
| Intel Core i9-14900K (alderlake)               | TriLM\_830M.llamafile                    | 301.76 MiB | pp512         | 763.27          |
| Intel Core i9-14900K (alderlake)               | TriLM\_830M.llamafile                    | 301.76 MiB | tg16          | 209.42          |
| Intel Core i9-14900K (alderlake)               | TriLM\_560M.llamafile                    | 211.21 MiB | pp512         | 1116.30         |
| Intel Core i9-14900K (alderlake)               | TriLM\_560M.llamafile                    | 211.21 MiB | tg16          | 295.71          |
| Intel Core i9-14900K (alderlake)               | TriLM\_390M.llamafile                    | 148.93 MiB | pp512         | 1586.69         |
| Intel Core i9-14900K (alderlake)               | TriLM\_390M.llamafile                    | 148.93 MiB | tg16          | 377.50          |
| Intel Core i9-14900K (alderlake)               | TriLM\_99M.llamafile                     | 148.93 MiB | pp512         | 1587.38         |
| Intel Core i9-14900K (alderlake)               | TriLM\_99M.llamafile                     | 148.93 MiB | tg16          | 401.37          |
| Intel Core i9-14900K (alderlake)               | TriLM\_190M.llamafile                    | 78.55 MiB  | pp512         | 3713.16         |
| Intel Core i9-14900K (alderlake)               | TriLM\_190M.llamafile                    | 78.55 MiB  | tg16          | 845.54          |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile                    | 1.31 GiB   | pp512         | 17.02           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile                    | 1.31 GiB   | tg16          | 6.67            |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile                    | 837.02 MiB | pp512         | 26.35           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile                    | 837.02 MiB | tg16          | 10.52           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile                    | 531.44 MiB | pp512         | 42.52           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile                    | 531.44 MiB | tg16          | 16.91           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile                    | 408.66 MiB | pp512         | 56.57           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile                    | 408.66 MiB | tg16          | 20.54           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile                    | 148.93 MiB | pp512         | 146.67          |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile                    | 148.93 MiB | tg16          | 56.77           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile                     | 148.93 MiB | pp512         | 147.65          |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile                     | 148.93 MiB | tg16          | 58.24           |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile                    | 78.55 MiB  | pp512         | 338.42          |
| Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile                    | 78.55 MiB  | tg16          | 107.33          |

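The throughput figures above can be turned into a rough latency estimate by dividing token counts by the matching rate. This is only back-of-the-envelope arithmetic: it assumes the two phases simply add up and ignores model load time and other overhead. The function name is hypothetical; the rates are copied from the TriLM\_3.9B rows of the table.

```python
def estimated_seconds(prompt_tokens, generated_tokens, pp_tps, tg_tps):
    """Estimate wall time from prompt-processing and generation throughput."""
    return prompt_tokens / pp_tps + generated_tokens / tg_tps

# TriLM_3.9B: 512-token prompt followed by 16 generated tokens.
threadripper = estimated_seconds(512, 16, pp_tps=1069.54, tg_tps=88.47)
raspberry_pi = estimated_seconds(512, 16, pp_tps=17.02, tg_tps=6.67)

print(f"Threadripper PRO 7995WX: {threadripper:.2f} s")
print(f"Raspberry Pi 5:          {raspberry_pi:.2f} s")
```
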
## About llamafile

llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
binaries that run on the stock installs of six OSes for both ARM64 and
AMD64.

---

# TriLM 3.9B Unpacked

TriLM (ternary model), unpacked to FP16 format and compatible with FP16 GEMMs. After unpacking, TriLM has the same architecture as LLaMA.

```python
import torch
import transformers

model_name = "SpectraSuite/TriLM_3.9B_Unpacked"

# Adjust the temperature, repetition penalty, top_k, top_p, and other sampling parameters to suit your needs.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,  # the original passed an undefined `model_id` here
    model_kwargs={"torch_dtype": torch.float16},
    device_map="auto",
)

# These are base (pretrained) LLMs without instruction or chat tuning. You may need to adjust your prompt accordingly.
pipeline("Once upon a time")
```

* License: Apache 2.0
* We use our GitHub repo for communication, including queries about the HF repo. Feel free to open an issue here: https://github.com/NolanoOrg/SpectraSuite