jartine commited on
Commit
19d64c4
1 Parent(s): ddfe71b

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +151 -0
README.md ADDED
@@ -0,0 +1,151 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ model_creator: SpectraSuite
5
+ quantized_by: jartine
6
+ pipeline_tag: text-generation
7
+ license: apache-2.0
8
+ license_link: LICENSE
9
+ tags:
10
+ - llamafile
11
+ ---
12
+
13
+ # TriLM - llamafile
14
+
15
+ This is a ternary LLM whose weights consist of {-1, 0, +1}. It's highly
16
+ optimized for CPU performance, thanks to the [`Q2_K_S` quantization
17
+ format](https://github.com/Mozilla-Ocho/llamafile/pull/552).
18
+
19
+ - Model creator: [SpectraSuite](https://huggingface.co/SpectraSuite)
20
+ - Original model: [TriLMs-Unpacked](https://huggingface.co/collections/SpectraSuite/trilms-unpacked-668d5f62afe0f4036925b1d2)
21
+
22
+ This repository packages and distributes TriLM as executable weights,
23
+ which we call [llamafiles](https://github.com/Mozilla-Ocho/llamafile).
24
+ The files you download here will run on Linux, MacOS, Windows, FreeBSD,
25
+ OpenBSD, and NetBSD for AMD64 and ARM64.
26
+
27
+ ## Quickstart
28
+
29
+ Running the following on a desktop OS will launch a tab in your web
30
+ browser with a completions interface.
31
+
32
+ ```
33
+ wget https://huggingface.co/Mozilla/TriLM-llamafile/resolve/main/TriLM_3.9B.llamafile
34
+ chmod +x TriLM_3.9B.llamafile
35
+ ./TriLM_3.9B.llamafile
36
+ ```
37
+
38
+ For further information, please see the [llamafile
39
+ README](https://github.com/mozilla-ocho/llamafile/).
40
+
41
+ Having **trouble?** See the ["Gotchas"
42
+ section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
43
+ of the README.
44
+
45
+ ## Prompting
46
+
47
+ This is a base model. It hasn't been fine-tuned for chat. It's
48
+ recommended that the completions interface be used.
49
+
50
+ It's recommended with the smaller TriLM models (e.g. 99M) that a high
51
+ repeat penalty be set, e.g. `--repeat-penalty 10`.
52
+
53
+ ## Benchmarks
54
+
55
+ | cpu\_info | model\_filename | size | test | t/s |
56
+ | -----------------------------------------: | ---------------------------------------: | ---------: | ------------: | --------------: |
57
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 1069.54 |
58
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 88.47 |
59
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 1441.04 |
60
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 110.80 |
61
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 2185.94 |
62
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 154.59 |
63
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 2692.87 |
64
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 173.08 |
65
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 3353.51 |
66
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 191.98 |
67
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 4297.08 |
68
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 209.57 |
69
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 5130.90 |
70
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 221.88 |
71
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 5127.00 |
72
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 218.93 |
73
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 10874.11 |
74
+ | AMD Ryzen Threadripper PRO 7995WX (znver4) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 334.45 |
75
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 227.95 |
76
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 65.17 |
77
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 347.93 |
78
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 48.26 |
79
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 588.86 |
80
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 140.22 |
81
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 767.47 |
82
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 167.80 |
83
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 1031.20 |
84
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 204.46 |
85
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 1487.29 |
86
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 245.53 |
87
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 2049.02 |
88
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 332.24 |
89
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 2103.34 |
90
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 301.31 |
91
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 4762.49 |
92
+ | Apple M2 Ultra (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 553.83 |
93
+ | Intel Core i9-14900K (alderlake) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 167.15 |
94
+ | Intel Core i9-14900K (alderlake) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 53.22 |
95
+ | Intel Core i9-14900K (alderlake) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 261.73 |
96
+ | Intel Core i9-14900K (alderlake) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 78.39 |
97
+ | Intel Core i9-14900K (alderlake) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 426.17 |
98
+ | Intel Core i9-14900K (alderlake) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 123.91 |
99
+ | Intel Core i9-14900K (alderlake) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 563.58 |
100
+ | Intel Core i9-14900K (alderlake) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 159.13 |
101
+ | Intel Core i9-14900K (alderlake) | TriLM\_830M.llamafile | 301.76 MiB | pp512 | 763.27 |
102
+ | Intel Core i9-14900K (alderlake) | TriLM\_830M.llamafile | 301.76 MiB | tg16 | 209.42 |
103
+ | Intel Core i9-14900K (alderlake) | TriLM\_560M.llamafile | 211.21 MiB | pp512 | 1116.30 |
104
+ | Intel Core i9-14900K (alderlake) | TriLM\_560M.llamafile | 211.21 MiB | tg16 | 295.71 |
105
+ | Intel Core i9-14900K (alderlake) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 1586.69 |
106
+ | Intel Core i9-14900K (alderlake) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 377.50 |
107
+ | Intel Core i9-14900K (alderlake) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 1587.38 |
108
+ | Intel Core i9-14900K (alderlake) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 401.37 |
109
+ | Intel Core i9-14900K (alderlake) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 3713.16 |
110
+ | Intel Core i9-14900K (alderlake) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 845.54 |
111
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | pp512 | 17.02 |
112
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_3.9B.llamafile | 1.31 GiB | tg16 | 6.67 |
113
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | pp512 | 26.35 |
114
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_2.4B.llamafile | 837.02 MiB | tg16 | 10.52 |
115
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | pp512 | 42.52 |
116
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.5B.llamafile | 531.44 MiB | tg16 | 16.91 |
117
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | pp512 | 56.57 |
118
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_1.1B.llamafile | 408.66 MiB | tg16 | 20.54 |
119
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | pp512 | 146.67 |
120
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_390M.llamafile | 148.93 MiB | tg16 | 56.77 |
121
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | pp512 | 147.65 |
122
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_99M.llamafile | 148.93 MiB | tg16 | 58.24 |
123
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | pp512 | 338.42 |
124
+ | Raspberry Pi 5 Model B Rev 1.0 (+fp16+dotprod) | TriLM\_190M.llamafile | 78.55 MiB | tg16 | 107.33 |
125
+
126
+ ## About llamafile
127
+
128
+ llamafile is a new format introduced by Mozilla Ocho on Nov 20th 2023.
129
+ It uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
130
+ binaries that run on the stock installs of six OSes for both ARM64 and
131
+ AMD64.
132
+
133
+ ---
134
+
135
+ # TriLM 3.9B Unpacked
136
+
137
+ TriLM (ternary model), unpacked to FP16 format - compatible with FP16 GEMMs. After unpacking, TriLM has the same architecture as LLaMa.
138
+
139
+ ```python
140
+ import transformers as tf, torch
141
+ model_name = "SpectraSuite/TriLM_3.9B_Unpacked"
142
+
143
+ # Please adjust the temperature, repetition penalty, top_k, top_p and other sampling parameters according to your needs.
144
+ pipeline = tf.pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.float16}, device_map="auto")
145
+
146
+ # These are base (pretrained) LLMs that are not instruction and chat tuned. You may need to adjust your prompt accordingly.
147
+ pipeline("Once upon a time")
148
+ ```
149
+
150
+ * License: Apache 2.0
151
+ * We will use our GitHub repo for communication (including HF repo related queries). Feel free to open an issue here https://github.com/NolanoOrg/SpectraSuite