Update README.md
README.md CHANGED
The model is packaged into executable weights, which we call
llamafiles. This makes it easy to use the model on Linux, MacOS,
Windows, FreeBSD, OpenBSD, and NetBSD for AMD64 and ARM64.

*Software Last Updated: 2024-10-30*

## Quickstart

To get started, you need both the Gemma weights and the llamafile
software. Both are included in a single file, which can be downloaded
and run as follows:

```
wget https://huggingface.co/jartine/gemma-2-2b-it-llamafile/resolve/main/gemma-2-2b-it.Q6_K.llamafile
chmod +x gemma-2-2b-it.Q6_K.llamafile
./gemma-2-2b-it.Q6_K.llamafile
```

The default mode of operation for these llamafiles is our new command
line chatbot interface. It looks like this:

![Screenshot of Gemma 2b llamafile on MacOS](llamafile-gemma.png)

Having **trouble?** See the ["Gotchas"
section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
of the README.

## Usage

By default, llamafile launches a chatbot in the terminal and a server
in the background. The chatbot is mostly self-explanatory. You can type
`/help` for further details. See the [llamafile v0.8.15 release
notes](https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.15)
for documentation on our newest chatbot features.

To instruct Gemma to do role playing, you can customize the system
prompt of the chatbot as follows:

```
./gemma-2-2b-it.Q6_K.llamafile --chat -p "you are the ghost of edgar allan poe"
```

To view the man page, run:

```
./gemma-2-2b-it.Q6_K.llamafile --help
```

To send a request to the OpenAI API compatible llamafile server, try:

```
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "gemma-2b-it",
  "messages": [{"role": "user", "content": "Say this is a test!"}],
  "temperature": 0.0
}'
```
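The same request can be sent from a script. Here is a minimal Python sketch using only the standard library; the endpoint, model name, and payload mirror the curl example above, and it assumes the llamafile server is already running locally:

```python
import json
from urllib import request

# Same chat-completions payload as the curl example above.
payload = {
    "model": "gemma-2b-it",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.0,
}

# Build the request against the local llamafile server.
req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires a running server; the response follows the OpenAI
# chat-completions schema.
# with request.urlopen(req) as resp:
#     body = json.loads(resp.read())
#     print(body["choices"][0]["message"]["content"])
```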
If you don't want the chatbot and you only want to run the server:

```
./gemma-2-2b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
```

An advanced CLI mode is provided that's useful for shell scripting. You
can use it by passing the `--cli` flag. For additional help on how it
may be used, pass the `--help` flag.

```
./gemma-2-2b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
```

You then need to fill out the prompt / history template (see below).
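For a sense of what filling out the template involves: Gemma wraps each turn in `<start_of_turn>`/`<end_of_turn>` tags. The sketch below assembles such a prompt string in Python; it is only an illustration (the helper name is ours, and in chat mode llamafile applies the template for you):

```python
def render_gemma_prompt(history):
    """Render (role, message) pairs into Gemma's turn markup.

    Roles are 'user' or 'model'. A trailing open model turn cues
    the model to generate its reply.
    """
    parts = []
    for role, message in history:
        parts.append(f"<start_of_turn>{role}\n{message}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model's reply
    return "".join(parts)

prompt = render_gemma_prompt([("user", "Say this is a test!")])
print(prompt)
```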
For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

## Context Window

This model has a max context window size of 8k tokens. By default, a
context window size of 8192 tokens is used. You may limit the context
window size by passing the `-c N` flag.
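The main cost of a large context window is the KV cache, which grows linearly with context length. A rough back-of-the-envelope sketch of that relationship (the layer, head, and dimension figures below are illustrative placeholders, not something stated in this README):

```python
def kv_cache_bytes(ctx_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elt=2):
    """Estimate KV-cache size: two tensors (K and V) per layer, each
    holding ctx_tokens * n_kv_heads * head_dim elements."""
    return 2 * n_layers * ctx_tokens * n_kv_heads * head_dim * bytes_per_elt

# Illustrative numbers only (f16 cache, hypothetical architecture):
mib = kv_cache_bytes(8192, n_layers=26, n_kv_heads=4, head_dim=256) / (1024 ** 2)
print(f"~{mib:.0f} MiB")  # → ~832 MiB
```

Halving the context with `-c 4096` would halve this figure, which is why `-c N` is the first knob to try on memory-constrained machines.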
## GPU Acceleration

On GPUs with sufficient RAM, the `-ngl 999` flag may be passed to use
the system's NVIDIA or AMD GPU(s). On Windows, only the graphics card
driver needs to be installed if you own an NVIDIA GPU. On Windows, if
you have an AMD GPU, you should install the ROCm SDK v6.1 and then pass
the flags `--recompile --gpu amd` the first time you run your llamafile.

On NVIDIA GPUs, by default, the prebuilt tinyBLAS library is used to
perform matrix multiplications. This is open source software, but it
doesn't go as fast as closed source cuBLAS. If you have the CUDA SDK
installed on your system, then you can pass the `--recompile` flag to
build a GGML CUDA library just for your system that uses cuBLAS. This
ensures you get maximum performance.

For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

## About llamafile

llamafile is a new format introduced by Mozilla on Nov 20th 2023. It
uses Cosmopolitan Libc to turn LLM weights into runnable llama.cpp
binaries that run on the stock installs of six OSes for both ARM64 and
AMD64.

The 9B and 27B models were released a month earlier than 2B, so they're
packaged with a slightly older version of the llamafile software.

## License

The llamafile software is open source and permissively licensed.
However, the weights embedded inside the llamafiles are governed by
Google's Gemma License and Gemma Prohibited Use Policy. This is not an
open source license. It's about as restrictive as it gets. There are a
great many things you're not allowed to do with Gemma. The terms of the
license and its list of unacceptable uses can be changed by Google at
any time. Therefore we wouldn't recommend using these llamafiles for
anything other than evaluating the quality of Google's engineering.

See the [LICENSE](LICENSE) file for further details.

---

# Gemma 2 model card