hierholzer
committed on
Update README.md
README.md CHANGED
@@ -8,6 +8,10 @@ tags:
 - llama-3.1
 - llama-3.1-instruct
 - gguf
+- ollama
+- Text-generation-webui
+- instruct
+- llama_3.1
 model_name: Llama-3.1-70B-Instruct-GGUF
 arxiv: 2407.21783
 base_model: meta-llama/Llama-3.1-70b-instruct.hf
@@ -62,11 +66,11 @@ Here are the quantized versions that I have available:
 - [ ] F16 ~ *NOT Recommended*
 - [ ] F32 ~ *NOT Recommended*
 
-Feel Free to reach out to me if you need a specific Quantization Type that I do not currently offer
+*Feel Free to reach out to me if you need a specific Quantization Type that I do not currently offer.*
 
 
 ### 📈All Quantization Types Possible
-Below is a table of all the
+Below is a table of all the Quantization Types that are possible as well as short descriptions.
 
 | **#** | **or** | **Q#** | **:** | _Description Of Quantization Types_ |
 |-------|:------:|:------:|:-----:|----------------------------------------------------------------|
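For context on picking one of these quantized versions: a single GGUF file can be fetched from this repository with the Hugging Face CLI instead of cloning everything. This is a minimal sketch; the exact filename follows the `Llama-3.1-70B-Instruct-Q4_K_M.gguf` naming pattern used elsewhere in this README and should be verified against the repository's file list.

```shell
# Install the Hugging Face CLI, then download one specific quantized file.
# The filename below is assumed from this README's naming pattern -
# check it against the repository's Files tab first.
pip install -U "huggingface_hub[cli]"
huggingface-cli download hierholzer/Llama-3.1-70B-Instruct-GGUF \
    Llama-3.1-70B-Instruct-Q4_K_M.gguf \
    --local-dir ./models
```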
@@ -128,10 +132,12 @@ git clone https://github.com/oobabooga/text-generation-webui.git
 | 4. | Once the download is finished, click the blue refresh icon within the Model tab that you are in. |
 | 5. | Select your newly downloaded GGUF file in the Model drop-down. once selected, change the settings to best match your system. |
 
+
 ### 2️⃣ Ollama
 Ollama runs as a local service.
 Although it technically works using a command-line interface, Ollama's best attribute is their REST API.
 Being able to utilize your locally ran LLMs through the use of this API can give you almost endless possibilities!
+
 *Feel free to reach out to me if you would like to know some examples that I use this API for*
 
 #### ☑️ How to install Ollama
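For context on the REST API praised above: once the Ollama service is running it listens on `localhost:11434` by default, and any pulled model can be queried over plain HTTP. A minimal sketch, assuming the `hf.co/...` model tag introduced in the next hunk has already been pulled:

```shell
# Query the local Ollama service via its REST API (default port 11434).
# The model tag assumes it was already pulled with `ollama run` or `ollama pull`.
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/hierholzer/Llama-3.1-70B-Instruct-GGUF:Q4_K_M",
  "prompt": "Explain GGUF quantization in one sentence.",
  "stream": false
}'
```

Setting `"stream": false` returns one complete JSON response; leaving streaming on (the default) yields token-by-token output instead.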
@@ -141,49 +147,29 @@ https://ollama.com/download
 ```
 Using Windows, or Mac you will then download a file and run it.
 If you are using linux it will just provide a single command that you need to run in your terminal window.
-*
+*That's about it for installing Ollama*
 #### ✅Using Llama-3.1-70B-Instruct-GGUF with Ollama
 Ollama does have a Model Library where you can download models:
 ```shell
 https://ollama.com/library
 ```
-This Model Library offers
-However
-
+This Model Library offers many different LLM versions that you can use.
+However at the time of writing this, there is no version of Llama-3.1-Instruct offered in the Ollama library.
+
+If you would like to use Llama-3.1-Instruct (70B), do the following:
+
 | # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama |
 |----|----------------------------------------------------------------------------------------------|
-| 1. |
-| 2. |
+| 1. | Open up your terminal that you have Ollama Installed on. |
+| 2. | Paste the following command: |
 ```shell
-
-
-PARAMETER stop "<|im_start|>"
-PARAMETER stop "<|im_end|>"
-TEMPLATE """
-<|im_start|>system
-<|im_end|>
-<|im_start|>user
-<|im_end|>
-<|im_start|>assistant
-"""
-```
-*Replace ./Llama-3.1-70B-Instruct-Q4_K_M.gguf with the correct version and actual path to the GGUF file you downloaded.
-The TEMPLATE line defines the prompt format using system, user, and assistant roles.
-You can customize this based on your use case.*
-| # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
-|----|-----------------------------------------------------------------------------------|
-| 3. | Now, build the Ollama model using the ollama create command: |
-```shell
-ollama create "Llama-3.1-70B-Instruct-Q4_K_M" -f ./Llama-3.1-70B-Instruct-Q4_K_M.gguf
+ollama run hf.co/hierholzer/Llama-3.1-70B-Instruct-GGUF:Q4_K_M
+
 ```
-*
-model: ./Llama-3.1-70B-Instruct-Q4_K_M.gguf with the quantized model you are using.*
+*Replace Q4_K_M with whatever version you would like to use from this repository.*
 | # | Running the 70B quantized version of Llama 3.1-Instruct with Ollama - *continued* |
 |----|-----------------------------------------------------------------------------------|
-|
-```shell
-ollama run Llama-3.1-70B-Instruct-Q4_K_M
-```
+| 3. | This will download & run the model. It will also be saved for future use. |
 
 -------------------------------------------------
 
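For context on the removed steps: the old instructions wrapped a downloaded GGUF in an Ollama Modelfile, but passed the GGUF itself to `ollama create -f`, which actually expects a Modelfile. A working version of that older route would look roughly like the sketch below; the local path, the `Q4_K_M` filename, and the ChatML-style stop markers are carried over from the removed lines as assumptions, with Ollama's standard `{{ .System }}`/`{{ .Prompt }}` template variables filled in.

```shell
# The removed manual route, made runnable: wrap the local GGUF in a
# Modelfile, then build a named model from it.
# Path and filename are assumptions - match them to your download.
cat > Modelfile <<'EOF'
FROM ./Llama-3.1-70B-Instruct-Q4_K_M.gguf
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
EOF

# `ollama create -f` takes the Modelfile (not the GGUF itself); the named
# model can then be run locally.
ollama create Llama-3.1-70B-Instruct-Q4_K_M -f ./Modelfile
ollama run Llama-3.1-70B-Instruct-Q4_K_M
```

The new instructions replace all of this with the single `ollama run hf.co/...` command above, which fetches the GGUF and its template metadata in one step.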