littlebird13 committed
Commit c62434d • Parent(s): c575992
Update README.md
README.md CHANGED

@@ -34,14 +34,14 @@ In the following demonstration, we assume that you are running commands under th

## How to use
Cloning the repo may be inefficient, and thus you can manually download the GGUF file that you need or use `huggingface-cli` (`pip install huggingface_hub`) as shown below:
```shell
-huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-
+huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
```

To run Qwen2, you can use `llama-cli` (the previous `main`) or `llama-server` (the previous `server`).
We recommend using `llama-server`, as it is simple and compatible with the OpenAI API. For example:

```bash
-./llama-server -m qwen2-
+./llama-server -m qwen2-1_5b-instruct-q5_k_m.gguf -ngl 28 -fa
```

(Note: `-ngl 28` refers to offloading 28 layers to GPUs, and `-fa` refers to the use of flash attention.)
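The next hunk's context line, `print(completion.choices[0].message.content)`, shows that the README drives the server through its OpenAI-compatible API. For context, here is a minimal sketch of that pattern (not the README's own example): it assumes `llama-server`'s default port 8080, and the `model` and `api_key` values are placeholders that the server does not validate unless it was started with `--api-key`.

```python
# Sketch: query the llama-server started above via its OpenAI-compatible endpoint.
# Assumptions: default port 8080, no --api-key set, placeholder model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="qwen2-1_5b-instruct-q5_k_m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself."},
    ],
)
print(completion.choices[0].message.content)
```

With this chat endpoint the template is applied on the server side, so the client sends plain `messages`; the ChatML markers only come into play in the raw `llama-cli` flow shown in the next hunk.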
@@ -69,7 +69,7 @@ print(completion.choices[0].message.content)

If you choose to use `llama-cli`, note that `-cml` for the ChatML template has been removed; use `--in-prefix` and `--in-suffix` instead:

```bash
-./llama-cli -m qwen2-
+./llama-cli -m qwen2-1_5b-instruct-q5_k_m.gguf \
-n 512 -co -i -if -f prompts/chat-with-qwen.txt \
--in-prefix "<|im_start|>user\n" \
--in-suffix "<|im_end|>\n<|im_start|>assistant\n" \