---
language:
- ja
- en
---

# shisa-v1-qwen2-7b-gguf
This is a gguf format conversion of [shisa-v1-qwen2-7b](https://huggingface.co/shisa-ai/shisa-v1-qwen2-7b) published by shisa-ai.

# Notice
* Currently, there is a bug where the output gets corrupted when trying to run models based on the qwen2-7B series in GGUF format. This can be avoided by enabling Flash Attention.
* If using LMStudio, please enable Flash Attention from the Preset.
* If using Llama.cpp, please follow these steps:
1. Build with the following command (a CMake-based sketch for newer checkouts follows these steps):
```
make LLAMA_CUDA=1 LLAMA_CUDA_FA_ALL_QUANTS=true
```
2. Run with Flash Attention enabled using a command like this (a sample request against the running server follows these steps):
```
./server -m ./models/shisa-v1-qwen2-7b.Q8_0.gguf -ngl 99 --port 8888 -fa
```
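
Note: newer llama.cpp checkouts have replaced the Makefile build with CMake, and the CUDA options were renamed from the `LLAMA_*` prefix to `GGML_*`. If the `make` command in step 1 fails on your checkout, a rough CMake equivalent is sketched below (option names assume a recent tree; verify them with `cmake -LH .`):

```
# CMake-based build sketch for newer llama.cpp trees (assumed option names).
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --config Release
```

On such trees the server binary is produced as `./build/bin/llama-server` instead of `./server`.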
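
Once the server is up, you can check that the output is no longer corrupted by sending a test request. The sketch below assumes the OpenAI-compatible chat endpoint that llama.cpp's server exposes, and the port 8888 used in the example above:

```
# Minimal smoke test against the server started in step 2.
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "こんにちは。簡単に自己紹介してください。"}], "temperature": 0.7}'
```

If Flash Attention is active, the reply should be coherent Japanese rather than garbled tokens.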