keitokei1994 committed
Commit 250db7a (verified) · Parent: 2148ff1

Update README.md

Files changed (1):
1. README.md (+18, -2)
README.md CHANGED
@@ -6,7 +6,7 @@ language:
   - ja
   - en
   ---
- # shisa-v1-qwen2-7b-gguf
+ # shisa-v1-qwen2-7b-gguf (English explanation is below.)
  A gguf-format conversion of [shisa-v1-qwen2-7b, published by shisa-ai](https://huggingface.co/shisa-ai/shisa-v1-qwen2-7b).
 
  # Notice
@@ -20,4 +20,20 @@ language:
  2. Run with FlashAttention enabled, using a command like the following:
  ```
  ./server -m ./models/shisa-v1-qwen2-7b.Q8_0.gguf -ngl 99 --port 8888 -fa
- ```
+ ```
+
+ # shisa-v1-qwen2-7b-gguf
+ This is a gguf-format conversion of [shisa-v1-qwen2-7b](https://huggingface.co/shisa-ai/shisa-v1-qwen2-7b) published by shisa-ai.
+
+ # Notice
+ * Currently, there is a bug where the output gets corrupted when running models based on the qwen2-7B series in GGUF format. This can be avoided by enabling Flash Attention.
+ * If using LMStudio, please enable Flash Attention from the Preset.
+ * If using Llama.cpp, please follow these steps:
+ 1. Build with the following command:
+ ```
+ make LLAMA_CUDA=1 LLAMA_CUDA_FA_ALL_QUANTS=true
+ ```
+ 2. Run with Flash Attention enabled using a command like this:
+ ```
+ ./server -m ./models/shisa-v1-qwen2-7b.Q8_0.gguf -ngl 99 --port 8888 -fa
+ ```
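
Once the server is running, a quick way to confirm that generation is no longer corrupted is to send it a completion request. The sketch below is only an illustration: it assumes the launch command above (server listening on localhost:8888) and uses the llama.cpp server's native `/completion` endpoint; the prompt text and `n_predict` value are arbitrary examples.

```
# Minimal smoke test, assuming the server started above is listening on
# localhost:8888. /completion with "prompt" and "n_predict" is the
# llama.cpp server's native API; the prompt itself is illustrative.
curl -s http://localhost:8888/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Briefly introduce the city of Kyoto.", "n_predict": 128}'
```

If the returned text is coherent rather than garbled, FlashAttention is active and the qwen2-7B output-corruption issue is being avoided.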