Update README.md
## Update history

2024/09/24
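- Rebuilt because the tokenizer of gemma-2-9b-it was updated on August 8 (this appears to cause slight changes, such as in the handling of consecutive tabs).
- Performed the BF16 conversion on the CPU (this may slightly improve performance in certain situations).
- Added more Japanese data to the iMatrix file (validated with [imatrix-jpn-test](https://huggingface.co/dahara1/imatrix-jpn-test)).
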
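BF16 (bfloat16) keeps float32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so converting from float32 amounts to rounding away the low 16 bits of the bit pattern. The sketch below shows that rounding (round-to-nearest-even, with NaN kept as NaN) in pure Python. It is a generic illustration only: `f32_to_bf16_bits` and `bf16_bits_to_f32` are hypothetical helper names, and this is not the code path the author or llama.cpp's converter actually used.

```python
import struct


def f32_to_bf16_bits(x):
    """Round a float (as IEEE-754 float32) to bfloat16; return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: exponent all ones and non-zero mantissa. Set the quiet bit so that
    # dropping the low 16 bits cannot turn it into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(b):
    """Widen a bfloat16 bit pattern back to a float (exact, low bits zero)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1.00390625, 1.01171875):
        b = f32_to_bf16_bits(value)
        print(value, hex(b), bf16_bits_to_f32(b))
```

Because only 7 mantissa bits survive, values such as 3.14159 come back as 3.140625; the exponent range, however, is the same as float32, which is the usual reason BF16 is preferred over f16 for weights with a wide dynamic range.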
<details>
<summary>Past update history</summary>

2024/07/20
A bug was found in llama.cpp ([llama : fix pre-tokenization of non-special added tokens #8228](https://github.com/ggerganov/llama.cpp/pull/8228)) that required Gemma2 models to be reconverted, so I have reconverted this one. Reportedly, HTML tags and similar text were not being handled correctly.

Since simply reconverting was not very interesting, I also tried converting the output tensor and the token embedding to f16, which is said to further improve accuracy in quantizations of 4 bits or more.

Just to be safe, for the 4-bit version I have uploaded both the conventional conversion and the f16 conversion.

</details>
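The two 4-bit variants described in the 2024/07/20 entry differ only in how the output tensor and token embedding are stored. llama.cpp's `llama-quantize` tool has options for this (`--output-tensor-type`, `--token-embedding-type`) and for supplying an importance matrix (`--imatrix`). The sketch below only assembles such a command line; it assumes a llama.cpp build that ships the `llama-quantize` binary with these flags (check `llama-quantize --help`), and the `build_quantize_cmd` helper, the input file name, the `-f16` output suffix, and `imatrix.dat` are hypothetical. It is not a record of the author's exact commands.

```python
import shlex


def build_quantize_cmd(src, dst, qtype="Q4_K_M", imatrix=None, f16_io=False,
                       binary="llama-quantize"):
    """Assemble a llama-quantize argument list (hypothetical helper)."""
    cmd = [binary]
    if imatrix is not None:
        cmd += ["--imatrix", imatrix]  # importance matrix computed beforehand
    if f16_io:
        # Keep output.weight and the token embeddings in f16 instead of quantizing them.
        cmd += ["--output-tensor-type", "f16", "--token-embedding-type", "f16"]
    # llama-quantize reads "--" options first, then the positional arguments.
    cmd += [src, dst, qtype]
    return cmd


if __name__ == "__main__":
    # Conventional 4-bit conversion vs. the f16 output/embedding variant.
    for f16_io, suffix in [(False, ""), (True, "-f16")]:
        cmd = build_quantize_cmd("gemma-2-27b-it-f16.gguf",
                                 f"gemma-2-27b-it-Q4_K_M{suffix}.gguf",
                                 imatrix="imatrix.dat", f16_io=f16_io)
        print(shlex.join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

Keeping these two tensors in f16 makes the file somewhat larger than a plain Q4_K_M, which is the trade-off for the accuracy gain mentioned above.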

## About this gguf model

This is a gguf version of gemma-2-27b-it quantized using an importance matrix (iMatrix) built from data containing a large amount of Japanese text. I hope it retains more of the model's Japanese ability.
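Roughly, an importance matrix records how strongly each input column of a weight matrix is activated on calibration text, and quantization then tries hardest to keep errors small where activations are large. Since a row's output error is approximately the sum of Δw_j·x_j, weighting each squared weight error by the mean of x_j² (ignoring cross terms) approximates the error that matters. This is why the language of the calibration text can matter. The following is a simplified pure-Python illustration of that idea, not llama.cpp's implementation: `column_importance`, `weighted_error`, and `quantize_row` are hypothetical helpers, the symmetric 4-bit format and the ±20% scale search are assumptions chosen for brevity, and the activations are toy values.

```python
def column_importance(activations):
    """Mean squared activation per input column over calibration vectors."""
    n_cols = len(activations[0])
    sums = [0.0] * n_cols
    for x in activations:
        for j, v in enumerate(x):
            sums[j] += v * v
    return [s / len(activations) for s in sums]


def weighted_error(row, importance, scale, q):
    """Importance-weighted squared error between row and its dequantized scale * q."""
    return sum(imp * (w - scale * qi) ** 2 for w, imp, qi in zip(row, importance, q))


def quantize_row(row, importance, bits=4, steps=10):
    """Symmetric round-to-nearest quantization with an importance-weighted scale search."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    amax = max(abs(w) for w in row)
    if amax == 0.0:
        return 0.0, [0] * len(row)
    best = None
    for k in range(-steps, steps + 1):
        # k == 0 is the plain max-abs scale; other k try scales up to +/-20% away.
        scale = amax / qmax * (1.0 + 0.02 * k)
        q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
        err = weighted_error(row, importance, scale, q)
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy calibration activations: column 1 is activated far more than the others.
    calib = [[0.1, 2.0, 0.1, 0.1], [0.2, 1.5, 0.0, 0.1]]
    imp = column_importance(calib)
    row = [0.33, -0.71, 0.05, 0.52]
    scale, q = quantize_row(row, imp)
    print(imp, scale, q)
```

Because the plain max-abs scale is always among the candidates, the chosen scale never has a higher weighted error than plain round-to-nearest; the search only helps when some other scale fits the heavily weighted columns better.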

gemma-2-27b-it-Q4_K_M.gguf has been confirmed to run at about 3 tokens/second on a recent CPU (Ryzen 9 7940HS).