Update README.md

README.md CHANGED
@@ -19,6 +19,18 @@ Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
 
 For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).
 
+## Quantization
+We performed quantization using [llama.cpp](https://github.com/ggerganov/llama.cpp) and converted the model to GGUF format. For GGUF, we currently offer only the Q4_K_M quantization.
+
+We provide two quantized variants, GGUF and AWQ. The table below shows the performance degradation caused by quantization.
+
+| Model | ELYZA-tasks-100 GPT4 score |
+| :-------------------------------- | ---: |
+| Llama-3-ELYZA-JP-8B | 3.655 |
+| Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) | 3.57 |
+| Llama-3-ELYZA-JP-8B-AWQ | 3.39 |
+
+
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
 
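The "Quantization" section added above says the model was converted to GGUF with [llama.cpp](https://github.com/ggerganov/llama.cpp) and quantized to Q4_K_M. A minimal sketch of that workflow, assuming a recent llama.cpp checkout, might look like the following; the script and binary names (`convert_hf_to_gguf.py`, `llama-quantize`) have varied across llama.cpp versions, and the input path is a placeholder.

```bash
# Sketch of a GGUF conversion + Q4_K_M quantization workflow (assumption:
# recent llama.cpp; older checkouts ship convert.py and ./quantize instead).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release
pip install -r requirements.txt

# 1) Convert the original Hugging Face checkpoint to an FP16 GGUF file.
#    /path/to/Llama-3-ELYZA-JP-8B is a placeholder for the local model dir.
python convert_hf_to_gguf.py /path/to/Llama-3-ELYZA-JP-8B \
  --outfile Llama-3-ELYZA-JP-8B-F16.gguf --outtype f16

# 2) Quantize the FP16 file down to Q4_K_M, the format offered here.
./build/bin/llama-quantize Llama-3-ELYZA-JP-8B-F16.gguf \
  Llama-3-ELYZA-JP-8B-q4_k_m.gguf Q4_K_M
```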
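The unchanged tail of the diff leads into the existing "Use with llama.cpp" instructions. As a hedged sketch of that path, installing via Homebrew and running the quantized file straight from the Hub could look like this; the `--hf-repo` and `--hf-file` values are assumptions inferred from the model names in the table.

```bash
# Install llama.cpp via Homebrew (works on macOS and Linux).
brew install llama.cpp

# Run the Q4_K_M model directly from the Hugging Face Hub.
# Repo and file names below are assumptions for illustration.
llama-cli --hf-repo elyza/Llama-3-ELYZA-JP-8B-GGUF \
  --hf-file Llama-3-ELYZA-JP-8B-q4_k_m.gguf \
  -p "Why is the sky blue?" -n 128
```

In recent llama.cpp builds, `llama-server` accepts the same `--hf-repo`/`--hf-file` flags if an OpenAI-compatible HTTP endpoint is preferred over the CLI.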