carecodeconnect committed · Commit d589b8b
Parent(s): 4a64513
Update README.md

README.md CHANGED
- Datasets: 2.18.0
- Tokenizers: 0.15.2

The model was fine-tuned using the Axolotl toolkit, with specific emphasis on low-resource environments. Key aspects of the process include QLoRA for parameter-efficient adaptation to the guided-meditation domain, mixed-precision training for better throughput, and custom tokenization to fit the structure of meditation scripts. Throughout, the emphasis is on resource efficiency and on generating serene, contextually appropriate meditation guides.
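The exact training configuration is not published here, so the following is only a minimal sketch of how such a run is typically launched with Axolotl; the config file name `qlora-meditation.yml` and the settings noted in the comments are assumptions for illustration, not the actual values used:

```bash
# Install Axolotl from source (a CUDA-enabled PyTorch build is assumed)
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
pip install -e .

# Launch fine-tuning via Accelerate, Axolotl's standard entry point.
# The YAML config (hypothetical name) would carry the QLoRA and
# mixed-precision settings described above, e.g.:
#   adapter: qlora        # parameter-efficient fine-tuning
#   load_in_4bit: true    # 4-bit base weights (QLoRA)
#   bf16: auto            # mixed-precision training
accelerate launch -m axolotl.cli.train qlora-meditation.yml
```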
## Quantization with llama.cpp
The model was quantized to reduce its size and make it more efficient to deploy, including in resource-constrained environments. Quantization was performed with `llama.cpp`, following the steps outlined by Maxime Labonne in [Quantize Llama models with GGUF and llama.cpp](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html).
The process involved the following steps (a command-line sketch follows the list):
- Cloning the `llama.cpp` repository and setting it up with the required dependencies.
- Downloading the model to be quantized.
- Using the `llama.cpp/convert.py` script to convert the model to FP16 GGUF format, then quantizing it, significantly reducing its size while retaining its generation quality.
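A minimal sketch of these steps on the command line, in the spirit of the blog post cited above; the model path and the `Q4_K_M` quantization method are assumptions for illustration, since the exact method used here is not stated:

```bash
# 1. Clone llama.cpp, build it, and install the conversion script's dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt

# 2. Download the model to quantize (placeholder path), e.g. via git clone
#    or huggingface-cli, into ./models/my-model

# 3. Convert the checkpoint to an FP16 GGUF file, then quantize it
python convert.py models/my-model --outtype f16 \
  --outfile models/my-model.fp16.gguf
./quantize models/my-model.fp16.gguf models/my-model.Q4_K_M.gguf Q4_K_M
```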
Quantization compressed the model from 13813.02 MB to 4892.99 MB (roughly a 2.8x reduction), improving loading and inference speeds without compromising generation quality.
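Once quantized, the GGUF file can be loaded directly by llama.cpp for local inference; the file name below is the placeholder from the sketch above:

```bash
# Generate text with the quantized model (-n caps the number of new tokens)
./main -m models/my-model.Q4_K_M.gguf -n 256 \
  -p "Guide me through a short breathing meditation."
```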