|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
inference: false |
|
tags: |
|
- gguf |
|
- 4bit |
|
--- |
|
|
|
|
|
This repo provides the GGUF-format version of the [aksara_v1](https://huggingface.co/cropinailab/aksara_v1) model. The weights are quantized to 4-bit precision, so the model can run inference on a GPU or on CPU only.
|
|
|
## Running the model with Python
|
|
|
1. **Install llama-cpp-python:** |
|
|
|
```bash
# Build with CUDA support (prefix the command with "!" when running inside a notebook).
# Note: newer llama.cpp releases use -DGGML_CUDA=on instead of -DLLAMA_CUBLAS=on.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
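
   If no CUDA-capable GPU is available, the plain CPU build installs without any extra flags:

```bash
# CPU-only install (no CMAKE_ARGS needed)
pip install llama-cpp-python
```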
|
|
|
2. **Download the model:** |
|
|
|
```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"

model_path = hf_hub_download(model_name,
                             filename=model_file,
                             token='<YOUR_HF_TOKEN>',
                             local_dir='<PATH_TO_SAVE_MODEL>')
```
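
   To check which quantization variants the repo offers before downloading, the repo files can be listed first. A small sketch using `huggingface_hub.list_repo_files`:

```python
from huggingface_hub import list_repo_files

# List every file in the repo and keep only the GGUF weights
files = list_repo_files("cropinailab/aksara_v1_GGUF", token='<YOUR_HF_TOKEN>')
print([f for f in files if f.endswith(".gguf")])
```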
|
|
|
3. **Run the model:** |
|
|
|
```python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the GGUF file downloaded above
    n_ctx=4096,             # max sequence length; longer contexts require considerably more memory
    n_gpu_layers=-1,        # number of layers to offload to the GPU:
                            # -1 offloads all layers, 0 runs entirely on the CPU
)

prompt = "What is the recommended NPK dosage for maize varieties?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,         # generate up to 512 tokens
    stop=["<|end|>"],       # stop at the end-of-turn token
    echo=True,              # echo the prompt in the output
)
print(output['choices'][0]['text'])
```
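
   llama-cpp-python also exposes an OpenAI-style chat interface. A minimal sketch, assuming the GGUF file embeds a chat template (otherwise the manual prompt format above is the safer option):

```python
# OpenAI-style chat completion; relies on the chat template stored in the GGUF metadata
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the recommended NPK dosage for maize varieties?"}
    ],
    max_tokens=512,
)
print(response['choices'][0]['message']['content'])
```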
|
|
|
**For a more detailed inference pipeline using this model, refer to this [notebook](https://colab.research.google.com/drive/13u4msrKGJX2V_5_k8PZAVJh84-7XonmA?usp=sharing).**