|
--- |
|
license: apache-2.0 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
inference: false |
|
tags: |
|
- gguf |
|
- 4bit |
|
--- |
|
|
|
|
|
This repo provides the GGUF-format version of the [aksara_v1](https://huggingface.co/cropinailab/aksara_v1) model. The weights are quantized to 4-bit precision, so the model can run inference on a GPU or on CPU only.
|
|
|
## Running the model with Python
|
|
|
1. **Install llama-cpp-python:** |
|
|
|
```bash
# Build with CUDA support (prefix the command with "!" when running inside a notebook).
# Note: newer llama.cpp releases use -DGGML_CUDA=on instead of -DLLAMA_CUBLAS=on.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
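
   If no CUDA-capable GPU is available, the plain CPU build installs without any extra flags:

```bash
# CPU-only install (no CMAKE_ARGS needed)
pip install llama-cpp-python
```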
|
|
|
2. **Download the model:** |
|
|
|
```python
from huggingface_hub import hf_hub_download

model_name = "cropinailab/aksara_v1_GGUF"
model_file = "aksara_v1.Q4_K_M.gguf"

model_path = hf_hub_download(model_name,
                             filename=model_file,
                             token='<YOUR_HF_TOKEN>',
                             local_dir='<PATH_TO_SAVE_MODEL>')
```
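
   To check which quantization variants the repo offers before downloading, the repo files can be listed first. A small sketch using `huggingface_hub.list_repo_files`:

```python
from huggingface_hub import list_repo_files

# List every file in the repo and keep only the GGUF weights
files = list_repo_files("cropinailab/aksara_v1_GGUF", token='<YOUR_HF_TOKEN>')
print([f for f in files if f.endswith(".gguf")])
```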
|
|
|
3. **Run the model:** |
|
|
|
```python
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the GGUF file downloaded above
    n_ctx=4096,             # max sequence length; longer contexts require considerably more memory
    n_gpu_layers=-1,        # number of layers to offload to the GPU:
                            # -1 offloads all layers, 0 runs entirely on the CPU
)

prompt = "What is the recommended NPK dosage for maize varieties?"

# Simple inference example
output = llm(
    f"<|user|>\n{prompt}<|end|>\n<|assistant|>",
    max_tokens=512,         # generate up to 512 tokens
    stop=["<|end|>"],       # stop at the end-of-turn token
    echo=True,              # echo the prompt in the output
)
print(output['choices'][0]['text'])
```
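
   llama-cpp-python also exposes an OpenAI-style chat interface. A minimal sketch, assuming the GGUF file embeds a chat template (otherwise the manual prompt format above is the safer option):

```python
# OpenAI-style chat completion; relies on the chat template stored in the GGUF metadata
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the recommended NPK dosage for maize varieties?"}
    ],
    max_tokens=512,
)
print(response['choices'][0]['message']['content'])
```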
|
|
|
**For a more detailed inference pipeline using this model, refer to this [notebook](https://colab.research.google.com/drive/13u4msrKGJX2V_5_k8PZAVJh84-7XonmA?usp=sharing).**