HF-Quantization/Llama-3.2-1B-BNB-INT8 (8-bit precision)


medmekk (HF staff): "Upload README.md with huggingface_hub", commit da810b1, verified 23 days ago


---
base_model:
- meta-llama/Llama-3.2-1B
---
# meta-llama/Llama-3.2-1B (Quantized)
## Description
This model is a quantized version of the original model `meta-llama/Llama-3.2-1B`. It was quantized to 8-bit (INT8) precision using the bitsandbytes library.
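As a rough intuition for what 8-bit quantization does to weights, the pure-Python sketch below applies row-wise absmax quantization: each row is scaled so that its largest magnitude maps to 127, then rounded to integers. It is a simplified illustration of the vector-wise scheme that bitsandbytes' `load_in_8bit` mode (LLM.int8()) builds on, not bitsandbytes code; the function names are hypothetical and outlier handling is left out.

```python
def quantize_row_absmax(row: list[float]) -> tuple[list[int], float]:
    """Map a row of floats to int8 values in [-127, 127] using one scale per row."""
    absmax = max(abs(x) for x in row)
    if absmax == 0.0:
        # An all-zero row needs no scaling; use scale 0.0 as a sentinel
        return [0] * len(row), 0.0
    scale = 127.0 / absmax
    return [round(x * scale) for x in row], scale


def dequantize_row(q: list[int], scale: float) -> list[float]:
    """Approximately recover the original floats from the int8 values."""
    if scale == 0.0:
        return [0.0] * len(q)
    return [v / scale for v in q]


# Toy "weight matrix" with two rows of different magnitude
weights = [
    [0.12, -0.50, 0.33, 0.01],
    [1.70, -0.02, 0.45, -0.90],
]

for row in weights:
    q, scale = quantize_row_absmax(row)
    approx = dequantize_row(q, scale)
    max_err = max(abs(a - b) for a, b in zip(row, approx))
    print(q, f"max abs error = {max_err:.4f}")
```

The rounding error per value is at most half a quantization step (`0.5 / scale`), so rows with a larger maximum magnitude lose more precision. LLM.int8() additionally runs activation feature dimensions that contain outliers (magnitude above `llm_int8_threshold`, 6.0 by default in `BitsAndBytesConfig`) through a 16-bit matrix multiplication.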
## Quantization Details
- Quantization Parameters: `BitsAndBytesConfig(load_in_8bit=True)`
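A checkpoint like this one can presumably be produced by loading the base model with the configuration listed above and saving the result. The sketch below is an assumption about the workflow, not the script actually used for this repository: it assumes a CUDA GPU, access to the gated `meta-llama/Llama-3.2-1B` repository, and a hypothetical local output directory, and the helper `quantize_and_save` is illustrative.

```python
def quantize_and_save(base_id: str = "meta-llama/Llama-3.2-1B",
                      out_dir: str = "Llama-3.2-1B-BNB-INT8") -> None:
    # transformers is imported locally so that defining this helper stays cheap
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    # bitsandbytes converts the linear layers to int8 while the weights are loaded
    # (by default transformers keeps the output head in its original precision)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, quantization_config=quant_config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(base_id)

    # Recent transformers/bitsandbytes versions can serialize 8-bit models; the
    # quantization settings are written to config.json next to the weights
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
```

Swapping `save_pretrained(out_dir)` for `push_to_hub(repo_id)` on the model and tokenizer would upload the result to the Hub instead of writing it locally.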
## Usage

You can use this model in your applications by loading it directly from the Hugging Face Hub.

To run inference with `Llama-3.2-1B-BNB-INT8`, install `torch` and `bitsandbytes`:
```bash
pip install torch bitsandbytes --upgrade
```
Then install `transformers`, preferably the latest version, together with its `accelerate` extra:
```bash
pip install "transformers[accelerate]" --upgrade
```
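Before loading the model, it can help to confirm that these packages are installed in the interpreter you will use. The standard-library sketch below (the helper name `installed_versions` is hypothetical) looks up each distribution's installed version and reports missing ones as `None`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names matching the pip commands in this section
for pkg, ver in installed_versions(["torch", "bitsandbytes", "transformers", "accelerate"]).items():
    print(f"{pkg}: {ver if ver else 'not installed'}")
```

It is also worth checking `torch.cuda.is_available()`, since bitsandbytes' 8-bit kernels have primarily targeted NVIDIA GPUs.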
To run inference, instantiate the model like any other causal language model with `AutoModelForCausalLM`, then generate text as usual:
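```python
from transformers import AutoModelForCausalLM

# Full Hub repository id; the 8-bit quantization config is stored with the checkpoint
model = AutoModelForCausalLM.from_pretrained("HF-Quantization/Llama-3.2-1B-BNB-INT8", device_map="auto")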
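
# Sketch of a generation call that continues from `model` above. The helper name,
# prompt, and max_new_tokens default are illustrative, not part of this repository.
def generate_text(model, prompt: str, max_new_tokens: int = 32) -> str:
    # AutoTokenizer is imported locally to keep this helper self-contained
    from transformers import AutoTokenizer

    # name_or_path holds the repository id the model was loaded from
    tokenizer = AutoTokenizer.from_pretrained(model.name_or_path)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example: print(generate_text(model, "The main benefit of 8-bit weights is"))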