---
metrics: null
---
Meta AI's [LLaMA](https://arxiv.org/abs/2302.13971) quantized to 4-bit with the [GPTQ](https://arxiv.org/abs/2210.17323v2) algorithm (v2). A minimal download/inspection sketch is included after the conversion commands below.
- [**llama13b-4bit-ts-ao-g128-v2.safetensors**](https://huggingface.co/sardukar/llama13b-4bit-v2/blob/main/llama13b-4bit-ts-ao-g128-v2.safetensors)
GPTQ implementation - https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/49efe0b67db4b40eac2ae963819ebc055da64074
Conversion process:
```sh
CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-13b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors ./q4/llama13b-4bit-ts-ao-g128-v2.safetensors
```
- [llama13b-4bit-v2.safetensors](https://huggingface.co/sardukar/llama13b-4bit-v2/blob/main/llama13b-4bit-v2.safetensors)
GPTQ implementation - https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/841feedde876785bc8022ca48fd9c3ff626587e2
**Note:** This model will fail to load with the current GPTQ-for-LLaMa implementation.
Conversion process:
```sh
CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-13b c4 --wbits 4 --true-sequential --act-order --save_safetensors ./q4/llama13b-4bit-v2.safetensors
```
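For reference, the snippet below is a minimal sketch (not part of the original conversion workflow) showing how the group-size-128 checkpoint can be fetched from this repository and its packed tensors listed. It assumes `huggingface_hub` and `safetensors` are installed; it only inspects the file and does not replace loading through the pinned GPTQ-for-LLaMa revision for inference.
```python
# Minimal sketch (assumption: huggingface_hub and safetensors are installed).
# Downloads the g128 checkpoint from this repo and lists a few of its tensors.
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

path = hf_hub_download(
    repo_id="sardukar/llama13b-4bit-v2",
    filename="llama13b-4bit-ts-ao-g128-v2.safetensors",
)

state_dict = load_file(path)
# GPTQ checkpoints typically store packed weights (qweight), zero points
# (qzeros) and per-group scales alongside the unquantized layers.
for name, tensor in list(state_dict.items())[:8]:
    print(name, tuple(tensor.shape), tensor.dtype)
```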