	---
	license: mit
	---

	# Compressed LLM Model Zone

	The models are prepared by [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/).

	License: [MIT License](https://opensource.org/license/mit/)

	Setup environment
	```shell
	pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
	pip install transformers==4.31.0
	pip install accelerate
	pip install auto-gptq # for gptq
	```
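After installing, it can help to confirm that the pinned versions are the ones actually in use. The following is a minimal sketch using only the standard library. The helper names `installed_version` and `check_pins` are hypothetical (not part of this repository), and the comparison ignores local build tags such as `+cu117`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above (local tags like "+cu117" dropped).
PINS = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "torchaudio": "2.0.1",
    "transformers": "4.31.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        # Compare only the public version, ignoring a local tag such as "+cu117".
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result from `check_pins(PINS)` means every pinned package matches. `accelerate` and `auto-gptq` are left out because the install commands do not pin them.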

	How to use pruned models
	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# The repository is named {base_model}_{comp_method};
	# each compression degree lives on its own branch, s{comp_degree}.
	base_model = 'llama-2-7b'
	comp_method = 'magnitude_unstructured'
	comp_degree = 0.2
	model_path = f'vita-group/{base_model}_{comp_method}'

	model = AutoModelForCausalLM.from_pretrained(
	    model_path,
	    revision=f's{comp_degree}',
	    torch_dtype=torch.float16,
	    low_cpu_mem_usage=True,
	    device_map="auto",
	)
	# The tokenizer is loaded from the base Llama-2 repository.
	tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

	# Move the prompt to the same device as the model before generating.
	input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to(model.device)
	outputs = model.generate(input_ids, max_new_tokens=128)
	print(tokenizer.decode(outputs[0]))
	```
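The snippet above builds the repository name and branch from three values. The sketch below isolates that naming convention so it can be reused for other rows of the model table in this README. The function names `pruned_model_ref` and `branch_url` are hypothetical helpers, and the URL pattern is inferred from the table links rather than taken from an official API.

```python
def pruned_model_ref(base_model, comp_method, comp_degree):
    """Return (repo_id, revision) for a pruned checkpoint.

    Follows the pattern used in the example above:
    repo 'vita-group/{base_model}_{comp_method}', branch 's{comp_degree}'.
    """
    repo_id = f"vita-group/{base_model}_{comp_method}"
    revision = f"s{comp_degree}"
    return repo_id, revision


def branch_url(repo_id, revision):
    """Build the Hugging Face URL of a branch (pattern inferred from the table links)."""
    return f"https://huggingface.co/{repo_id}/tree/{revision}"


repo_id, revision = pruned_model_ref("llama-2-7b", "wanda_unstructured", 0.5)
print(repo_id, revision)
print(branch_url(repo_id, revision))
```

The returned pair can be passed directly as the first argument and the `revision` argument of `AutoModelForCausalLM.from_pretrained`.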

	How to use quantized models
	```python
	from auto_gptq import AutoGPTQForCausalLM

	model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
	model = AutoGPTQForCausalLM.from_quantized(
	    model_path,
	    # Either disable the ExLlama kernels (as here),
	    # or pass inject_fused_attention=False instead.
	    disable_exllama=True,
	    device_map='auto',
	)
	```
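The quantized example stops after loading the model. A minimal sketch of the generation step follows, mirroring the pruned-model example. `generate_text` is a hypothetical helper, and using the `meta-llama/Llama-2-7b-hf` tokenizer for the quantized checkpoint is an assumption carried over from the pruned example, not something this README states.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=128):
    """Tokenize a prompt, generate a continuation, and decode it to text."""
    encoded = tokenizer(prompt, return_tensors="pt")
    # Move the prompt to the device that holds the model.
    input_ids = encoded.input_ids.to(model.device)
    # Keyword arguments are used because AutoGPTQ's generate() accepts only kwargs.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Example usage (assumes `model` was loaded as shown above and that the
# base Llama-2 tokenizer is the right one for this checkpoint):
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
#   print(generate_text(model, tokenizer, "Hello! I am a VITA-compressed-LLM chatbot!"))
```

The same helper should also work with a pruned model loaded through `AutoModelForCausalLM`, since both expose `generate` and `device`.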

	| | Base Model | Model Size | Compression Method | Compression Degree |
	|---:|:---|:---|:---|:---|
	| 0 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.1) |
	| 1 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.2) |
	| 2 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.3) |
	| 3 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.5) |
	| 4 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.6) |
	| 5 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.1) |
	| 6 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.2) |
	| 7 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.3) |
	| 8 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.5) |
	| 9 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.6) |
	| 10 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.1) |
	| 11 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.2) |
	| 12 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3) |
	| 13 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5) |
	| 14 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6) |