jyhong836 committed on
Commit
6699d1b
1 Parent(s): 9641f03

Create README.md

Files changed (1)
  1. README.md +68 -0
README.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ license: mit
+ ---
+
+ # Compressed LLM Model Zone
+
+ The models are prepared by the [Visual Informatics Group @ University of Texas at Austin (VITA-group)](https://vita-group.github.io/).
+
+ License: [MIT License](https://opensource.org/license/mit/)
+
+ Set up the environment
+ ```shell
+ pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
+ pip install transformers==4.31.0
+ pip install accelerate
+ pip install auto-gptq # for gptq
+ ```
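
Note on the setup step: a quick way to confirm which versions ended up installed is to query the package metadata. This is a minimal sketch and not part of the committed README; the helper name `installed_version` is a hypothetical choice, and the package list simply mirrors the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they appear in the pip commands above.
for name in ("torch", "torchvision", "torchaudio", "transformers", "accelerate", "auto-gptq"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Comparing the printed versions against the pins (for example `transformers==4.31.0`) catches an environment that silently resolved to different releases.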
+
+ How to use pruned models
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Each repo holds one base model and compression method;
+ # the compression degree selects a revision of that repo.
+ base_model = 'llama-2-7b'
+ comp_method = 'magnitude_unstructured'
+ comp_degree = 0.2  # selects the revision named 's0.2'
+ model_path = f'vita-group/{base_model}_{comp_method}'
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     revision=f's{comp_degree}',
+     torch_dtype=torch.float16,
+     low_cpu_mem_usage=True,
+     device_map="auto"
+ )
+ # The tokenizer is loaded from the original Llama-2 repo.
+ tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
+ input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
+ outputs = model.generate(input_ids, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0]))
+ ```
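
Note on the pruned-model snippet: the repo name and revision follow a fixed pattern, so they can be built by a small helper. The sketch below is illustrative only; `pruned_model_ref` and `LISTED_DEGREES` are hypothetical names, and the degree list is copied from the model table in this README on the assumption that the table is complete.

```python
from typing import Tuple

# Compression degrees listed in the README's model table (assumed complete).
LISTED_DEGREES = (0.1, 0.2, 0.3, 0.5, 0.6)


def pruned_model_ref(base_model: str, comp_method: str, comp_degree: float) -> Tuple[str, str]:
    """Build the (repo_id, revision) pair used by the README's loading snippet."""
    if comp_degree not in LISTED_DEGREES:
        raise ValueError(f"compression degree {comp_degree} is not listed; choose from {LISTED_DEGREES}")
    repo_id = f"vita-group/{base_model}_{comp_method}"
    revision = f"s{comp_degree}"
    return repo_id, revision


repo_id, revision = pruned_model_ref("llama-2-7b", "wanda_unstructured", 0.5)
print(repo_id, revision)
```

The returned pair can be passed to `AutoModelForCausalLM.from_pretrained(repo_id, revision=revision, ...)` in place of the hand-built strings.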
+
+ How to use quantized models
+ ```python
+ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
+ model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
+ model = AutoGPTQForCausalLM.from_quantized(
+     model_path,
+     # Use one of the two options: inject_fused_attention=False, or disable_exllama=True.
+     # inject_fused_attention=False,
+     disable_exllama=True,
+     device_map='auto',
+ )
+ ```
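
Note on the quantized-model snippet: it stops after loading, so here is a hedged sketch of a generation step mirroring the pruned-model example. The function name `generate_text` is hypothetical, and it assumes the `meta-llama/Llama-2-7b-hf` tokenizer also applies to the quantized checkpoint, which the README does not state. `input_ids` is passed by keyword because, as far as I know, the AutoGPTQ `generate` wrapper forwards keyword arguments only.

```python
def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 128) -> str:
    """Tokenize `prompt`, generate a continuation, and decode it to text."""
    # Move the token ids to the device the model lives on.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    # Keyword form works for both transformers and AutoGPTQ model objects.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage, assuming `model` was loaded as in the snippet above:
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
#   print(generate_text(model, tokenizer, "Hello! I am a VITA-compressed-LLM chatbot!"))
```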
+
+ |  # | Base Model   | Model Size   | Compression Method                                                                            | Compression Degree                                                                    |
+ |---:|:-------------|:-------------|:----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+ | 0 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.1) |
+ | 1 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.2) |
+ | 2 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.3) |
+ | 3 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.5) |
+ | 4 | Llama-2 | 7b | [magnitude_unstructured](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_magnitude_unstructured/tree/s0.6) |
+ | 5 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.1) |
+ | 6 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.2) |
+ | 7 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.3) |
+ | 8 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.5) |
+ | 9 | Llama-2 | 7b | [sparsegpt_unstructured](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_sparsegpt_unstructured/tree/s0.6) |
+ | 10 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.1](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.1) |
+ | 11 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.2](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.2) |
+ | 12 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.3](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.3) |
+ | 13 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.5](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.5) |
+ | 14 | Llama-2 | 7b | [wanda_unstructured](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured) | [s0.6](https://huggingface.co/vita-group/llama-2-7b_wanda_unstructured/tree/s0.6) |
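
Note on the table: its 15 rows are the product of three compression methods and five degrees, so the same entries can be generated programmatically, for example to loop over every checkpoint in an evaluation script. This is a sketch under that assumption; `table_entries` and the constant names are hypothetical, and the values are copied from the rows above.

```python
from itertools import product

BASE_MODEL = "llama-2-7b"
METHODS = ("magnitude_unstructured", "sparsegpt_unstructured", "wanda_unstructured")
DEGREES = ("0.1", "0.2", "0.3", "0.5", "0.6")


def table_entries():
    """Return (repo_id, revision, url) tuples in the same order as the table rows."""
    entries = []
    for method, degree in product(METHODS, DEGREES):
        repo_id = f"vita-group/{BASE_MODEL}_{method}"
        revision = f"s{degree}"
        url = f"https://huggingface.co/{repo_id}/tree/{revision}"
        entries.append((repo_id, revision, url))
    return entries


for repo_id, revision, url in table_entries():
    print(repo_id, revision, url)
```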