RichardErkhov committed on
Commit 9978ab4
1 Parent(s): 3d2bfc0

uploaded readme

Files changed (1)
  1. README.md +133 -0
README.md ADDED
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

phi-2-4bit-64rank - bnb 8bits
- Model creator: https://huggingface.co/LoftQ/
- Original model: https://huggingface.co/LoftQ/phi-2-4bit-64rank/
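
The checkpoint published in this repository is stored in bitsandbytes 8-bit format. As a minimal sketch (the exact Hub repository ID is not spelled out above, so the placeholder below is hypothetical), a bnb 8-bit checkpoint is typically loaded like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "<this-repo-id>"  # hypothetical placeholder: substitute this repository's Hub ID

# The quantization_config saved alongside the checkpoint is applied automatically;
# device_map="auto" places the 8-bit weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(REPO_ID, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
```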

Original model description:
---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- quantization
- lora
---
# LoftQ Initialization

| [Paper](https://arxiv.org/abs/2310.08659) | [Code](https://github.com/yxli2123/LoftQ) | [PEFT Example](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning) |

LoftQ (LoRA-fine-tuning-aware Quantization) provides a quantized backbone Q and LoRA adapters A and B, given a full-precision pre-trained weight W.
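
As a sketch of the objective behind this initialization (following the LoftQ paper, with the rank r = 64 and 4-bit NF4 backbone of this checkpoint), LoftQ alternates quantization and rank-r SVD steps so that the quantized backbone plus adapters approximate each pre-trained weight matrix:

```latex
% Sketch of the LoftQ objective for a pre-trained weight matrix W
\min_{Q,\,A,\,B} \; \bigl\| W - Q - A B^{\top} \bigr\|_{F}
\quad \text{with} \quad
Q = \mathrm{NF4}\!\left(W - A B^{\top}\right),\qquad
A \in \mathbb{R}^{d_{1}\times r},\; B \in \mathbb{R}^{d_{2}\times r},\; r = 64 .
```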

This model, `phi-2-4bit-64rank`, is obtained from [phi-2](https://huggingface.co/microsoft/phi-2).
The backbone is stored under `LoftQ/phi-2-4bit-64rank`, and the LoRA adapters are stored in the `loftq_init` subfolder.

## Model Info
### Backbone
- Stored format: `torch.float16`
- Size: ~5.5 GiB
- Loaded format: bitsandbytes nf4
- Size loaded on GPU: ~1.4 GiB

### LoRA adapters
- rank: 64
- lora_alpha: 16
- target_modules: ["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"]
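
For reference, these adapter hyperparameters correspond roughly to the following PEFT configuration (a sketch only: the released adapters under `loftq_init` already encode these settings and are loaded directly in the Usage section below; `lora_dropout` and `task_type` are assumptions not stated in this card):

```python
from peft import LoraConfig

# Approximate LoRA configuration matching the hyperparameters listed above.
lora_config = LoraConfig(
    r=64,                   # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    lora_dropout=0.0,       # assumption: not specified in this card
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)
```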

## Usage

**Training** Here is an example of loading this model and preparing it for LoRA fine-tuning.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/phi-2-4bit-64rank"

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,  # you may change this for other models
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float32,  # float32 is tested and verified
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type='nf4',
    ),
)
peft_model = PeftModel.from_pretrained(
    base_model,
    MODEL_ID,
    subfolder="loftq_init",
    is_trainable=True,
)

# Do training with peft_model ...
```
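
To make the training placeholder concrete, here is a minimal sketch of a single fine-tuning step with the `peft_model` loaded above (the tokenizer source, sample text, and optimizer settings are illustrative assumptions, not part of the original example):

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer is available in the same repo; otherwise use "microsoft/phi-2".
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token

# Only the LoRA adapter parameters require gradients; the 4-bit backbone stays frozen.
optimizer = torch.optim.AdamW(
    (p for p in peft_model.parameters() if p.requires_grad), lr=1e-4
)

batch = tokenizer(
    ["Question: what is 2 + 2? Answer: 4"],  # illustrative sample, not GSM8K formatting
    return_tensors="pt",
).to(base_model.device)

peft_model.train()
loss = peft_model(**batch, labels=batch["input_ids"]).loss  # causal-LM loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```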

## Experiment Results
We conducted supervised fine-tuning experiments on [GSM8K](https://huggingface.co/datasets/gsm8k).

| Model | Bits | Rank | LoRA Initialization    | GSM8K    |
| ----- | ---- | ---- | ---------------------- | -------- |
| Phi-2 | 16   | -    | Full model fine-tuning | 66.8±1.2 |
| Phi-2 | 16   | 64   | Gaussian + 0 (LoRA)    | 64.8±0.5 |
| Phi-2 | 4    | 64   | Gaussian + 0 (QLoRA)   | 60.2±0.6 |
| Phi-2 | 4    | 64   | LoftQ                  | 64.1±0.7 |

**Inference** Here is example code for inference after the model has been fine-tuned on [GSM8K](https://huggingface.co/datasets/gsm8k).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/phi-2-4bit-64rank"

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,  # you may change this for other models
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float32,  # float32 is tested and verified
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type='nf4',
    ),
)
peft_model = PeftModel.from_pretrained(
    base_model,
    MODEL_ID,
    subfolder="gsm8k",
    is_trainable=False,  # adapters are loaded frozen for inference
)

# Do inference with peft_model ...
```

See the full code at our [Github Repo](https://github.com/yxli2123/LoftQ).
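
As a sketch of the "Do inference" placeholder above (the prompt and generation settings are illustrative assumptions; the exact GSM8K evaluation code lives in the LoftQ repo), generation with the loaded `peft_model` could look like this:

```python
from transformers import AutoTokenizer

# Assumption: the tokenizer is available in the same repo; otherwise use "microsoft/phi-2".
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

prompt = "Question: A pen costs 3 dollars and a notebook costs twice as much. How much do both cost together?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)

peft_model.eval()
with torch.no_grad():
    output_ids = peft_model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,  # greedy decoding; a choice, not specified in the card
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```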

## Citation

```bibtex
@article{li2023loftq,
  title={Loftq: Lora-fine-tuning-aware quantization for large language models},
  author={Li, Yixiao and Yu, Yifan and Liang, Chen and He, Pengcheng and Karampatziakis, Nikos and Chen, Weizhu and Zhao, Tuo},
  journal={arXiv preprint arXiv:2310.08659},
  year={2023}
}
```