chainyo committed
Commit 31458d1 • 1 Parent(s): 05a7a4a
add adapters + instructions + tokenizer
Browse files
- README.md +167 -1
- adapter_config.json +19 -0
- adapter_model.bin +3 -0
- llama-natural-instructions-removebg-preview.png +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +1 -0
README.md
CHANGED
@@ -9,4 +9,170 @@ tags:
- peft
- LoRA
---

# LoRA LLaMA Natural Instructions

![LLaMA Natural Instructions](./llama-natural-instructions-removebg-preview.png)

This model is a version of [llama-13b](https://huggingface.co/decapoda-research/llama-13b-hf) from [Meta](https://huggingface.co/facebook), fine-tuned on the [Natural Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) dataset from [AllenAI](https://huggingface.co/allenai) using the [LoRA](https://arxiv.org/pdf/2106.09685.pdf) training technique.

⚠️ **This model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-13b-hf/blob/main/LICENSE)).**
## WandB Report

Click on the badge below to see the full report on Weights & Biases.

[![WandB](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=for-the-badge&logo=WeightsAndBiases&logoColor=black)](https://api.wandb.ai/links/chainyo-mleng/91srpylj)

## Usage

### Installation

```bash
pip install loralib bitsandbytes datasets git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git sentencepiece
```
### Format of the input

The input should be a string of text with the following format:

```python
from typing import Union

prompt_template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
    "response": "### Response:"
}

def generate_prompt(
    definition: str,
    inputs: str,
    targets: Union[None, str] = None,
) -> str:
    """Generate a prompt from instruction and input."""
    res = prompt_template["prompt"].format(
        instruction=definition, input=inputs
    )

    if targets:
        res = f"{res}{targets}"

    return res

def get_response(output: str) -> str:
    """Get the response from the output."""
    return output.split(prompt_template["response"])[1].strip()
```

Feel free to use these utility functions to generate the prompt and to extract the response from the model output.

- `definition` is the instruction describing the task. It's generally a single sentence explaining the expected output and the reasoning steps to follow.
- `inputs` is the input to the task. It can be a single sentence or a paragraph. It's the context used by the model to generate the response to the task.
- `targets` is the expected output of the task. It's used for training the model. _It's not required for inference._

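For example, calling `generate_prompt` with a made-up task (illustrative only, not from the training set) produces a prompt in the expected format:

```python
# Illustrative example: build a prompt for a made-up task.
example_prompt = generate_prompt(
    "In this task, you have to translate the input sentence from English to French.",
    "The weather is nice today.",
)
print(example_prompt)
# ### Instruction:
# In this task, you have to translate the input sentence from English to French.
#
# ### Input:
# The weather is nice today.
#
# ### Response:
```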
### Inference

You can either load the base model and apply the LoRA adapters on top of it, or load the full model with the adapters already merged into the weights.
#### The tokenizer

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("wordcab/llama-natural-instructions-13b")
tokenizer.padding_side = "left"  # pad on the left so generation continues from the end of the prompt
tokenizer.pad_token_id = 0  # use token id 0 as the padding token
```
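As a quick, purely illustrative check: with left padding, shorter prompts are padded at the beginning, so every sequence in a batch ends with its own last prompt token.

```python
# Illustrative: pad a small batch and inspect the result.
batch = tokenizer(
    ["Short prompt", "A slightly longer prompt"],
    return_tensors="pt",
    padding=True,
)
print(batch["input_ids"].shape)  # both rows are left-padded to the same length
```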
#### Load the model with the adapters

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    model,
    "wordcab/llama-natural-instructions-13b",
    torch_dtype=torch.float16,
    device_map={"": 0},
)
```
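A quick, illustrative sanity check that the adapters were attached to the base model:

```python
# Illustrative: the PEFT wrapper exposes the adapter configuration it loaded.
print(model.peft_config)
```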
#### Load the full model

⚠️ Work in progress...

```python
model = LlamaForCausalLM.from_pretrained(
    "wordcab/llama-natural-instructions-13b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
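Until merged weights are published, one possible way to build the full model yourself is to merge the adapters into the base weights locally. This is only a sketch: it assumes your installed `peft` version provides `merge_and_unload`, and it loads the base model in fp16 because merging is generally not supported for 8-bit weights.

```python
# Sketch only: merge the LoRA adapters into the fp16 base weights and save the result.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "wordcab/llama-natural-instructions-13b")
merged = merged.merge_and_unload()  # assumption: available in your peft version
merged.save_pretrained("./llama-natural-instructions-13b-merged")  # hypothetical local path
```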
#### Evaluation mode

Don't forget to put the model in evaluation mode. If you are using PyTorch 2.0 or higher, you can also call `torch.compile` to speed up inference.

```python
model.eval()
if torch.__version__ >= "2":  # torch.compile is available from PyTorch 2.0
    model = torch.compile(model)
```
#### Generate the response

```python
from transformers import GenerationConfig

# Note: these generation settings are illustrative; the original card does not specify them.
generation_config = GenerationConfig(temperature=0.2, top_p=0.75, num_beams=4)

prompt = generate_prompt(
    "In this task, you have to analyze the full sentences and do reasoning and quick maths to find the correct answer.",
    "You are now a superbowl star. You are the quarterback of the team. Your team is down by 3 points. You are in the last 2 minutes of the game. The other team has a score of 28. What is the score of your team?",
)
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048)
input_ids = inputs["input_ids"].to(model.device)

with torch.no_grad():
    gen_outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=50,
    )

s = gen_outputs.sequences[0]
output = tokenizer.decode(s, skip_special_tokens=True)
response = get_response(output)  # the helper defined in "Format of the input"
print(response)
# >>> 25
```

You can try with other prompts that are not maths related as well! :hugs:
## Benchmark

We benchmarked our model on the following tasks: [BoolQ](https://huggingface.co/datasets/boolq), [PIQA](https://huggingface.co/datasets/piqa), [WinoGrande](https://huggingface.co/datasets/winogrande), [OpenBookQA](https://huggingface.co/datasets/openbookqa).

| Model | BoolQ | PIQA | WinoGrande | OpenBookQA | Precision | Inference time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| Original LLaMA 7B | 76.5 | 79.8 | 70.1 | 57.2 | fp32 | 3 |
| Original LLaMA 13B | 78.1 | 80.1 | 73 | 56.4 | fp32 | >5 |
| LoRA LLaMA 7B | 63.9 | 51.3 | 48.9 | 31.4 | 8bit | 0.65 |
| LoRA LLaMA 13B | 70 | 63.93 | 51.6 | 50.4 | 8bit | 1.2 |

__Link to the 7B model:__ [wordcab/llama-natural-instructions-7b](https://huggingface.co/wordcab/llama-natural-instructions-7b)

Overall, our LoRA models are less performant than the original models from Meta when compared against the results reported in the [original paper](https://arxiv.org/pdf/2302.13971.pdf).

The performance degradation comes from loading the model in 8-bit and from using the LoRA adapters instead of fully fine-tuned weights.
Thanks to the 8-bit quantization, the model is 4 times faster than the original model and the results are still decent.

Some complex tasks like WinoGrande and OpenBookQA are more difficult to solve with the adapters.
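If you want to reproduce the inference-time column above, a rough sketch is shown below (illustrative only; absolute numbers depend heavily on your hardware, batch size, and generation settings).

```python
# Sketch: time a single generation pass for a prompt that is already tokenized.
import time

import torch

start = time.perf_counter()
with torch.no_grad():
    model.generate(input_ids=input_ids, max_new_tokens=50)
elapsed = time.perf_counter() - start
print(f"inference time: {elapsed:.2f} s")
```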
adapter_config.json
ADDED
@@ -0,0 +1,19 @@
{
  "base_model_name_or_path": "decapoda-research/llama-13b-hf",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
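For reference, these fields correspond to a `peft` `LoraConfig` along these lines (a sketch of the training-time configuration, not the exact script used):

```python
# Sketch: a LoraConfig matching the fields stored in adapter_config.json.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```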
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:830b58ecc97c15ac9768ac99aa931464814b343e9ba2309da32c942845f0caa4
size 26271757
llama-natural-instructions-removebg-preview.png
ADDED
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"bos_token": "", "eos_token": "", "model_max_length": 2048, "tokenizer_class": "LlamaTokenizer", "unk_token": ""}