---
datasets:
- Muennighoff/natural-instructions
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- peft
- llama
---
# LoRA LLaMA Natural Instructions

![LLaMA Natural Instructions](./llama-natural-instructions-removebg-preview.png)

This model is a fine-tuned version of [llama-7b](https://huggingface.co/decapoda-research/llama-7b-hf) from [Meta](https://huggingface.co/facebook),
on the [Natural Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) dataset from [AllenAI](https://huggingface.co/allenai),
using the [LoRA](https://arxiv.org/pdf/2106.09685.pdf) training technique.

⚠️ **This model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/LICENSE))**

## WandB Report

Click on the badge below to see the full report on Weights & Biases.

[![WandB](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=for-the-badge&logo=WeightsAndBiases&logoColor=black)](https://api.wandb.ai/links/chainyo-mleng/ia2mloow)

## Usage

### Installation

```bash
pip install loralib bitsandbytes datasets git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git sentencepiece
```

### Format of the input

The input should be a string of text with the following format:

```python
from typing import Union

prompt_template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
    "response": "### Response:",
}

def generate_prompt(
    definition: str,
    inputs: str,
    targets: Union[None, str] = None,
) -> str:
    """Generate a prompt from instruction and input."""
    res = prompt_template["prompt"].format(
        instruction=definition, input=inputs
    )

    if targets:
        res = f"{res}{targets}"

    return res

def get_response(output: str) -> str:
    """Get the response from the output."""
    return output.split(prompt_template["response"])[1].strip()
```

Feel free to use these utility functions to generate the prompt and to extract the response from the model output.

- `definition` is the instruction describing the task. It's generally a single sentence explaining the expected output and
the reasoning steps to follow.
- `inputs` is the input to the task. It can be a single sentence or a paragraph. It's the context used by the model to
generate the response to the task.
- `targets` is the expected output of the task. It's used for training the model. _It's not required for inference._
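
To make the format concrete, here is a minimal self-contained sketch that repeats the template and helpers from above and builds one inference prompt and one training prompt. The task text and the `fake_output` string are invented for illustration; a real output would come from decoding the result of `model.generate`.

```python
from typing import Optional

PROMPT = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
RESPONSE_MARKER = "### Response:"


def generate_prompt(definition: str, inputs: str, targets: Optional[str] = None) -> str:
    """Fill the template; append the target only when building training examples."""
    res = PROMPT.format(instruction=definition, input=inputs)
    return f"{res}{targets}" if targets else res


def get_response(output: str) -> str:
    """Return the text that follows the response marker."""
    return output.split(RESPONSE_MARKER)[1].strip()


# Hypothetical task, for illustration only.
definition = "Classify the sentiment of the review as positive or negative."
review = "The battery died after two days."

inference_prompt = generate_prompt(definition, review)
training_prompt = generate_prompt(definition, review, targets="negative")

print(inference_prompt)

# Stand-in for a decoded model output: the prompt followed by generated text.
fake_output = inference_prompt + " negative\n"
print(get_response(fake_output))
```

Note that `get_response` assumes the marker `### Response:` appears in the decoded output; if the prompt was truncated before the marker, the indexing raises an `IndexError`.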

### Inference

You can either load the base model and apply the adapters on top of it, or load the full model, which includes both the adapters and the weights.

#### The tokenizer

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("wordcab/llama-natural-instructions-7b")
tokenizer.padding_side = "left"
tokenizer.pad_token_id = 0  # pad with token id 0
```

#### Load the model with the adapters

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    model,
    "wordcab/llama-natural-instructions-7b",
    torch_dtype=torch.float16,
    device_map={"": 0},
)
```

#### Load the full model

⚠️ Work in progress...

```python
model = LlamaForCausalLM.from_pretrained(
    "wordcab/llama-natural-instructions-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
```

#### Evaluation mode

Put the model in evaluation mode before generating. If you are using PyTorch 2.0 or higher, you can also compile the model with `torch.compile`.

```python
model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)
```
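
The check above compares version strings lexicographically, which works for the 1.x and 2.x releases. As a hedged alternative, the sketch below compares the major version as an integer. The helper name `is_torch_2_or_newer` is hypothetical, and the version strings in the loop are sample values rather than output from a real installation.

```python
def is_torch_2_or_newer(version: str) -> bool:
    """Compare the major version numerically instead of as a string."""
    major = version.split(".")[0]
    return int(major) >= 2


# Sample version strings; "10.0.0" is a made-up future version that shows
# where the plain string comparison and the numeric comparison disagree.
for v in ("1.13.1", "2.0.1+cu118", "10.0.0"):
    print(v, is_torch_2_or_newer(v), v >= "2")
```

With PyTorch installed, the helper would be called as `is_torch_2_or_newer(torch.__version__)` before compiling the model.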

#### Generate the response

```python
import torch
from transformers import GenerationConfig

prompt = generate_prompt(
    "In this task, you have to analyze the full sentences and do reasoning and quick maths to find the correct answer.",
    "You are now a superbowl star. You are the quarterback of the team. Your team is down by 3 points. You are in the last 2 minutes of the game. The other team has a score of 28. What is the score of your team?",
)
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048)
input_ids = inputs["input_ids"].to(model.device)

generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)

with torch.no_grad():
    gen_outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=50,
    )

s = gen_outputs.sequences[0]
output = tokenizer.decode(s, skip_special_tokens=True)
response = get_response(output)
print(response)
# 25
```

You can try with other prompts that are not maths related as well! :hugs:

## Benchmark

We benchmarked our model on the following tasks: [BoolQ](https://huggingface.co/datasets/boolq), [PIQA](https://huggingface.co/datasets/piqa), [WinoGrande](https://huggingface.co/datasets/winogrande), [OpenBookQA](https://huggingface.co/datasets/openbookqa).

| Model | BoolQ | PIQA | WinoGrande | OpenBookQA | Precision | Inference time (s) |
| --- | ---   | ---  | ---        | ---        | ---       | ---                |
| Original LLaMA 7B | 76.5 | 79.8 | 70.1 | 57.2 | fp32 | 3 |
| Original LLaMA 13B | 78.1 | 80.1 | 73 | 56.4 | fp32 | >5 |
| LoRA LLaMA 7B | 63.9 | 51.3 | 48.9 | 31.4 | 8bit | 0.65 |
| LoRA LLaMA 13B | 70 | 63.93 | 51.6 | 50.4 | 8bit | 1.2 |

__Link to the 13B model:__ [wordcab/llama-natural-instructions-13b](https://huggingface.co/wordcab/llama-natural-instructions-13b)

Overall, our LoRA models score lower than the original models from Meta when compared against the results reported in the [original paper](https://arxiv.org/pdf/2302.13971.pdf).

We attribute the performance degradation to loading the model in 8-bit and to using the adapters from the LoRA training.
Thanks to the 8-bit quantization, the model is roughly 4 times faster than the original model (0.65 s vs. 3 s for the 7B model) and the results are still decent.

Some complex tasks like WinoGrande and OpenBookQA are more difficult to solve with the adapters.
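
The speed and accuracy trade-off can be read directly from the benchmark table. The short sketch below recomputes the speedup and the per-task score differences from the table values. The variable and function names are illustrative, and the 13B original timing is listed as ">5" in the table, so the 5.0 used here is only a lower bound.

```python
# Scores and timings copied from the benchmark table above.
TASKS = ("BoolQ", "PIQA", "WinoGrande", "OpenBookQA")
original = {"7B": (76.5, 79.8, 70.1, 57.2), "13B": (78.1, 80.1, 73.0, 56.4)}
lora = {"7B": (63.9, 51.3, 48.9, 31.4), "13B": (70.0, 63.93, 51.6, 50.4)}

# Inference time in seconds; 5.0 is a lower bound for the 13B original (">5").
time_original = {"7B": 3.0, "13B": 5.0}
time_lora = {"7B": 0.65, "13B": 1.2}


def speedup(size: str) -> float:
    """Ratio of original to LoRA 8-bit inference time."""
    return round(time_original[size] / time_lora[size], 2)


def score_deltas(size: str) -> dict:
    """LoRA score minus original score for each task, in points."""
    return {
        task: round(l - o, 2)
        for task, o, l in zip(TASKS, original[size], lora[size])
    }


for size in ("7B", "13B"):
    print(size, "speedup:", speedup(size), "deltas:", score_deltas(size))
```

For the 7B model this gives a speedup of about 4.6, and for the 13B model at least about 4.2, which is consistent with the "roughly 4 times faster" figure.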

## Training Hardware

This model was trained on a single NVIDIA RTX 3090 GPU.