chainyo committed
Commit 31458d1 • 1 Parent(s): 05a7a4a
add adapters + instructions + tokenizer
Browse files
- README.md +167 -1
- adapter_config.json +19 -0
- adapter_model.bin +3 -0
- llama-natural-instructions-removebg-preview.png +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +1 -0
README.md
CHANGED
@@ -9,4 +9,170 @@ tags:
- peft
- LoRA
---

# LoRA LLaMA Natural Instructions

![LLaMA Natural Instructions](./llama-natural-instructions-removebg-preview.png)

This model is a version of [llama-13b](https://huggingface.co/decapoda-research/llama-13b-hf) from [Meta](https://huggingface.co/facebook), fine-tuned on the [Natural Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) dataset from [AllenAI](https://huggingface.co/allenai) using the [LoRA](https://arxiv.org/pdf/2106.09685.pdf) training technique.

⚠️ **This model is for research purposes only (see the [license](https://huggingface.co/decapoda-research/llama-13b-hf/blob/main/LICENSE)).**
## WandB Report

Click on the badge below to see the full report on Weights & Biases.

[![WandB](https://img.shields.io/badge/Weights_&_Biases-FFCC33?style=for-the-badge&logo=WeightsAndBiases&logoColor=black)](https://api.wandb.ai/links/chainyo-mleng/91srpylj)

## Usage

### Installation

```bash
pip install loralib bitsandbytes datasets git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git sentencepiece
```
### Format of the input

The input should be a string of text with the following format:

```python
from typing import Union

prompt_template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
    "response": "### Response:"
}

def generate_prompt(
    definition: str,
    inputs: str,
    targets: Union[None, str] = None,
) -> str:
    """Generate a prompt from instruction and input."""
    res = prompt_template["prompt"].format(
        instruction=definition, input=inputs
    )

    if targets:
        res = f"{res}{targets}"

    return res

def get_response(output: str) -> str:
    """Get the response from the output."""
    return output.split(prompt_template["response"])[1].strip()
```

Feel free to use these utility functions to generate the prompt and to extract the response from the model output.

- `definition` is the instruction describing the task. It's generally a single sentence explaining the expected output and the reasoning steps to follow.
- `inputs` is the input to the task. It can be a single sentence or a paragraph. It's the context used by the model to generate the response to the task.
- `targets` is the expected output of the task. It's used for training the model. _It's not required for inference._

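For example, calling `generate_prompt` with a made-up task (illustrative only, not from the training set) produces a prompt in the expected format:

```python
# Illustrative example: build a prompt for a made-up task.
example_prompt = generate_prompt(
    "In this task, you have to translate the input sentence from English to French.",
    "The weather is nice today.",
)
print(example_prompt)
# ### Instruction:
# In this task, you have to translate the input sentence from English to French.
#
# ### Input:
# The weather is nice today.
#
# ### Response:
```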
### Inference

You can either load the base model and apply the LoRA adapters on top of it, or load the full model with the adapters already merged into the weights.
#### The tokenizer

```python
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("wordcab/llama-natural-instructions-13b")
tokenizer.padding_side = "left"  # pad on the left so generation continues from the end of the prompt
tokenizer.pad_token_id = 0  # use token id 0 as the padding token
```
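As a quick, purely illustrative check: with left padding, shorter prompts are padded at the beginning, so every sequence in a batch ends with its own last prompt token.

```python
# Illustrative: pad a small batch and inspect the result.
batch = tokenizer(
    ["Short prompt", "A slightly longer prompt"],
    return_tensors="pt",
    padding=True,
)
print(batch["input_ids"].shape)  # both rows are left-padded to the same length
```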
#### Load the model with the adapters

```python
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(
    model,
    "wordcab/llama-natural-instructions-13b",
    torch_dtype=torch.float16,
    device_map={"": 0},
)
```
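A quick, illustrative sanity check that the adapters were attached to the base model:

```python
# Illustrative: the PEFT wrapper exposes the adapter configuration it loaded.
print(model.peft_config)
```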
#### Load the full model

⚠️ Work in progress...

```python
model = LlamaForCausalLM.from_pretrained(
    "wordcab/llama-natural-instructions-13b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
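Until merged weights are published, one possible way to build the full model yourself is to merge the adapters into the base weights locally. This is only a sketch: it assumes your installed `peft` version provides `merge_and_unload`, and it loads the base model in fp16 because merging is generally not supported for 8-bit weights.

```python
# Sketch only: merge the LoRA adapters into the fp16 base weights and save the result.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

base = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "wordcab/llama-natural-instructions-13b")
merged = merged.merge_and_unload()  # assumption: available in your peft version
merged.save_pretrained("./llama-natural-instructions-13b-merged")  # hypothetical local path
```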
#### Evaluation mode

Don't forget to put the model in evaluation mode. If you are using PyTorch 2.0 or higher, you can also call `torch.compile` to speed up inference.

```python
model.eval()
if torch.__version__ >= "2":  # torch.compile is available from PyTorch 2.0
    model = torch.compile(model)
```
#### Generate the response

```python
from transformers import GenerationConfig

# Note: these generation settings are illustrative; the original card does not specify them.
generation_config = GenerationConfig(temperature=0.2, top_p=0.75, num_beams=4)

prompt = generate_prompt(
    "In this task, you have to analyze the full sentences and do reasoning and quick maths to find the correct answer.",
    "You are now a superbowl star. You are the quarterback of the team. Your team is down by 3 points. You are in the last 2 minutes of the game. The other team has a score of 28. What is the score of your team?",
)
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True, max_length=2048)
input_ids = inputs["input_ids"].to(model.device)

with torch.no_grad():
    gen_outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=50,
    )

s = gen_outputs.sequences[0]
output = tokenizer.decode(s, skip_special_tokens=True)
response = get_response(output)  # the helper defined in "Format of the input"
print(response)
# >>> 25
```

You can try with other prompts that are not maths related as well! :hugs:
## Benchmark

We benchmarked our model on the following tasks: [BoolQ](https://huggingface.co/datasets/boolq), [PIQA](https://huggingface.co/datasets/piqa), [WinoGrande](https://huggingface.co/datasets/winogrande), [OpenBookQA](https://huggingface.co/datasets/openbookqa).

| Model | BoolQ | PIQA | WinoGrande | OpenBookQA | Precision | Inference time (s) |
| --- | --- | --- | --- | --- | --- | --- |
| Original LLaMA 7B | 76.5 | 79.8 | 70.1 | 57.2 | fp32 | 3 |
| Original LLaMA 13B | 78.1 | 80.1 | 73 | 56.4 | fp32 | >5 |
| LoRA LLaMA 7B | 63.9 | 51.3 | 48.9 | 31.4 | 8bit | 0.65 |
| LoRA LLaMA 13B | 70 | 63.93 | 51.6 | 50.4 | 8bit | 1.2 |

__Link to the 7B model:__ [wordcab/llama-natural-instructions-7b](https://huggingface.co/wordcab/llama-natural-instructions-7b)

Overall, our LoRA models are less performant than the original models from Meta when compared against the results reported in the [original paper](https://arxiv.org/pdf/2302.13971.pdf).

The performance degradation comes from loading the model in 8-bit and from using the LoRA adapters instead of fully fine-tuned weights.
Thanks to the 8-bit quantization, the model is 4 times faster than the original model and the results are still decent.

Some complex tasks like WinoGrande and OpenBookQA are more difficult to solve with the adapters.
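If you want to reproduce the inference-time column above, a rough sketch is shown below (illustrative only; absolute numbers depend heavily on your hardware, batch size, and generation settings).

```python
# Sketch: time a single generation pass for a prompt that is already tokenized.
import time

import torch

start = time.perf_counter()
with torch.no_grad():
    model.generate(input_ids=input_ids, max_new_tokens=50)
elapsed = time.perf_counter() - start
print(f"inference time: {elapsed:.2f} s")
```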
adapter_config.json
ADDED
@@ -0,0 +1,19 @@
{
  "base_model_name_or_path": "decapoda-research/llama-13b-hf",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "target_modules": [
    "q_proj",
    "v_proj"
  ],
  "task_type": "CAUSAL_LM"
}
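For reference, these fields correspond to a `peft` `LoraConfig` along these lines (a sketch of the training-time configuration, not the exact script used):

```python
# Sketch: a LoraConfig matching the fields stored in adapter_config.json.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```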
adapter_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:830b58ecc97c15ac9768ac99aa931464814b343e9ba2309da32c942845f0caa4
size 26271757
llama-natural-instructions-removebg-preview.png
ADDED
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
size 499723
tokenizer_config.json
ADDED
@@ -0,0 +1 @@
{"bos_token": "", "eos_token": "", "model_max_length": 2048, "tokenizer_class": "LlamaTokenizer", "unk_token": ""}