---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: codify-llama-2-7b
results: []
---
# codify-llama-2-7b
This model is a fine-tuned version of [Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the [CodeAlpaca-20k](https://raw.githubusercontent.com/sahil280114/codealpaca/master/data/code_alpaca_20k.json) dataset.
## Intended uses & limitations
1. Load the model as a Hugging Face Pipeline:
```python
from transformers import pipeline
pipe = pipeline('text-generation', model='mohammedaly22/Codify-LLama-2-7b')
```
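A 7B model in full precision can exceed consumer GPU memory. If that is a concern, the same pipeline can be created in half precision with automatic device placement (a minimal sketch; the `torch_dtype` and `device_map` arguments assume `torch` and the `accelerate` package are installed):
```python
import torch
from transformers import pipeline

# Load the weights in float16 and let accelerate place them across available devices
pipe = pipeline(
    'text-generation',
    model='mohammedaly22/Codify-LLama-2-7b',
    torch_dtype=torch.float16,
    device_map='auto',
)
```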
2. Prepare the instruction template
```python
from string import Template

prompt_template_inference = Template("""You are a world class software engineer answering coding questions. Below is an
instruction that describes a coding task, paired with an optional input that
provides further context. Write a response that accurately completes the task if
the instruction is code-related, else, you should respond that you don't know the answer
as it is outside the context of coding. Note, you should stop generation after reaching the <EOG> token.
### Instruction:
$instruction
### Input:
$input
### Response:
""")
```
3. Create an instruction prompt using the above template
```python
instruction = "Write a Python function that creates a simple 2-layer neural network using Keras for performing binary classification"
# named input_text to avoid shadowing Python's built-in input()
input_text = "input shape of the neural network will be a vector of 200 elements"
prompt = prompt_template_inference.substitute({"instruction": instruction, "input": input_text})
```
This renders the final instruction prompt that is passed to the pipeline:
```
You are a world class software engineer answering coding questions. Below is an
instruction that describes a coding task, paired with an optional input that
provides further context. Write a response that accurately completes the task if
the instruction is code-related, else, you should respond that you don't know the answer
as it is outside the context of coding. Note, you should stop generation after reaching the <EOG> token.
### Instruction:
Write a Python function that creates a simple 2-layer neural network using Keras for performing binary classification
### Input:
input shape of the neural network will be a vector of 200 elements
### Response:
```
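Since the input field is optional, instructions that need no extra context can simply substitute an empty string (a usage sketch):
```python
prompt = prompt_template_inference.substitute({"instruction": instruction, "input": ""})
```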
4. Pass the instruction prompt to the pipeline
```python
output = pipe(
    prompt,
    do_sample=True,
    return_full_text=False,
    max_new_tokens=200,
    clean_up_tokenization_spaces=True,
)
```
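The pipeline returns a list with one dictionary per prompt; with `return_full_text=False`, the `generated_text` field contains only the model's continuation. Because generation does not stop at `<EOG>` automatically unless a stopping criterion is configured, a simple post-processing step is to truncate the text at that marker (a sketch, assuming the marker appears verbatim in the output):
```python
# Keep only the text before the <EOG> marker, if the model emitted one
generated_text = output[0]['generated_text']
code = generated_text.split('<EOG>')[0].strip()
print(code)
```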
Here is the code generated by the model:
```python
def build_simple_neural_network():
    return Model(
        inputs=Input(shape=(200,)),
        outputs=Dense(2, activation="softmax"),
        name="simple_neural_network"
    )
<EOG>
```
## Training procedure
### BitsAndBytes hyperparameters
- use_4bit: True
- bnb_4bit_compute_dtype: "float16"
- bnb_4bit_quant_type: "nf4"
- use_double_nested_quant: False
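For reference, these settings correspond to a `BitsAndBytesConfig` along the lines of the sketch below (not the exact training script):
```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype: "float16"
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type: "nf4"
    bnb_4bit_use_double_quant=False,       # use_double_nested_quant: False
)
```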
### LoRA configurations
- lora_r: 64
- lora_alpha: 16
- lora_dropout: 0.1
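In `peft`, these map to a `LoraConfig` roughly like the following (a sketch; `bias` and `task_type` are assumed values not listed above):
```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,                   # lora_r
    lora_alpha=16,          # lora_alpha
    lora_dropout=0.1,       # lora_dropout
    bias="none",            # assumption: common default for QLoRA fine-tuning
    task_type="CAUSAL_LM",  # assumption: standard for text generation
)
```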
### Training hyperparameters
The following hyperparameters were used during training (collected into a `TrainingArguments` sketch after the list):
- num_train_epochs: 1
- fp16: False
- bf16: False
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 1
- gradient_checkpointing: True
- max_grad_norm: 0.3
- learning_rate: 2e-4
- weight_decay: 0.001
- optim: "paged_adamw_32bit"
- lr_scheduler_type: "cosine"
- max_steps: -1
- warmup_ratio: 0.03
- group_by_length: True
- save_steps: 0
- logging_steps: 50
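Assembled into a `transformers.TrainingArguments` object, the configuration looks roughly like this (a sketch; `output_dir` is a hypothetical placeholder):
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",  # hypothetical output path
    num_train_epochs=1,
    fp16=False,
    bf16=False,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    save_steps=0,
    logging_steps=50,
)
```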
### Training results
| Step | Training Loss |
|:-----:|:-------------:|
| 50 | 1.377900 |
| 100 | 0.368700 |
| 150 | 0.336600 |
| 200 | 0.334800 |
| 250 | 0.332300 |
| 300 | 0.333700 |
| 350 | 0.322100 |
| 400 | 0.317000 |
| 450 | 0.320800 |
| 500 | 0.308400 |
| 550 | 0.321900 |
| 600 | 0.310700 |
| 650 | 0.322100 |
| 700 | 0.327700 |
| 750 | 0.322000 |
| 800 | 0.311300 |
| 850 | 0.321800 |
| 900 | 0.318700 |
| 950 | 0.321600 |
| 1000 | 0.314900 |
| 1050 | 0.321700 |
| 1100 | 0.307600 |
| 1150 | 0.315800 |
| 1200 | 0.316800 |
| 1250 | 0.314200 |
| 1300 | 0.310400 |
| 1350 | 0.308000 |
| 1400 | 0.318600 |
| 1450 | 0.309700 |
| 1500 | 0.307600 |
| 1550 | 0.296800 |
| 1600 | 0.305800 |
| 1650 | 0.307400 |
| 1700 | 0.327400 |
| 1750 | 0.306100 |
| 1800 | 0.309900 |
| 1850 | 0.316300 |
| 1900 | 0.299500 |
| 1950 | 0.315700 |
| 2000 | 0.307600 |