---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: codify-llama-2-7b
  results: []
---

# codify-llama-2-7b
This model is a fine-tuned version of [Llama-2-7b-hf](https://huggingface.co/NousResearch/Llama-2-7b-hf) on the [CodeAlpaca 20k](https://raw.githubusercontent.com/sahil280114/codealpaca/master/data/code_alpaca_20k.json) dataset.

## Intended uses & limitations
1. Load the model as a Hugging Face `pipeline`:

```python
from transformers import pipeline

pipe = pipeline('text-generation', model='mohammedaly22/Codify-LLama-2-7b')
```
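
If you have a GPU, you can optionally load the model in half precision and let the weights be placed automatically. These extra arguments are not part of the original card and require `accelerate` to be installed; adjust them to your hardware:

```python
import torch
from transformers import pipeline

# Optional: load the 7B model in float16 and spread it across available devices
pipe = pipeline(
    'text-generation',
    model='mohammedaly22/Codify-LLama-2-7b',
    torch_dtype=torch.float16,
    device_map='auto',
)
```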

2. Prepare the instruction template:
```python
from string import Template

prompt_template_inference = Template("""You are a world class software engineer answering coding questions. Below is an
instruction that describes a coding task, paired with an optional input that
provides further context. Write a response that accurately completes the task if
the instruction is code-related; otherwise, you should respond that you don't know the answer
as it is outside the context of coding. Note: you should stop generation after reaching the <EOG> token.

### Instruction:
$instruction

### Input:
$input

### Response:
""")
```

3. Create an instruction prompt using the above template:
```python
instruction = "Write a Python function that creates a simple 2-layer neural network using Keras for performing binary classification"
input = "input shape of the neural network will be a vector of 200 elements"
prompt = prompt_template_inference.substitute({"instruction": instruction, "input": input})
```

This is the final instruction prompt that will be passed to the pipeline:
```
You are a world class software engineer answering coding questions. Below is an
instruction that describes a coding task, paired with an optional input that
provides further context. Write a response that accurately completes the task if
the instruction is code-related; otherwise, you should respond that you don't know the answer
as it is outside the context of coding. Note: you should stop generation after reaching the <EOG> token.

### Instruction:
Write a Python function that creates a simple 2-layer neural network using Keras for performing binary classification

### Input:
input shape of the neural network will be a vector of 200 elements

### Response:
```

4. Pass the instruction prompt to the pipeline:
```python
output = pipe(
    prompt,
    do_sample=True,
    return_full_text=False,
    max_new_tokens=200,
    clean_up_tokenization_spaces=True
)
```

Here is the code generated by the model:
```python
def build_simple_neural_network(): 
    return Model(
      inputs=Input(shape=(200,)),
      outputs=Dense(2, activation="softmax"),
      name="simple_neural_network"
    )

<EOG>
```
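
Since the prompt asks the model to emit an `<EOG>` marker, you will usually want to truncate the generated text at that token before using it. A minimal sketch (the helper name is illustrative, not part of the model's API):

```python
def extract_code(generated_text: str, eog_token: str = "<EOG>") -> str:
    """Return the text before the first <EOG> marker, stripped of surrounding whitespace."""
    return generated_text.split(eog_token, 1)[0].strip()

# With return_full_text=False, the pipeline returns only the newly generated text
code = extract_code(output[0]["generated_text"])
print(code)
```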

## Training procedure

### BitsAndBytes hyperparameters
- use_4bit: True
- bnb_4bit_compute_dtype: "float16"
- bnb_4bit_quant_type: "nf4"
- use_double_nested_quant: False
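
A minimal sketch of how these settings map onto a `transformers.BitsAndBytesConfig` (the original training script may differ):

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit: True
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype: "float16"
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type: "nf4"
    bnb_4bit_use_double_quant=False,       # use_double_nested_quant: False
)
```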

### LoRA configurations
- lora_r: 64
- lora_alpha: 16
- lora_dropout: 0.1
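
A sketch of the corresponding `peft.LoraConfig`; `bias` and `task_type` are assumptions, as they are not stated in the card:

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: causal language-model fine-tuning
)
```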


### Training hyperparameters
The following hyperparameters were used during training:
- num_train_epochs: 1
- fp16: False
- bf16: False
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 1
- gradient_checkpointing: True
- max_grad_norm: 0.3
- learning_rate: 2e-4
- weight_decay: 0.001
- optim: "paged_adamw_32bit"
- lr_scheduler_type: "cosine"
- max_steps: -1
- warmup_ratio: 0.03
- group_by_length: True
- save_steps: 0
- logging_steps: 50
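
A minimal sketch of how these values map onto `transformers.TrainingArguments`; `output_dir` is a placeholder and the original training script may differ:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",  # placeholder, not stated in the card
    num_train_epochs=1,
    fp16=False,
    bf16=False,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    save_steps=0,
    logging_steps=50,
)
```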


### Training results

| Step  | Training Loss | 
|:-----:|:-------------:|
| 50    | 1.377900      |
| 100   | 0.368700      |
| 150   | 0.336600      |
| 200   | 0.334800      |
| 250   | 0.332300      |
| 300   | 0.333700      |
| 350   | 0.322100      |
| 400   | 0.317000      |
| 450   | 0.320800      |
| 500   | 0.308400      |
| 550   | 0.321900      |
| 600   | 0.310700      |
| 650   | 0.322100      |
| 700   | 0.327700      |
| 750   | 0.322000      |
| 800   | 0.311300      |
| 850   | 0.321800      |
| 900   | 0.318700      |
| 950   | 0.321600      |
| 1000  | 0.314900      |
| 1050  | 0.321700      |
| 1100  | 0.307600      |
| 1150  | 0.315800      |
| 1200  | 0.316800      |
| 1250  | 0.314200      |
| 1300  | 0.310400      |
| 1350  | 0.308000      |
| 1400  | 0.318600      |
| 1450  | 0.309700      |
| 1500  | 0.307600      |
| 1550  | 0.296800      |
| 1600  | 0.305800      |
| 1650  | 0.307400      |
| 1700  | 0.327400      |
| 1750  | 0.306100      |
| 1800  | 0.309900      |
| 1850  | 0.316300      |
| 1900  | 0.299500      |
| 1950  | 0.315700      |
| 2000  | 0.307600      |