---
tags:
- generated_from_trainer
- code
- coding
- llama-2
- gptq
model-index:
- name: Llama-2-7b-4bit-python-coder
  results: []
license: apache-2.0
language:
- code
datasets:
- iamtarun/python_code_instructions_18k_alpaca
pipeline_tag: text-generation
---

# Llama 2 7b 4-bit Python Coder 👩‍💻

**Llama-2 7b** fine-tuned on the **python_code_instructions_18k_alpaca** code instructions dataset using the **QLoRA** method in 4-bit with the [PEFT](https://github.com/huggingface/peft) library.

## Pretrained description

[Llama-2](https://huggingface.co/meta-llama/Llama-2-7b)

Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.

**Model Architecture:** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.

## Training data

[python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca)

The dataset contains problem descriptions and the corresponding Python code. It is derived from sahil2801/code_instructions_120k, with a prompt column added in the Alpaca style.

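A quick, illustrative way to inspect that prompt format (not part of the original card) is to load the dataset with the 🤗 `datasets` library and print one row; the `prompt` field name is the added column the sentence above refers to, and `column_names` will confirm the exact schema.

```py
from datasets import load_dataset

# Load the training split of the instruction dataset used for fine-tuning.
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split="train")

# Inspect the schema and one Alpaca-style prompt (the added `prompt` column).
print(dataset.column_names)
print(dataset[0]["prompt"])
```
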
### Training hyperparameters

The following `bitsandbytes` quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16

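Expressed in code, the values above correspond to a `transformers.BitsAndBytesConfig` along the lines of this sketch (a reconstruction from the list, not the original training script); the `llm_int8_*` entries are library defaults that only matter for 8-bit loading, so they are left implicit here.

```py
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, no double quantization, fp16 compute dtype,
# matching the bitsandbytes values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)
```
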
**SFTTrainer arguments**
```py
# Number of training epochs
num_train_epochs = 1
# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = True
# Batch size per GPU for training
per_device_train_batch_size = 4
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1
# Enable gradient checkpointing
gradient_checkpointing = True
# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
# Optimizer to use
optim = "paged_adamw_32bit"
# Learning rate schedule
lr_scheduler_type = "cosine"  # alternative: "constant"
# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03
```
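
For context, these variables would typically be passed to a `transformers.TrainingArguments` object and a `trl` `SFTTrainer` together with a PEFT `LoraConfig`. The sketch below is a plausible reconstruction under the 2023-era `trl`/`peft` APIs (PEFT 0.4.0, listed below), not the script actually used: the base checkpoint name, `output_dir`, `max_seq_length`, and all `LoraConfig` values are illustrative assumptions, and it reuses `bnb_config` and `dataset` from the sketches above.

```py
# Illustrative reconstruction, not the original training script.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"  # assumed HF-format base checkpoint (gated)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

# Load the base model in 4-bit using the bnb_config built in the sketch above.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)

training_arguments = TrainingArguments(
    output_dir="./llama-2-7b-python-coder",  # placeholder
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    gradient_checkpointing=gradient_checkpointing,
    optim=optim,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
)

# QLoRA adapter config; r/alpha/dropout here are generic defaults,
# not values documented for this checkpoint.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,            # the Alpaca-style dataset loaded earlier
    peft_config=peft_config,
    dataset_text_field="prompt",      # train on the full Alpaca-style prompt text
    max_seq_length=1024,              # placeholder
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()
```
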
### Framework versions
- PEFT 0.4.0

### Training metrics
```
{'loss': 1.044, 'learning_rate': 3.571428571428572e-05, 'epoch': 0.01}
{'loss': 0.8413, 'learning_rate': 7.142857142857143e-05, 'epoch': 0.01}
{'loss': 0.7299, 'learning_rate': 0.00010714285714285715, 'epoch': 0.02}
{'loss': 0.6593, 'learning_rate': 0.00014285714285714287, 'epoch': 0.02}
{'loss': 0.6309, 'learning_rate': 0.0001785714285714286, 'epoch': 0.03}
{'loss': 0.5916, 'learning_rate': 0.00019999757708974043, 'epoch': 0.03}
{'loss': 0.5861, 'learning_rate': 0.00019997032069768138, 'epoch': 0.04}
{'loss': 0.6118, 'learning_rate': 0.0001999127875580558, 'epoch': 0.04}
{'loss': 0.5928, 'learning_rate': 0.00019982499509519857, 'epoch': 0.05}
{'loss': 0.5978, 'learning_rate': 0.00019970696989770335, 'epoch': 0.05}
{'loss': 0.5791, 'learning_rate': 0.0001995587477103701, 'epoch': 0.06}
{'loss': 0.6054, 'learning_rate': 0.00019938037342337933, 'epoch': 0.06}
{'loss': 0.5864, 'learning_rate': 0.00019917190105869708, 'epoch': 0.07}
{'loss': 0.6159, 'learning_rate': 0.0001989333937537136, 'epoch': 0.08}
{'loss': 0.583, 'learning_rate': 0.00019866492374212205, 'epoch': 0.08}
{'loss': 0.6066, 'learning_rate': 0.00019836657233204182, 'epoch': 0.09}
{'loss': 0.5934, 'learning_rate': 0.00019803842988139374, 'epoch': 0.09}
{'loss': 0.5836, 'learning_rate': 0.00019768059577053473, 'epoch': 0.1}
{'loss': 0.6021, 'learning_rate': 0.00019729317837215943, 'epoch': 0.1}
{'loss': 0.5659, 'learning_rate': 0.00019687629501847898, 'epoch': 0.11}
{'loss': 0.5754, 'learning_rate': 0.00019643007196568606, 'epoch': 0.11}
{'loss': 0.5936, 'learning_rate': 0.000195954644355717, 'epoch': 0.12}
```

### Example of usage

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "edumunozsala/llama-2-7b-int4-python-code-20k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_4bit=True, torch_dtype=torch.float16, device_map="auto"
)

instruction = "Write a Python function to display the first and last elements of a list."
input = ""

prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response, which is a programming code that can solve the Task.
### Task:
{instruction}
### Input:
{input}
### Response:
"""

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
with torch.inference_mode():
    outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)

print(f"Prompt:\n{prompt}\n")
print(f"Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}")
```
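
Since the card's pipeline tag is `text-generation`, the already-loaded model and tokenizer can also be wrapped in a 🤗 `pipeline`; this is a minimal alternative sketch reusing `model`, `tokenizer`, and `prompt` from the block above, not part of the original card.

```py
from transformers import pipeline

# Wrap the 4-bit model and tokenizer loaded above in a text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(prompt, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)
# Strip the echoed prompt and keep only the generated code.
print(result[0]["generated_text"][len(prompt):])
```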

### Citation

```
@misc{edumunozsala_2023,
    author    = { {Eduardo Muñoz} },
    title     = { llama-2-7b-int4-python-coder },
    year      = 2023,
    url       = { https://huggingface.co/edumunozsala/llama-2-7b-int4-python-18k-alpaca },
    publisher = { Hugging Face }
}
```