keriati committed
Commit 9cb7bcf
1 Parent(s): a86f5de

Create README.md

Files changed (1)
  1. README.md +232 -0
README.md ADDED
@@ -0,0 +1,232 @@
+ ---
+ license: other
+ base_model: deepseek-ai/deepseek-coder-1.3b-base
+ tags:
+ - axolotl
+ - generated_from_trainer
+ model-index:
+ - name: deepseek-coder-1.3b-typescript
+   results: []
+ datasets:
+ - bigcode/the-stack-dedup
+ widget:
+ - text: "class Person {\n constructor(public name:"
+   example_title: "class"
+ - text: "function quickSort"
+   example_title: "function"
+ ---
+
+ <p align="center">
+ <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true">
+ </p>
+ <p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a> </p>
+ <hr>
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.3.0`
+ ```yaml
+ base_model: deepseek-ai/deepseek-coder-1.3b-base
+ model_type: AutoModelForCausalLM
+ trust_remote_code: true
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+
+ datasets:
+   - path: CodeGPTPlus/typescript-0-500000-seq1024
+     type: completion
+     field: text
+
+
+ val_set_size: 0.001
+ output_dir: ./fft-out
+
+ sequence_len: 1024
+
+ adapter:
+ lora_model_dir:
+ lora_r:
+ lora_alpha:
+ lora_dropout:
+ lora_target_linear:
+ lora_fan_in_fan_out:
+ lora_modules_to_save:
+
+ wandb_project: deepseek_1.3_fft
+ wandb_entity:
+ wandb_watch:
+ wandb_name: aws_a10g
+ wandb_log_model: end
+
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 20
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ adam_beta1: 0.9
+ adam_beta2: 0.999
+ adam_epsilon: 0.000001
+ max_grad_norm: 1.0
+ weight_decay: 0.1
+ lr_scheduler: cosine
+ learning_rate: 0.00002
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ fp16: false
+ tf32: false
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ loss_watchdog_threshold: 5.0
+ loss_watchdog_patience: 3
+
+ hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
+ hub_strategy: every_save
+ warmup_ratio: 0.01
+ evals_per_epoch: 20
+ saves_per_epoch: 3
+ debug:
+ deepspeed:
+
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<|begin▁of▁sentence|>"
+   eos_token: "<|end▁of▁sentence|>"
+   pad_token: "<|end▁of▁sentence|>"
+ ```
+
+ </details><br>
+
+ # deepseek-coder-1.3b-typescript
+
+ CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team for TypeScript code generation. It was fine-tuned specifically on TypeScript code, using a dataset of 0.5B tokens.
+
+ The 16K window size and an additional fill-in-the-middle task are employed to deliver project-level code completion.
+
+ The model is intended for users who want a code generator specialized for TypeScript.
+
+ It achieves the following results on the evaluation set:
+ - Loss: 0.7681
+
+ **Model Developers** CodeGPT Team
+
+ **Variations** 1.3B
+
+ **Input** The model takes text input only.
+
+ **Output** The model generates text only.
+
+ ## How to Use
+ This model is for completion purposes only. Here are some examples of how to use the model.
+
+ ### Running the model on a GPU
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ # Load the tokenizer and the model from the Hugging Face Hub, then move the model to the GPU.
+ tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
+                                           trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
+                                              trust_remote_code=True).cuda()
+
+ # Fill-in-the-middle prompt: the model is asked to generate the code at the hole marker.
+ input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
+   if (arr.length <= 1) {
+     return arr;
+   }
+   const pivot = arr[0];
+   const left = [];
+   const right = [];
+   <|fim▁hole|>
+   return [...quickSort(left), pivot, ...quickSort(right)];
+ }<|fim▁end|>"""
+
+ # Tokenize the prompt, generate up to 256 tokens in total (prompt included), and decode.
+ inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_length=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
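The `print` call in the GPU example decodes the whole returned sequence. For a causal language model, `generate` returns the prompt tokens followed by the newly generated ones, so the prompt is printed again along with the completion. A minimal sketch of keeping only the new tokens follows; `completion_only` is a hypothetical helper (not part of `transformers`), and it works on plain lists so it can be tried without a GPU.

```python
def completion_only(output_ids, prompt_length):
    """Return the token ids that come after the first `prompt_length` ids."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return list(output_ids[prompt_length:])


# Toy ids standing in for real tokenizer output (illustrative values only).
prompt_ids = [101, 7, 8, 9]
generated_ids = prompt_ids + [42, 43, 44]

new_ids = completion_only(generated_ids, len(prompt_ids))
print(new_ids)
```

With the objects from the GPU example, the same idea would look like `completion_only(outputs[0].tolist(), inputs["input_ids"].shape[1])`, and the result can be passed to `tokenizer.decode`.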
+ ### Running with Ollama
+ **Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript
+
+ ```ollama run codegpt/deepseek-coder-1.3b-typescript```
+
+ ### Running with Ollama and CodeGPT Autocomplete in VSCode
+
+ **Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion
+
+ Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.
+
+ Then write any code or comment in the VSCode text editor, and CodeGPT autocomplete will show code suggestions from the model.
+
+ <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="ollama_autocomplete_codegpt.gif">
+
+ ### Fill In the Middle (FIM)
+ Put the code before the gap after `<|fim▁begin|>`, mark the gap with `<|fim▁hole|>`, and close the prompt with `<|fim▁end|>` after the code that follows the gap. The model generates the code for the gap.
+ ```typescript
+ <|fim▁begin|>function quickSort(arr: number[]): number[] {
+   if (arr.length <= 1) {
+     return arr;
+   }
+   const pivot = arr[0];
+   const left = [];
+   const right = [];
+   <|fim▁hole|>
+   return [...quickSort(left), pivot, ...quickSort(right)];
+ }<|fim▁end|>
+ ```
+
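The FIM prompt shown in this README can also be assembled programmatically from the text before and after the gap. The sketch below is one way to do that; `build_fim_prompt` and `fill_hole` are hypothetical helper names, and the marker strings are copied verbatim from this README, so they should be checked against the tokenizer's vocabulary before use.

```python
# Marker strings copied from the README (assumption: they match the tokenizer's special tokens).
FIM_BEGIN = "<|fim▁begin|>"
FIM_HOLE = "<|fim▁hole|>"
FIM_END = "<|fim▁end|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in the FIM markers."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


def fill_hole(prefix: str, completion: str, suffix: str) -> str:
    """Splice a generated completion back between prefix and suffix."""
    return prefix + completion + suffix


prefix = "function add(a: number, b: number): number {\n"
suffix = "\n}"
prompt = build_fim_prompt(prefix, suffix)

# A stand-in completion; in practice this would come from the model.
completed = fill_hole(prefix, "  return a + b;", suffix)
print(completed)
```

Keeping the prefix and suffix as separate strings makes it straightforward to splice the model's completion back into the original file once generation finishes.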
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 20
+ - eval_batch_size: 20
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 40
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 261
+ - num_epochs: 1
+
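Several of the values in the hyperparameter list are derived from others, and a short calculation shows how they fit together. The sketch below rests on assumptions that the README does not state: a single training device, an evaluation interval of 1308 steps (read from the training results table), and a warmup step count obtained by rounding down `warmup_ratio` times the steps per epoch. Under those assumptions it reproduces the reported `total_train_batch_size` and `lr_scheduler_warmup_steps`.

```python
# Values taken from the axolotl config and the hyperparameter list in this README.
micro_batch_size = 20
gradient_accumulation_steps = 2
warmup_ratio = 0.01
evals_per_epoch = 20

# Assumptions (not stated in the README): one device, and the step spacing
# between evaluations as it appears in the training results table.
num_devices = 1
steps_between_evals = 1308

# Effective batch size per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Rough estimate of optimizer steps in the single epoch.
approx_steps_per_epoch = steps_between_evals * evals_per_epoch

# Rounding down is an assumption about how the warmup step count was derived.
approx_warmup_steps = int(warmup_ratio * approx_steps_per_epoch)

print(total_train_batch_size, approx_steps_per_epoch, approx_warmup_steps)
```

The step estimate is approximate because the exact number of training steps is not given in the README.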
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:-----:|:---------------:|
+ | 1.0745 | 0.0 | 1 | 0.8681 |
+ | 1.2267 | 0.05 | 1308 | 0.8130 |
+ | 1.1594 | 0.1 | 2616 | 0.8018 |
+ | 0.7674 | 0.15 | 3924 | 0.7942 |
+ | 0.6443 | 0.2 | 5232 | 0.7889 |
+ | 0.9155 | 0.25 | 6540 | 0.7847 |
+ | 0.7501 | 0.3 | 7848 | 0.7819 |
+ | 0.8835 | 0.35 | 9156 | 0.7792 |
+ | 0.7261 | 0.4 | 10464 | 0.7769 |
+ | 0.9746 | 0.45 | 11772 | 0.7748 |
+ | 0.6884 | 0.5 | 13080 | 0.7734 |
+ | 0.6104 | 0.55 | 14388 | 0.7722 |
+ | 0.8876 | 0.6 | 15696 | 0.7710 |
+ | 0.9567 | 0.65 | 17004 | 0.7703 |
+ | 0.6915 | 0.7 | 18312 | 0.7696 |
+ | 0.8874 | 0.75 | 19620 | 0.7691 |
+ | 0.6124 | 0.8 | 20928 | 0.7686 |
+ | 0.8147 | 0.85 | 22236 | 0.7684 |
+ | 0.8021 | 0.9 | 23544 | 0.7683 |
+ | 0.8665 | 0.95 | 24852 | 0.7681 |
+
+
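Validation loss can be easier to interpret as perplexity. The sketch below converts the first and last validation losses from the table, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this README).

```python
import math

# First and last validation losses from the training results table.
first_val_loss = 0.8681
final_val_loss = 0.7681

# Perplexity is exp(loss) when the loss is mean per-token cross-entropy in nats
# (an assumption about how the reported loss was computed).
first_perplexity = math.exp(first_val_loss)
final_perplexity = math.exp(final_val_loss)

print(f"perplexity: {first_perplexity:.3f} -> {final_perplexity:.3f}")
```

Under that assumption, validation perplexity falls from roughly 2.38 at step 1 to roughly 2.16 at step 24852.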
+ ### Framework versions
+
+ - Transformers 4.37.0.dev0
+ - PyTorch 2.0.1+cu118
+ - Datasets 2.16.1
+ - Tokenizers 0.15.0