---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek-coder-1.3b-typescript
  results: []
datasets:
- bigcode/the-stack-dedup
widget:
- text: "class Person {\n constructor(public name:"
  example_title: "class"
- text: "function quickSort"
  example_title: "function"
---

<p align="center">
  <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true">
</p>
<p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a></p>
<hr>

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`

```yaml
base_model: deepseek-ai/deepseek-coder-1.3b-base
model_type: AutoModelForCausalLM
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: CodeGPTPlus/typescript-0-500000-seq1024
    type: completion
    field: text

val_set_size: 0.001
output_dir: ./fft-out

sequence_len: 1024

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: deepseek_1.3_fft
wandb_entity:
wandb_watch:
wandb_name: aws_a10g
wandb_log_model: end

gradient_accumulation_steps: 2
micro_batch_size: 20
num_epochs: 1
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 0.000001
max_grad_norm: 1.0
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
hub_strategy: every_save
warmup_ratio: 0.01
evals_per_epoch: 20
saves_per_epoch: 3
debug:
deepspeed:

fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
  pad_token: "<|end▁of▁sentence|>"
```

</details><br>

# deepseek-coder-1.3b-typescript

CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team to generate high-quality TypeScript code. Fine-tuned specifically on TypeScript with a 0.5B-token dataset, it produces precise and efficient solutions in this language.

The model uses a 16K context window and an additional fill-in-the-middle (FIM) objective to deliver project-level code completion.

It is intended for anyone who needs a code generator specialized in TypeScript, backed by the CodeGPT team.

It achieves the following results on the evaluation set:
- Loss: 0.7681

**Model Developers** CodeGPT Team

**Variations** 1.3B

**Input** The model accepts text input only.

**Output** The model generates text only.

## How to Use

This model is intended for code completion only. Below are some examples of how to use it.

#### Running the model on a GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                             trust_remote_code=True).cuda()

# Fill-in-the-middle prompt: the model completes the code at <|fim▁hole|>.
input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
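
For plain left-to-right completion (as in the widget examples above), the FIM tokens can simply be omitted. A minimal sketch reusing the tokenizer and model loaded above; the prompt string is only illustrative:

```python
# Plain prefix completion without FIM tokens (illustrative prompt).
input_text = "class Person {\n  constructor(public name:"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```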

### Running with Ollama

**Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript

```bash
ollama run codegpt/deepseek-coder-1.3b-typescript
```
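
Once pulled, the model can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server running on the default port 11434:

```python
import requests

# Ask the locally running Ollama server for a completion via /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegpt/deepseek-coder-1.3b-typescript",
        "prompt": "function quickSort",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```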

### Running with Ollama and CodeGPT Autocomplete in VSCode

**Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion

Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

Then write any code or comment in the VSCode editor, and the model will provide code suggestions through CodeGPT autocomplete.

<img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="ollama_autocomplete_codegpt.gif">

### Fill In the Middle (FIM)

For fill-in-the-middle completion, wrap the code before and after the gap with the special tokens shown below:

```text
<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>
```
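
In practice, the prefix (code before the cursor) and suffix (code after it) are concatenated with these tokens. A small illustrative helper; `build_fim_prompt` is a hypothetical name, not part of the model or tokenizer API:

```python
# Hypothetical helper: assemble a FIM prompt from a prefix and a suffix
# using the special tokens shown above.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

prompt = build_fim_prompt(
    prefix="function sum(a: number, b: number): number {\n",
    suffix="\n}",
)
# `prompt` can then be tokenized and passed to model.generate() as in the GPU example.
```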

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1
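
These values come from the axolotl config shown earlier. For readers reproducing the run directly with `transformers`, a rough, untested sketch of the same settings as standard `TrainingArguments` parameters (the actual training was driven by axolotl, not this object):

```python
from transformers import TrainingArguments

# Approximate translation of the hyperparameters above; the original run
# was launched through axolotl, not through this exact configuration object.
training_args = TrainingArguments(
    output_dir="./fft-out",
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    gradient_accumulation_steps=2,   # effective batch size: 40
    num_train_epochs=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=261,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=1.0,
    optim="adamw_bnb_8bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```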

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745        | 0.0   | 1     | 0.8681          |
| 1.2267        | 0.05  | 1308  | 0.8130          |
| 1.1594        | 0.1   | 2616  | 0.8018          |
| 0.7674        | 0.15  | 3924  | 0.7942          |
| 0.6443        | 0.2   | 5232  | 0.7889          |
| 0.9155        | 0.25  | 6540  | 0.7847          |
| 0.7501        | 0.3   | 7848  | 0.7819          |
| 0.8835        | 0.35  | 9156  | 0.7792          |
| 0.7261        | 0.4   | 10464 | 0.7769          |
| 0.9746        | 0.45  | 11772 | 0.7748          |
| 0.6884        | 0.5   | 13080 | 0.7734          |
| 0.6104        | 0.55  | 14388 | 0.7722          |
| 0.8876        | 0.6   | 15696 | 0.7710          |
| 0.9567        | 0.65  | 17004 | 0.7703          |
| 0.6915        | 0.7   | 18312 | 0.7696          |
| 0.8874        | 0.75  | 19620 | 0.7691          |
| 0.6124        | 0.8   | 20928 | 0.7686          |
| 0.8147        | 0.85  | 22236 | 0.7684          |
| 0.8021        | 0.9   | 23544 | 0.7683          |
| 0.8665        | 0.95  | 24852 | 0.7681          |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0