---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek-coder-1.3b-typescript
  results: []
datasets:
- bigcode/the-stack-dedup
widget:
- text: "class Person {\n constructor(public name:"
  example_title: "class"
- text: "function quickSort"
  example_title: "function"
---

<p align="center">
  <img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true">
</p>
<p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a></p>
<hr>

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`

```yaml
base_model: deepseek-ai/deepseek-coder-1.3b-base
model_type: AutoModelForCausalLM
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: CodeGPTPlus/typescript-0-500000-seq1024
    type: completion
    field: text

val_set_size: 0.001
output_dir: ./fft-out

sequence_len: 1024

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: deepseek_1.3_fft
wandb_entity:
wandb_watch:
wandb_name: aws_a10g
wandb_log_model: end

gradient_accumulation_steps: 2
micro_batch_size: 20
num_epochs: 1
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 0.000001
max_grad_norm: 1.0
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
hub_strategy: every_save
warmup_ratio: 0.01
evals_per_epoch: 20
saves_per_epoch: 3
debug:
deepspeed:

fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
  pad_token: "<|end▁of▁sentence|>"
```

</details><br>

# deepseek-coder-1.3b-typescript

CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team to generate high-quality TypeScript code. Fine-tuned specifically on TypeScript with a 0.5B-token dataset, it produces precise and efficient solutions in this language.

The model uses a 16K context window and an additional fill-in-the-middle (FIM) objective to deliver project-level code completion.

It is intended for anyone who needs a code generator specialized in TypeScript, backed by the CodeGPT team.

It achieves the following results on the evaluation set:
- Loss: 0.7681

**Model Developers** CodeGPT Team

**Variations** 1.3B

**Input** The model accepts text input only.

**Output** The model generates text only.

## How to Use

This model is intended for code completion only. Below are some examples of how to use it.

#### Running the model on a GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                             trust_remote_code=True).cuda()

# Fill-in-the-middle prompt: the model completes the code at <|fim▁hole|>.
input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
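
For plain left-to-right completion (as in the widget examples above), the FIM tokens can simply be omitted. A minimal sketch reusing the tokenizer and model loaded above; the prompt string is only illustrative:

```python
# Plain prefix completion without FIM tokens (illustrative prompt).
input_text = "class Person {\n  constructor(public name:"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```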

### Running with Ollama

**Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript

```bash
ollama run codegpt/deepseek-coder-1.3b-typescript
```
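
Once pulled, the model can also be queried programmatically through Ollama's local REST API. A minimal sketch, assuming an Ollama server running on the default port 11434:

```python
import requests

# Ask the locally running Ollama server for a completion via /api/generate.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegpt/deepseek-coder-1.3b-typescript",
        "prompt": "function quickSort",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```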

### Running with Ollama and CodeGPT Autocomplete in VSCode

**Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion

Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

Then write any code or comment in the VSCode editor, and the model will provide code suggestions through CodeGPT autocomplete.

<img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="ollama_autocomplete_codegpt.gif">

### Fill In the Middle (FIM)

For fill-in-the-middle completion, wrap the code before and after the gap with the special tokens shown below:

```text
<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>
```
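
In practice, the prefix (code before the cursor) and suffix (code after it) are concatenated with these tokens. A small illustrative helper; `build_fim_prompt` is a hypothetical name, not part of the model or tokenizer API:

```python
# Hypothetical helper: assemble a FIM prompt from a prefix and a suffix
# using the special tokens shown above.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

prompt = build_fim_prompt(
    prefix="function sum(a: number, b: number): number {\n",
    suffix="\n}",
)
# `prompt` can then be tokenized and passed to model.generate() as in the GPU example.
```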

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1
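
These values come from the axolotl config shown earlier. For readers reproducing the run directly with `transformers`, a rough, untested sketch of the same settings as standard `TrainingArguments` parameters (the actual training was driven by axolotl, not this object):

```python
from transformers import TrainingArguments

# Approximate translation of the hyperparameters above; the original run
# was launched through axolotl, not through this exact configuration object.
training_args = TrainingArguments(
    output_dir="./fft-out",
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    gradient_accumulation_steps=2,   # effective batch size: 40
    num_train_epochs=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=261,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    max_grad_norm=1.0,
    optim="adamw_bnb_8bit",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```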

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745        | 0.0   | 1     | 0.8681          |
| 1.2267        | 0.05  | 1308  | 0.8130          |
| 1.1594        | 0.1   | 2616  | 0.8018          |
| 0.7674        | 0.15  | 3924  | 0.7942          |
| 0.6443        | 0.2   | 5232  | 0.7889          |
| 0.9155        | 0.25  | 6540  | 0.7847          |
| 0.7501        | 0.3   | 7848  | 0.7819          |
| 0.8835        | 0.35  | 9156  | 0.7792          |
| 0.7261        | 0.4   | 10464 | 0.7769          |
| 0.9746        | 0.45  | 11772 | 0.7748          |
| 0.6884        | 0.5   | 13080 | 0.7734          |
| 0.6104        | 0.55  | 14388 | 0.7722          |
| 0.8876        | 0.6   | 15696 | 0.7710          |
| 0.9567        | 0.65  | 17004 | 0.7703          |
| 0.6915        | 0.7   | 18312 | 0.7696          |
| 0.8874        | 0.75  | 19620 | 0.7691          |
| 0.6124        | 0.8   | 20928 | 0.7686          |
| 0.8147        | 0.85  | 22236 | 0.7684          |
| 0.8021        | 0.9   | 23544 | 0.7683          |
| 0.8665        | 0.95  | 24852 | 0.7681          |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0