Model Card for deepseek-coder-33b-instruct-pythagora-v3

This model card describes the deepseek-coder-33b-instruct-pythagora version 3 model, which is a fine-tuned version of the DeepSeek Coder 33B Instruct model, specifically optimized for use with the Pythagora GPT Pilot application.

Model Details

Model Description

Model Sources

Uses

Direct Use

This model is intended for use with the Pythagora GPT Pilot application, which enables the creation of fully working, production-ready apps with the assistance of a developer. The model has been fine-tuned to work seamlessly with the GPT Pilot prompt structures and can be utilized through the Pythagora LLM Proxy.

The model is designed to generate code and assist with various programming tasks, such as writing features, debugging, and providing code reviews, all within the context of the Pythagora GPT Pilot application.

Out-of-Scope Use

This model should not be used for tasks outside of the intended use case with the Pythagora GPT Pilot application. It is not designed for standalone use or integration with other applications without proper testing and adaptation. Additionally, the model should not be used for generating content related to sensitive topics, such as politics, security, or privacy issues, as it is specifically trained to focus on computer science and programming-related tasks.

Bias, Risks, and Limitations

As with any language model, there may be biases present in the training data that could be reflected in the model's outputs. Users should be aware of potential limitations and biases when using this model. The model's performance may be impacted by the quality and relevance of the input prompts, as well as the specific programming languages and frameworks used in the context of the Pythagora GPT Pilot application.

Recommendations

Users should familiarize themselves with the Pythagora GPT Pilot application and its intended use cases before utilizing this model. It is recommended to use the model in conjunction with the Pythagora LLM Proxy for optimal performance and compatibility. When using the model, users should carefully review and test the generated code to ensure its correctness, efficiency, and adherence to best practices and project requirements.

How to Get Started with the Model

To use this model with the Pythagora GPT Pilot applicationand the Pythagora-LLM-Proxy:

  1. Set up the Pythagora LLM Proxy to work with your LLM host software (i.e. LM Studio) by following the instructions in the GitHub repository.
  2. Configure GPT Pilot to use the Pythagora LLM Proxy by setting the OpenAI API endpoint to http://localhost:8080/v1/chat/completions.
  3. Run GPT Pilot as usual, and the proxy will handle the communication between GPT Pilot and the LLM host software running the deepseek-coder-6.7b-instruct-pythagora model.
  4. It is possible to run Pythagora directly to LM Studio or any other service but be cautious of the 16,384 token limitations as exceeding the limit will result in an endless loop of "invalid json" responses.

For more detailed instructions and examples, please refer to the Pythagora LLM Proxy README.

Training Details

Training Data

The model was fine-tuned using a custom dataset created from sample prompts generated by the Pythagora prompt structures. The prompts are compatible with the version described in the Pythagora README. The dataset was carefully curated to ensure high-quality examples and a diverse range of programming tasks relevant to the Pythagora GPT Pilot application.

Training Procedure

The model was fine-tuned using the training scripts and resources provided in the DeepSeek Coder GitHub repository. Specifically, the finetune/finetune_deepseekcoder.py script was used to perform the fine-tuning process. The model was trained using PEFT with a maximum sequence length of 9,000 tokens, utilizing the custom dataset to adapt the base DeepSeek Coder 33B Instruct model to the specific requirements and prompt structures of the Pythagora GPT Pilot application.

The training process leveraged state-of-the-art techniques and hardware, including DeepSpeed integration for efficient distributed training, to ensure optimal performance and compatibility with the target application. For detailed information on the training procedure, including the specific hyperparameters and configurations used, please refer to the DeepSeek Coder Fine-tuning Documentation.

Model Examination

No additional interpretability work has been performed on this model. However, the model's performance has been thoroughly tested and validated within the context of the Pythagora GPT Pilot application to ensure its effectiveness in generating high-quality code and assisting with programming tasks.

Environmental Impact

The environmental impact of this model has not been assessed. More information is needed to estimate the carbon emissions and electricity usage associated with the model's training and deployment. As a general recommendation, users should strive to utilize the model efficiently and responsibly to minimize any potential environmental impact.

Technical Specifications

  • Model Architecture: The model architecture is based on the DeepSeek Coder 33B Instruct model, which is a transformer-based causal language model optimized for code generation and understanding.
  • Compute Infrastructure: The model was fine-tuned using high-performance computing resources, including GPUs, to ensure efficient and timely training. The exact specifications of the compute infrastructure used for training are not publicly disclosed.

Citation

APA: LoupGarou. (2024). deepseek-coder-33b-instruct-pythagora-v3 (Model). https://huggingface.co/LoupGarou/deepseek-coder-33b-instruct-pythagora-v3

Model Card Contact

For questions, feedback, or concerns regarding this model, please contact LoupGarou through the GitHub repository: MoonlightByte/Pythagora-LLM-Proxy. You can open an issue or submit a pull request to discuss any aspects of the model or its usage within the Pythagora GPT Pilot application.

Original model card: DeepSeek's Deepseek Coder 33B Instruct

🏠Homepage | 🤖 Chat with DeepSeek Coder | Discord | Wechat(微信)


1. Introduction of Deepseek Coder

Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on project-level code corpus by employing a window size of 16K and a extra fill-in-the-blank task, to support project-level code completion and infilling. For coding capabilities, Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.

  • Massive Training Data: Trained from scratch fon 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
  • Highly Flexible & Scalable: Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
  • Superior Model Performance: State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
  • Advanced Code Completion Capabilities: A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.

2. Model Summary

deepseek-coder-33b-instruct is a 33B parameter model initialized from deepseek-coder-33b-base and fine-tuned on 2B tokens of instruction data.

3. How to Use

Here give some examples of how to use our model.

Chat Model Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct", trust_remote_code=True).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=32021)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

4. License

This code repository is licensed under the MIT License. The use of DeepSeek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.

See the LICENSE-MODEL for more details.

5. Contact

If you have any questions, please raise an issue or contact us at agi_code@deepseek.com.

Downloads last month
2,723
GGUF
Model size
33.3B params
Architecture
llama

2-bit

3-bit

4-bit

5-bit

Inference API
Unable to determine this model's library. Check the docs .