---
license: apache-2.0
datasets:
- lambdasec/cve-single-line-fixes
- lambdasec/gh-top-1000-projects-vulns
language:
- code
tags:
- code
programming_language:
- Java
- JavaScript
- Python
inference: false
model-index:
- name: SantaFixer
  results:
  - task:
      type: text-generation
    dataset:
      type: openai/human-eval-infilling
      name: HumanEval
    metrics:
    - name: single-line infilling pass@1
      type: pass@1
      value: 0.47
      verified: false
    - name: single-line infilling pass@10
      type: pass@10
      value: 0.73
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lambdasec/gh-top-1000-projects-vulns
      name: GH Top 1000 Projects Vulnerabilities
    metrics:
    - name: pass@1 (Java)
      type: pass@1
      value: 0.1
      verified: false
    - name: pass@10 (Java)
      type: pass@10
      value: 0.1
      verified: false
    - name: pass@1 (Python)
      type: pass@1
      value: 0.2
      verified: false
    - name: pass@10 (Python)
      type: pass@10
      value: 0.2
      verified: false
    - name: pass@1 (JavaScript)
      type: pass@1
      value: 0.3
      verified: false
    - name: pass@10 (JavaScript)
      type: pass@10
      value: 0.3
      verified: false
---

# Model Card for SantaFixer

<!-- Provide a quick summary of what the model is/does. -->

This is an LLM for code that focuses on generating bug fixes using infilling.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** [codelion](https://huggingface.co/codelion)
- **Model type:** GPT-2
- **Finetuned from model:** [bigcode/santacoder](https://huggingface.co/bigcode/santacoder)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]


## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
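Since the model generates fixes by infilling, one plausible way to query it is to cut the suspect line out of the source and ask the model to fill the gap. The sketch below is a minimal illustration, not the author's documented usage. It assumes the model keeps the fill-in-the-middle tokens of its base model SantaCoder (`<fim-prefix>`, `<fim-suffix>`, `<fim-middle>`) and that it is published under the repository id `lambdasec/santafixer`; both are assumptions, as are the helper names `build_infill_prompt` and `suggest_fix`.

```python
# Assumed FIM tokens, taken from the base model (bigcode/santacoder).
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"

# Assumed repository id; replace with the actual id of this model.
MODEL_ID = "lambdasec/santafixer"


def build_infill_prompt(code: str, buggy_line_index: int) -> str:
    """Drop one line (0-based index) and wrap the rest in FIM tokens."""
    lines = code.splitlines(keepends=True)
    if not 0 <= buggy_line_index < len(lines):
        raise IndexError("buggy_line_index is outside the source")
    prefix = "".join(lines[:buggy_line_index])
    suffix = "".join(lines[buggy_line_index + 1:])
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def suggest_fix(code: str, buggy_line_index: int,
                model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Ask the model for a replacement of the removed line."""
    # Imported lazily so the prompt helper works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # trust_remote_code mirrors how SantaCoder is loaded; it may not be
    # required for this checkpoint (assumption).
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_infill_prompt(code, buggy_line_index)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Under these assumptions, `suggest_fix(source, 2)` would return the model's proposed replacement for the third line of `source`. The output should be reviewed before it is applied.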

## Training Details

- **GPU:** Tesla P100
- **Time:** ~5 hrs

### Training Data

The model was fine-tuned on the [CVE single line fixes dataset](https://huggingface.co/datasets/lambdasec/cve-single-line-fixes).

### Training Procedure 

The model was trained with supervised fine-tuning (SFT).

#### Training Hyperparameters

- **optim:** adafactor
- **gradient_accumulation_steps:** 4
- **gradient_checkpointing:** true
- **fp16:** false
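The hyperparameter names above match keyword arguments of `transformers.TrainingArguments`, so they could be passed to the Hugging Face `Trainer` roughly as follows. This is a sketch under that assumption: the card does not say which training script was used, and the helper name `make_training_args` and the `output_dir` parameter are illustrative. Settings the card does not list (batch size, learning rate, epochs) are left at library defaults here rather than guessed.

```python
# Only the four settings listed in this card; everything else is left
# at the library defaults because the card does not state it.
TRAINING_KWARGS = {
    "optim": "adafactor",
    "gradient_accumulation_steps": 4,
    "gradient_checkpointing": True,
    "fp16": False,
}


def make_training_args(output_dir: str):
    """Build TrainingArguments from the settings above (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Gradient accumulation and gradient checkpointing both trade speed for memory, which is consistent with training on a single Tesla P100.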

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

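According to the metadata of this card, the model was evaluated on single-line infilling with HumanEval (`openai/human-eval-infilling`) and on the GH Top 1000 Projects Vulnerabilities dataset (`lambdasec/gh-top-1000-projects-vulns`).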

### Results

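The results below are taken from the metadata of this card and are unverified.

| Dataset | Metric | Value |
|---|---|---|
| HumanEval (single-line infilling) | pass@1 | 0.47 |
| HumanEval (single-line infilling) | pass@10 | 0.73 |
| GH Top 1000 Projects Vulnerabilities (Java) | pass@1 | 0.1 |
| GH Top 1000 Projects Vulnerabilities (Java) | pass@10 | 0.1 |
| GH Top 1000 Projects Vulnerabilities (Python) | pass@1 | 0.2 |
| GH Top 1000 Projects Vulnerabilities (Python) | pass@10 | 0.2 |
| GH Top 1000 Projects Vulnerabilities (JavaScript) | pass@1 | 0.3 |
| GH Top 1000 Projects Vulnerabilities (JavaScript) | pass@10 | 0.3 |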
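The metadata of this card reports pass@1 and pass@10 scores. For readers unfamiliar with the metric, the sketch below shows the commonly used unbiased pass@k estimator: given `n` samples per problem of which `c` pass, it estimates the probability that at least one of `k` randomly drawn samples passes. The card does not state which estimator or sample count was used for its numbers, so this illustrates the metric and does not reproduce the evaluation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem
    c: samples that passed
    k: number of samples drawn
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every draw of k contains a passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 10 samples of which 5 pass has pass@1 of 0.5, and any problem with at least one passing sample out of 10 has pass@10 of 1.0.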

#### Summary

[More Information Needed]