# CodeColBERT
This model serves as the base for our semantic code retrieval system SELMA. It can be used for indexing and retrieval via the PyTerrier bindings for ColBERT, as sketched below.
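
A minimal sketch of how indexing and retrieval could look with the `pyterrier_colbert` bindings; the checkpoint path, index location, and toy corpus are placeholders, not part of this repository.

```python
import pyterrier as pt
if not pt.started():
    pt.init()

from pyterrier_colbert.indexing import ColBERTIndexer
from pyterrier_colbert.ranking import ColBERTFactory

checkpoint = "path/to/codecolbert/checkpoint"   # placeholder: this model's checkpoint
index_root, index_name = "./indices", "codesearchnet"

# Toy corpus: an iterable of {"docno": ..., "text": ...} dicts over code snippets.
code_snippets = [
    "def read_csv(path):\n    import pandas as pd\n    return pd.read_csv(path)",
    "def parse_json(s):\n    import json\n    return json.loads(s)",
]
docs = ({"docno": str(i), "text": snippet} for i, snippet in enumerate(code_snippets))

# Build the dense ColBERT index.
indexer = ColBERTIndexer(checkpoint, index_root, index_name, chunksize=3)
indexer.index(docs)

# Retrieve with natural-language queries over the index.
factory = ColBERTFactory(checkpoint, index_root, index_name)
retriever = factory.end_to_end()
results = retriever.search("read a csv file into a dataframe")
```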
## Training Details
The model was trained for code retrieval, using CodeBERT as the base model and the official ColBERTv2 code
([GitHub](https://github.com/stanford-futuredata/ColBERT)).
Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet).
Training ColBERT requires triples of a query, a positive example, and a negative example. As queries, we used the documentation
provided for each sample in the CodeSearchNet dataset, while its code snippet serves as the positive example. Negative examples were
sampled randomly from the corpus (see the sketch below). In total, we trained for 400,000 steps.
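
A minimal sketch of how such triples could be assembled from a CodeSearchNet JSONL file; the field names (`docstring`, `code`) and the tab-separated output format are assumptions, not the exact preprocessing used for this model.

```python
import json
import random

def build_triples(jsonl_path, out_path, seed=42):
    """Write (query, positive, negative) text triples, one per line, tab-separated."""
    random.seed(seed)
    with open(jsonl_path) as f:
        samples = [json.loads(line) for line in f]
    with open(out_path, "w") as out:
        for sample in samples:
            query = " ".join(sample["docstring"].split())    # documentation as the query
            positive = " ".join(sample["code"].split())      # its code snippet as the positive
            neg_sample = random.choice(samples)              # random corpus snippet as the negative
            while neg_sample is sample:
                neg_sample = random.choice(samples)
            negative = " ".join(neg_sample["code"].split())
            out.write(f"{query}\t{positive}\t{negative}\n")
```

Depending on the ColBERT version used, these raw-text triples may need to be converted into an ID-based form that references separate query and collection files before training.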