AnReu committed · Commit 02ba4f5 · 1 Parent(s): 063da31

Create README

Files changed (1)
  1. README.md +12 -0
README.md ADDED
@@ -0,0 +1,12 @@
+ # CodeColBERT
+
+ This model serves as the base for our semantic code retrieval system SELMA. It can be used for indexing and retrieval through the PyTerrier bindings for ColBERT.
+
+ ## Training Details
+ This model was trained for code retrieval. It uses CodeBERT as its base and was trained with the official ColBERTv2 code
+ ([GitHub](https://github.com/stanford-futuredata/ColBERT)).
+
+ Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet).
+ Training ColBERT requires triples of queries, positive examples, and negative examples. As queries, we used the documentation
+ provided for each sample in the CodeSearchNet data set, while its code snippet serves as the positive example. Negative examples were
+ sampled randomly from the corpus. In total, we trained for 400,000 steps.
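
The triple construction described in the README can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function name `build_triples`, the in-memory list of `(documentation, code)` pairs, and the fixed seed are all hypothetical. It also assumes that a sample's own snippet is excluded when drawing its negative, which the README does not state.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]          # (documentation, code snippet)
Triple = Tuple[str, str, str]   # (query, positive, negative)


def build_triples(pairs: List[Pair], seed: int = 0) -> List[Triple]:
    """Build one (query, positive, negative) triple per sample.

    The documentation is the query, the sample's own code is the positive,
    and the negative is the code of a different, randomly chosen sample.
    Excluding the sample's own index is an assumption of this sketch.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two samples to draw a negative")

    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    triples: List[Triple] = []
    for i, (doc, code) in enumerate(pairs):
        # Draw uniformly from all indices except i.
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        triples.append((doc, code, pairs[j][1]))
    return triples


if __name__ == "__main__":
    samples = [
        ("Add two numbers.", "def add(a, b): return a + b"),
        ("Reverse a string.", "def rev(s): return s[::-1]"),
        ("Check if a list is empty.", "def empty(xs): return not xs"),
    ]
    for query, positive, negative in build_triples(samples):
        print(query, "|", positive, "|", negative)
```

The sketch yields plain string triples. Converting them into the input files expected by the ColBERT training scripts is a separate step that is not shown here.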