# CodeColBERT

This model serves as the base for our semantic code retrieval system, SELMA. It can be used for indexing and retrieval through the PyTerrier bindings for ColBERT.
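ColBERT ranks a code snippet against a query by late interaction. Each query token embedding is matched to its most similar document token embedding, and these maximum similarities are summed (often called MaxSim). The sketch below illustrates only this scoring rule on toy 2-d vectors in plain Python. The function names and vectors are hypothetical, and the sketch does not load this model or use PyTerrier or the ColBERT library.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for every query token embedding, take the
    highest similarity to any document token embedding, then sum them."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)

# Hypothetical token embeddings, for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0]]  # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # matches only the first query token

print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Retrieval then returns documents ordered by this score, highest first. In the actual model, the token embeddings come from the trained encoder rather than being written by hand.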
## Training Details

This model was trained for code retrieval, using CodeBERT as the base model and the official ColBERTv2 training code ([GitHub](https://github.com/stanford-futuredata/ColBERT)).

Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet). Training ColBERT requires triples of queries, positive examples, and negative examples. As queries, we used the documentation provided for each sample in the CodeSearchNet dataset, while its code snippet serves as the positive example. Negative examples were sampled randomly from the corpus. In total, we trained for 400,000 steps.
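The triple construction described above can be sketched as follows. This is a minimal illustration that assumes the data is already loaded as a list of (docstring, code) pairs. The function name `build_triples`, the seed, and the sample data are hypothetical, and this is not the script that was used to train the model. Each docstring becomes the query, its own code snippet becomes the positive, and a snippet drawn at random from a different sample becomes the negative.

```python
import random

def build_triples(pairs, seed=0):
    """Turn (docstring, code) pairs into (query, positive, negative) triples.

    The negative for sample i is the code snippet of a randomly chosen
    sample j != i, i.e. random negative sampling from the corpus.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two samples to draw a negative")
    rng = random.Random(seed)
    triples = []
    for i, (doc, code) in enumerate(pairs):
        # Draw from the other len(pairs) - 1 indices by skipping over i.
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1
        triples.append((doc, code, pairs[j][1]))
    return triples

# Hypothetical CodeSearchNet-style samples.
pairs = [
    ("Return the sum of two numbers.", "def add(a, b):\n    return a + b"),
    ("Reverse a string.", "def rev(s):\n    return s[::-1]"),
    ("Check if a number is even.", "def is_even(n):\n    return n % 2 == 0"),
]

for query, pos, neg in build_triples(pairs):
    print(repr(query), "->", pos.split("(")[0], "| negative:", neg.split("(")[0])
```

Sampling by index excludes the query's own sample but not duplicate snippets elsewhere in the corpus, so deduplicating the corpus first may be worthwhile.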