# CodeColBERT

This model serves as the base for our semantic code retrieval system SELMA. It can be used for indexing and retrieval via the PyTerrier bindings for ColBERT.
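
A minimal sketch of such a pipeline, assuming the [pyterrier_colbert](https://github.com/terrierteam/pyterrier_colbert) package, is shown below; the checkpoint path, index location, and toy corpus are placeholders, and the exact class and argument names should be checked against the pyterrier_colbert documentation.

```python
import pyterrier as pt

pt.init()

from pyterrier_colbert.indexing import ColBERTIndexer
from pyterrier_colbert.ranking import ColBERTFactory

# Toy corpus: in practice this would be an iterator over code snippets.
code_snippets = [
    {"docno": "0", "text": "def add(a, b):\n    return a + b"},
    {"docno": "1", "text": "def read_json(path):\n    import json\n    return json.load(open(path))"},
]

# Build a ColBERT index over the code snippets using the CodeColBERT checkpoint.
indexer = ColBERTIndexer(
    "path/to/codecolbert/checkpoint.dnn",  # placeholder checkpoint path
    "/path/to/index_root",                 # placeholder index location
    "codesearchnet_index",                 # placeholder index name
    chunksize=3,
)
indexer.index(iter(code_snippets))

# Dense end-to-end retrieval: natural-language query -> ranked code snippets.
factory = ColBERTFactory(
    "path/to/codecolbert/checkpoint.dnn",
    "/path/to/index_root",
    "codesearchnet_index",
)
retriever = factory.end_to_end()
results = retriever.search("read a json file into a dictionary")
print(results[["docno", "score"]].head())
```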
## Training Details

This model was trained for code retrieval. It uses CodeBERT as its base model and was trained with the official ColBERTv2 code ([Github](https://github.com/stanford-futuredata/ColBERT)).
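
As an illustration of what this looks like with the ColBERTv2 codebase, a training run could be launched roughly as follows; the file names, batch size, and other hyperparameters are placeholders rather than the exact configuration used for this model, and the expected triples format should be verified against the ColBERT documentation.

```python
from colbert import Trainer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == "__main__":
    # Single-GPU run; "codecolbert" is just an illustrative experiment name.
    with Run().context(RunConfig(nranks=1, experiment="codecolbert")):
        config = ColBERTConfig(
            bsize=32,          # placeholder batch size
            maxsteps=400_000,  # total number of training steps
            root="experiments",
        )
        trainer = Trainer(
            triples="triples.tsv",        # (query, positive, negative) training triples
            queries="queries.tsv",        # query id -> documentation string
            collection="collection.tsv",  # passage id -> code snippet
            config=config,
        )
        # Initialise from the CodeBERT checkpoint instead of the default BERT weights.
        trainer.train(checkpoint="microsoft/codebert-base")
```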

Our data source is the [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet). Training ColBERT requires triples of queries, positive examples, and negative examples. As queries, we used the documentation provided for each sample in the CodeSearchNet data set, while its code snippet serves as the positive example. Negative examples were sampled randomly from the corpus. In total, we train for 400,000 steps.
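
A sketch of how such triples could be assembled from the CodeSearchNet jsonl files is given below; the `docstring` and `code` fields come from the CodeSearchNet release, while the shard file name, output format, and sampling details are assumptions to adapt.

```python
import gzip
import json
import random


def load_samples(path):
    """Read one CodeSearchNet .jsonl.gz shard into (documentation, code) pairs."""
    samples = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            samples.append((record["docstring"], record["code"]))
    return samples


def build_triples(samples, out_path, seed=42):
    """Write (query, positive, negative) triples: the docstring is the query,
    its own code snippet the positive, and a random other snippet the negative."""
    rng = random.Random(seed)
    with open(out_path, "w", encoding="utf-8") as out:
        for doc, code in samples:
            negative = rng.choice(samples)[1]
            while negative == code:
                negative = rng.choice(samples)[1]
            # Flatten whitespace so each field stays on a single tab-separated line.
            query = " ".join(doc.split())
            pos = " ".join(code.split())
            neg = " ".join(negative.split())
            out.write(f"{query}\t{pos}\t{neg}\n")


if __name__ == "__main__":
    samples = load_samples("python_train_0.jsonl.gz")  # placeholder shard name
    build_triples(samples, "triples.tsv")
```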