tokenizer | model | datasets | plots | fine tuning
Tokenizer {#tokenizer}
We trained our tokenizer with SentencePiece's unigram model and then loaded it as an MT5TokenizerFast.
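A minimal sketch of these two steps, assuming a plain-text code corpus at `code_corpus.txt` and an illustrative vocabulary size; the actual corpus and hyperparameters used for our run are not listed here.

```python
import sentencepiece as spm
from transformers import MT5TokenizerFast

# 1. Train a unigram SentencePiece model on a plain-text code corpus
#    (path and vocab_size are illustrative assumptions).
spm.SentencePieceTrainer.train(
    input="code_corpus.txt",      # one training example per line
    model_prefix="code_unigram",  # writes code_unigram.model / code_unigram.vocab
    vocab_size=32000,
    model_type="unigram",
)

# 2. Wrap the trained SentencePiece model as an MT5TokenizerFast.
tokenizer = MT5TokenizerFast(vocab_file="code_unigram.model")
print(tokenizer.tokenize("def add(a, b): return a + b"))
```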
Model {#model}
We used the mT5-base model.
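A minimal loading sketch, assuming the public `google/mt5-base` checkpoint on the Hugging Face Hub; training arguments for our run are not shown.

```python
from transformers import MT5ForConditionalGeneration

# Load the pretrained mT5-base checkpoint as a sequence-to-sequence model.
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
print(f"{model.num_parameters():,} parameters")
```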
Datasets {#datasets}
We trained the model on the CodeSearchNet dataset together with some data scraped from the internet. We maintained a list of datasets, where each dataset contained code from a single language.
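A sketch of how such a per-language list can be assembled; the Hub identifier `code_search_net` and its language configurations are assumptions based on the public CodeSearchNet release, and the scraped data is omitted here.

```python
from datasets import load_dataset

# Languages covered by the public CodeSearchNet corpus.
languages = ["python", "java", "javascript", "go", "ruby", "php"]

# One dataset per language, so every entry holds code of a single language.
datasets_by_language = {
    lang: load_dataset("code_search_net", lang, split="train")
    for lang in languages
}

for lang, ds in datasets_by_language.items():
    print(lang, len(ds))
```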
Plots {#plots}
train loss | evaluation loss | evaluation accuracy | learning rate
Train loss {#train_loss}
Evaluation loss {#eval_loss}
Evaluation accuracy {#eval_acc}
Learning rate {#lrs}
Fine tuning {#fine-tuning}
We fine-tuned the model on the CodeXGLUE code-to-code-trans dataset and the scraped data.
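A minimal fine-tuning sketch with `Seq2SeqTrainer`; the Hub identifier `code_x_glue_cc_code_to_code_trans` (Java to C# translation pairs) and all hyperparameters are assumptions, and the scraped data is left out for brevity.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MT5ForConditionalGeneration,
    MT5TokenizerFast,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Java <-> C# translation pairs from CodeXGLUE (assumed Hub identifier).
dataset = load_dataset("code_x_glue_cc_code_to_code_trans")

tokenizer = MT5TokenizerFast.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

def preprocess(batch):
    # Translate Java source into C#; field names follow the dataset card.
    inputs = tokenizer(batch["java"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["cs"], truncation=True, max_length=512)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-code-to-code",     # illustrative output directory
    per_device_train_batch_size=8,     # assumed batch size
    learning_rate=5e-5,                # assumed learning rate
    num_train_epochs=3,                # assumed epoch count
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```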