Tokenizer
We trained our tokenizer with SentencePiece's unigram algorithm, then loaded it as an MT5TokenizerFast.
Model
We used the MT5-base model.
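For illustration, the snippet below builds a tiny randomly initialised MT5 from a config (all sizes are made up so it runs offline); the actual training would instead load the pretrained checkpoint with `MT5ForConditionalGeneration.from_pretrained("google/mt5-base")`.

```python
import torch
from transformers import MT5Config, MT5ForConditionalGeneration

# Tiny illustrative config; google/mt5-base is much larger.
config = MT5Config(
    vocab_size=128,
    d_model=64,
    d_kv=16,
    d_ff=128,
    num_layers=2,
    num_decoder_layers=2,
    num_heads=4,
)
model = MT5ForConditionalGeneration(config)

# One seq2seq training step's forward pass: logits over the vocab
# for each decoder position, plus the cross-entropy loss.
input_ids = torch.tensor([[5, 6, 7]])
labels = torch.tensor([[8, 9, 10]])
out = model(input_ids=input_ids, labels=labels)
```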
Datasets
We used the CodeSearchNet dataset together with some data scraped from the internet to train the model. We maintained a list of datasets, each containing code from a single programming language.
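The per-language grouping can be sketched as below; the record fields and example snippets are hypothetical stand-ins for the CodeSearchNet and scraped records.

```python
from collections import defaultdict

# Hypothetical records; the real data comes from CodeSearchNet plus scraping.
examples = [
    {"language": "python", "code": "print('hi')"},
    {"language": "go", "code": 'fmt.Println("hi")'},
    {"language": "python", "code": "x = 1"},
]

# One dataset (list) per language, as described above.
by_language = defaultdict(list)
for ex in examples:
    by_language[ex["language"]].append(ex["code"])

datasets = list(by_language.values())  # each entry holds code of one language
```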
Plots
Train loss
Evaluation loss
Evaluation accuracy
Learning rate
Fine-tuning (WIP)
We fine-tuned the model on the CodeXGLUE code-to-code-trans dataset and additional scraped data.
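One plausible preprocessing step for this task is turning the Java/C# translation pairs into input/target strings for seq2seq fine-tuning. The field names and the task prefix below are assumptions for illustration, not the repository's actual preprocessing.

```python
# Hypothetical CodeXGLUE code-to-code-trans style records (Java -> C#).
pairs = [
    {"java": "int add(int a,int b){return a+b;}",
     "cs": "int Add(int a,int b){return a+b;}"},
]

def to_seq2seq(example):
    """Format one translation pair for seq2seq training (assumed prefix)."""
    return {
        "input_text": "translate java to csharp: " + example["java"],
        "target_text": example["cs"],
    }

formatted = [to_seq2seq(p) for p in pairs]
```

After this step, `input_text` would be tokenized as the encoder input and `target_text` as the decoder labels.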