Add README

Files changed (7)

.vscode/settings.json ADDED Viewed

+{
+	"python.pythonPath": "/home/zarzis/anaconda3/envs/p3.8-code/bin/python"
+}

README.md ADDED Viewed

+[tokenizer](#tokenizer) | [model](#model) | [datasets](#datasets) | [plots](#plots) | [fine tuning](#fine-tuning)
+## Tokenizer {#tokenizer}
+We trained our tokenizer with the unigram model of [sentencepiece](https://github.com/google/sentencepiece), then loaded it as `MT5TokenizerFast`.
+## Model {#model}
+We used the [MT5-base](https://huggingface.co/google/mt5-base) model.
+## Datasets {#datasets}
+We trained the model on the [Code Search Net](https://huggingface.co/datasets/code_search_net) dataset plus some data scraped from the internet. We maintained a list of datasets, each containing code in a single programming language.
+## Plots {#plots}
+[train loss](#train_loss) | [evaluation loss](#eval_loss) | [evaluation accuracy](#eval_acc) | [learning rate](#lrs)
+### Train loss {#train_loss}
+![train loss](train_loss.png)
+### Evaluation loss {#eval_loss}
+![eval loss](eval_loss.png)
+### Evaluation accuracy {#eval_acc}
+![eval accuracy](eval_accuracy.png)
+### Learning rate {#lrs}
+![learning rate](learning_rate.png)
+## Fine tuning {#fine-tuning}
+We fine-tuned the model on the [CodeXGLUE code-to-code-trans dataset](https://huggingface.co/datasets/code_x_glue_cc_code_to_code_trans) and scraped data.

eval_accuracy.png ADDED Viewed

eval_loss.png ADDED Viewed

learning_rate.png ADDED Viewed

log_eval.py CHANGED Viewed

@@ -1,7 +1,7 @@
 # To add a new cell, type '# %%'
 # To add a new markdown cell, type '# %% [markdown]'
 # %%
-from IPython import get_ipython
 # %%
 # get_ipython().system("ls -l ../logs")
@@ -21,7 +21,7 @@ eval_accs = []
 learning_rate = []
 with open(path, "r") as filePtr:
     for line in filePtr:
-        print(line)
         toks = line.split()
         if toks[0] == "Step...":
             if "Learning" in toks:
@@ -89,4 +89,3 @@ plt.show()
 # %%

 # To add a new cell, type '# %%'
 # To add a new markdown cell, type '# %% [markdown]'
 # %%
+# from IPython import get_ipython
 # %%
 # get_ipython().system("ls -l ../logs")
 learning_rate = []
 with open(path, "r") as filePtr:
     for line in filePtr:
+        # print(line)
         toks = line.split()
         if toks[0] == "Step...":
             if "Learning" in toks:
 # %%
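
The parsing loop in `log_eval.py` splits each log line and branches on `toks[0] == "Step..."` and `"Learning" in toks`. A minimal sketch of that pattern follows. It assumes a log format that the diff does not show (the two line shapes in the docstring are hypothetical), and the function name `parse_log_lines` and the result keys are illustrative, not taken from the script. It also skips blank lines, where `toks[0]` would otherwise raise an `IndexError`.

```python
import re

# Matches an int or float, with an optional exponent (e.g. 500, 2.31, 3e-4).
NUMBER = r"([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"


def parse_log_lines(lines):
    """Collect (step, value) pairs from training-log lines.

    Assumed line shapes (hypothetical, not confirmed by the diff):
        Step... (500 | Loss: 2.31, Learning Rate: 0.0003)
        Step... (500 | Loss: 2.05, Acc: 0.61)
    """
    metrics = {"train_loss": [], "learning_rate": [], "eval_loss": [], "eval_acc": []}
    for line in lines:
        toks = line.split()
        # Skip blank lines and anything that is not a "Step..." record.
        if not toks or toks[0] != "Step...":
            continue
        step = re.search(r"\((\d+)", line)
        loss = re.search(r"Loss:\s*" + NUMBER, line)
        if step is None or loss is None:
            continue
        step_no, loss_val = int(step.group(1)), float(loss.group(1))
        if "Learning" in toks:
            # Training record: loss plus learning rate.
            lr = re.search(r"Learning Rate:\s*" + NUMBER, line)
            metrics["train_loss"].append((step_no, loss_val))
            if lr is not None:
                metrics["learning_rate"].append((step_no, float(lr.group(1))))
        else:
            # Evaluation record: loss plus accuracy.
            acc = re.search(r"Acc:\s*" + NUMBER, line)
            metrics["eval_loss"].append((step_no, loss_val))
            if acc is not None:
                metrics["eval_acc"].append((step_no, float(acc.group(1))))
    return metrics
```

Passing an open file object works as well as a list of strings, since the function only iterates over lines.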

train_loss.png ADDED Viewed