Train this model on coding datasets to make a BEAST coding model
I bet if you trained this model on one, several, or even all of these coding datasets, it would beat WizardCoder hands down. I'm not sure whether it would be better to train llama-2-13b on these datasets first and then on the Guanaco QLoRA, or the other way around (rough sketch after the list below):
https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1
https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
https://huggingface.co/datasets/sahil2801/code_instructions_120k
https://huggingface.co/datasets/codeparrot/github-code-clean
https://huggingface.co/datasets/razent/wizardlm-code-evol-32k
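For reference, a minimal, untested sketch of what one stage of that two-stage QLoRA run could look like with transformers + peft + bitsandbytes. The prompt format, the column names, and every hyperparameter are assumptions for illustration, not something anyone in this thread has actually run:

```python
# Rough two-stage QLoRA sketch, assuming transformers, peft, bitsandbytes,
# and datasets are installed. Column names, prompt format, and hyperparameters
# are guesses for illustration only.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "meta-llama/Llama-2-13b-hf"

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb,
                                             device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                                         task_type="CAUSAL_LM"))

def tokenize(batch):
    # Assumes "instruction"/"output" columns (what Evol-Instruct-Code-80k-v1 appears to use).
    text = [f"### Instruction:\n{i}\n\n### Response:\n{o}"
            for i, o in zip(batch["instruction"], batch["output"])]
    return tokenizer(text, truncation=True, max_length=2048)

# Stage 1: one of the coding datasets listed above.
code_ds = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")
code_ds = code_ds.map(tokenize, batched=True, remove_columns=code_ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-13b-code-qlora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           bf16=True,
                           logging_steps=10),
    train_dataset=code_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Stage 2: repeat the same loop on the Guanaco data (e.g. the
# openassistant-guanaco dataset), or swap the order of the two stages.
```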
Actually, forget those datasets; I made my own if you want to use it. It would be much easier. I tried training the model myself, but I kept getting errors and I'm not experienced enough to fix them.
Here's my dataset:
https://huggingface.co/datasets/rombodawg/MegaCodeTraining112k
Training on such a large dataset is outside my budget. To put things into perspective, the Guanaco dataset is only 20.9 MB versus your 433 MB dataset.
I can try to help you debug the training issue you are having though. What script are you using to train and what are the errors?
Why not the datasets listed above?
Because the one I made is much cleaner and more concise; it includes both the 80k and the 32k combined. It doesn't include code_instructions_120k, so you can train on that separately, since it's formatted differently (see the sketch below).
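If it helps, something like this would let you check the two formats side by side (the "train" split name and the idea of eyeballing column_names are assumptions, not anything from the dataset cards):

```python
# Rough sketch: load the combined set and keep code_instructions_120k separate,
# since it's formatted differently. The "train" split name is an assumption;
# check each dataset card for the actual schema before writing prompt templates.
from datasets import load_dataset

combined = load_dataset("rombodawg/MegaCodeTraining112k", split="train")
separate = load_dataset("sahil2801/code_instructions_120k", split="train")

print(combined.column_names)  # the 80k + 32k data merged into one set
print(separate.column_names)  # different format, so it needs its own prompt template / run
```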