Request: Would the amazon team be willing to train a model on my high quality dataset?

#5
by rombodawg - opened

I have created and refined an open-source dataset named "LosslessmegacodeV3" (linked at the end) which I believe could produce one of the best open-source AI models if trained on the right base model. However, since I am severely lacking in funding (a.k.a. I'm broke), I haven't been able to do the training myself. I'm curious whether your team would be willing to take on the challenge of training one AI model on my dataset for coding and non-coding tasks (the dataset is made for both) to create possibly one of the best AI models available. If you are up for the challenge, here is a list of the top models I would recommend training with my dataset, in order of highest priority (note that I would only ask you to train one model; I am merely giving multiple options):

1: WizardLM/WizardCoder-Python-13B-V1.0

2: amazon/MistralLite

3: WizardLM/WizardCoder-Python-34B-V1.0 (#3 and #4 are equal in priority)

4: Phind/Phind-CodeLlama-34B-v2 (#3 and #4 are equal in priority)

5: jondurbin/airoboros-l2-c70b-3.1.2


If you agree, I have some name suggestions for the model that would be released, if you would allow me to propose them. I've listed them below; "Lossless" and "V3" refer to the dataset version used to train the models.

1: LosslessWizardCoder-Python-13B-V3

2: LosslessMistralLitecoderV3

3: LosslessWizardCoder-Python-34B-V3

4: LosslessPhind-LlamaCoder-34B-V3

5: LosslessAiroborosCoder-l2-c70b-V3


Dataset link:

Amazon Web Services org

Hi @rombodawg, thanks for the suggestion!

Unfortunately, we have our own internal process for deciding what to work on in this area.

Based on your description, CodeWhisperer might be something you are interested in. Please give it a try :)

yinsong1986 changed discussion status to closed
