Synthesized Dataset

#2
by Tottowich - opened

Hi, great job on this project!
Is the synthesized dataset of 200k examples going to be open-sourced?
Looking to train a similar model with Llama3 as base instead.

MotherDuck org

We released a 25k subset. Hope that helps :) https://huggingface.co/datasets/motherduckdb/duckdb-text2sql-25k

Hello there, do you also have any databases with query-answer pairs that we could train the model on? My issue is that I can only find SQLite databases among the text2sql training datasets.
