|
This folder contains the evaluation suite for the DuckDB-Text2SQL model.
|
|
|
Please install the dependencies listed in the `requirements.txt` file in the parent folder.
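For example, from this `eval/` folder (assuming the repository's standard layout, with `requirements.txt` one level up):

```bash
# Install the evaluation dependencies from the parent folder.
pip install -r ../requirements.txt
```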
|
|
|
## Setup |
|
To evaluate against the benchmark dataset, first set up the test-suite evaluation harness:
|
|
|
```bash
mkdir metrics
cd metrics
git clone git@github.com:ElementAI/test-suite-sql-eval.git test_suite_sql_eval
cd ..
```
|
|
|
To evaluate against DuckDB, add a new remote in the `test_suite_sql_eval` folder and check out the latest `duckdb-only` branch (640a12975abf75a94e917caca149d56dbc6bcdd7):
|
|
|
```bash
git remote add till https://github.com/tdoehmen/test-suite-sql-eval.git
git fetch till
git checkout till/duckdb-only
```
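If you want to pin the harness to the exact commit referenced above instead of tracking the branch head:

```bash
# Check out the specific duckdb-only commit mentioned above.
git checkout 640a12975abf75a94e917caca149d56dbc6bcdd7
```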
|
|
|
Next, prepare the DuckDB documentation used for retrieval:
|
```bash
mkdir docs
cd docs
git clone https://github.com/duckdb/duckdb-web.git
cd ..
```
|
|
|
#### Dataset |
|
The benchmark dataset is located in the `data/` folder and includes all databases (`data/databases`), table schemas (`data/tables.json`), and examples (`data/dev.json`). |
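As a quick sanity check that the files are in place, a minimal sketch (assuming both files are JSON arrays, as in Spider-style benchmarks; run from the `DuckDB-NSQL` main folder):

```python
import json

# Load the benchmark examples and table schemas.
# Assumes both files are JSON arrays, as in Spider-style benchmarks.
with open("eval/data/dev.json") as f:
    examples = json.load(f)
with open("eval/data/tables.json") as f:
    tables = json.load(f)

print(f"{len(examples)} examples, {len(tables)} schema entries")
```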
|
|
|
#### Eval |
|
Start a Manifest session serving the model you want to evaluate:
|
|
|
```bash
python -m manifest.api.app \
    --model_type huggingface \
    --model_generation_type text-generation \
    --model_name_or_path motherduckdb/DuckDB-NSQL-7B-v0.1 \
    --fp16 \
    --device 0
```
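Before running predictions, you can check that the server is listening; this just probes the root URL, so any HTTP status code means the process is up:

```bash
# Probe the Manifest server; any HTTP status (even 404) means it is listening.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
```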
|
|
|
Then, from the `DuckDB-NSQL` main folder, run: |
|
|
|
```bash
python eval/predict.py \
    predict \
    eval/data/dev.json \
    eval/data/tables.json \
    --output-dir output/ \
    --stop-tokens ';' \
    --stop-tokens '--' \
    --stop-tokens '```' \
    --stop-tokens '###' \
    --overwrite-manifest \
    --manifest-client huggingface \
    --manifest-connection http://localhost:5000 \
    --prompt-format duckdbinst
```
|
This formats the prompt using the `duckdbinst` style.
|
|
|
To evaluate the predictions, first install and load DuckDB's `httpfs` extension by running the following in a Python shell:
|
|
|
```python
try:
    import duckdb

    con = duckdb.connect()
    con.install_extension("httpfs")
    con.load_extension("httpfs")
except Exception as e:
    print(f"Error loading duckdb extensions: {e}")
```
|
|
|
Then, run the evaluation script: |
|
|
|
```bash
python eval/evaluate.py \
    evaluate \
    --gold eval/data/dev.json \
    --db eval/data/databases/ \
    --tables eval/data/tables.json \
    --output-dir output/ \
    --pred [PREDICTION_FILE]
```
|
|
|
All of the output is written to the prediction file in the `[output-dir]`; in each record, `query` is the gold SQL and `pred` is the predicted SQL.
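For a quick look at a few gold/predicted pairs, a minimal sketch (assuming the prediction file is a JSON array of records; the filename below is a placeholder, substitute the actual file from `[output-dir]`):

```python
import json

# Placeholder path: substitute the actual prediction file from [output-dir].
with open("output/predictions.json") as f:
    records = json.load(f)

# Print a few gold/predicted pairs side by side.
for record in records[:5]:
    print("gold:", record["query"])
    print("pred:", record["pred"])
    print("-" * 40)
```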
|
|