Commit e9713ec by tdoehmen · 1 Parent(s): ea4a3ce

added test suite

duckdb-nsql/eval/metrics/test_suite_sql_eval DELETED
@@ -1 +0,0 @@
1
- Subproject commit 640a12975abf75a94e917caca149d56dbc6bcdd7
duckdb-nsql/eval/metrics/test_suite_sql_eval/README.md ADDED
@@ -0,0 +1,144 @@
1
+ # Semantic Evaluation for Text-to-SQL with Test Suites
2
+
3
+ This repo contains the test suite evaluation metric for 11 text-to-SQL tasks. Compared to other current metrics, the test suite metric efficiently computes a tighter upper bound on semantic accuracy. It was proposed in our EMNLP 2020 paper: [Semantic Evaluation for Text-to-SQL with Distilled Test Suites](https://arxiv.org/abs/2010.02840). It is now the official metric of [Spider](https://yale-lily.github.io/spider), [SParC](https://yale-lily.github.io/sparc), and [CoSQL](https://yale-lily.github.io/cosql), and is also available for Academic, ATIS, Advising, Geography, IMDB, Restaurants, Scholar, and Yelp (building on the amazing work by [Catherine and Jonathan](https://github.com/jkkummerfeld/text2sql-data)).
4
+
5
+ Notice: Please refer to [Ruiqi's repo](https://github.com/ruiqi-zhong/TestSuiteEval) for the code to generate neighbor queries and random databases as defined in the paper. We look forward to similar evaluations in other semantic parsing domains.
6
+
7
+
8
+ ## Setting Up
9
+
10
+ To run the test suite (execution) evaluation, first download the test suites (databases) for the 11 text-to-SQL tasks from [here](https://drive.google.com/file/d/1mkCx2GOFIqNesD4y8TDAO1yX1QZORP5w/view?usp=sharing), and put them in the `database/` directory.
11
+
12
+ You also need to install sqlparse and nltk to run the evaluation.
13
+
14
+ ```
15
+ pip3 install sqlparse
16
+ pip3 install nltk
17
+ ```
18
+
19
+ ## Official Evaluation for Spider, SParC, and CoSQL
20
+
21
+ We will report the test suite accuracy for the official [Spider](https://yale-lily.github.io/spider), [SParC](https://yale-lily.github.io/sparc), and [CoSQL](https://yale-lily.github.io/cosql) leaderboards (starting Oct. 2020). The original exact set match accuracy will be reported as a reference.
22
+
23
+ Below is an example command to calculate the test suite accuracy for the development sets of Spider, CoSQL, and SParC.
24
+
25
+ ```
26
+ python3 evaluation.py --gold [gold file] --pred [predicted file] --etype [evaluation type] --db [database dir] --table [table file] --plug_value --keep_distinct --progress_bar_for_each_datapoint
27
+
28
+
29
+ arguments:
30
+ [gold file] gold file where each line is `a gold SQL \t db_id` for Spider, SParC, and CoSQL, and interactions are separated by one empty line for SParC and CoSQL. See an example at evaluation_examples/gold.txt
31
+ [predicted file] predicted sql file where each line is a predicted SQL, and interactions are separated by one empty line. See an example at evaluation_examples/predict.txt
32
+ [database dir] the directory that contains all the databases and test suites
33
+ [table file] table.json file which includes foreign key info of each database.
34
+ [evaluation type] "exec" for test suite accuracy (default), "match" for the original exact set match accuracy, and "all" for both
35
+ --plug_value whether to plug in the gold value into the predicted query; suitable if your model does not predict values.
36
+ --keep_distinct whether to keep distinct keyword during evaluation. default is false.
37
+ --progress_bar_for_each_datapoint whether to print progress bar of running test inputs for each datapoint
38
+ ```
39
+
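The gold file layout described above (one `SQL \t db_id` pair per line, interactions separated by an empty line) can be illustrated with a small parser. This is only a sketch of the format, not the code `evaluation.py` actually uses; the function name `parse_gold` and the sample queries are hypothetical.

```python
from typing import List, Tuple


def parse_gold(text: str) -> List[List[Tuple[str, str]]]:
    """Split gold-file text into interactions of (sql, db_id) pairs."""
    interactions: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current interaction (SParC / CoSQL).
            if current:
                interactions.append(current)
                current = []
            continue
        # The db_id is the last tab-separated field on the line.
        sql, db_id = line.rsplit("\t", 1)
        current.append((sql.strip(), db_id.strip()))
    if current:
        interactions.append(current)
    return interactions


# Illustrative input: one two-turn interaction, then a single-turn one.
sample = (
    "SELECT count(*) FROM singer\tconcert_singer\n"
    "SELECT name FROM singer\tconcert_singer\n"
    "\n"
    "SELECT * FROM pets\tpets_1\n"
)
parsed = parse_gold(sample)
```

For Spider, which has no multi-turn interactions, each query can simply be treated as its own single-turn interaction.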
40
+ #### Test Suite Execution Accuracy without Values
41
+ If your system does NOT predict values in the SQL queries, you should add the `--plug_value` flag, which extracts the values used in the gold query and plugs them into the predicted query.
42
+ ```
43
+ python3 evaluation.py
44
+ --gold [gold file]
45
+ --pred [predicted file]
46
+ --db [database dir]
47
+ --etype exec
48
+ --plug_value
49
+ ```
50
+ To also compute the original set match accuracy:
51
+ ```
52
+ python3 evaluation.py
53
+ --gold [gold file]
54
+ --pred [predicted file]
55
+ --db [database dir]
56
+ --table [table file]
57
+ --etype all
58
+ --plug_value
59
+ ```
60
+
61
+ #### Test Suite Execution Accuracy with Values
62
+ We encourage people to report performance with value predictions; in that case, do not include the `--plug_value` argument.
63
+ ```
64
+ python3 evaluation.py
65
+ --gold [gold file]
66
+ --pred [predicted file]
67
+ --db [database dir]
68
+ --etype exec
69
+ ```
70
+
71
+ #### Other Arguments
72
+ If `--keep_distinct` is included, the distinct keywords will NOT be removed during evaluation. To make a fair comparison with the original exact set match metric, `--keep_distinct` should not be added.
73
+
74
+ Include `--progress_bar_for_each_datapoint` if you suspect that the execution got stuck on a specific test input; it will print the progress of running on each test input.
75
+
76
+
77
+ ## Evaluation for Other Classical Text-to-SQL Datasets
78
+
79
+ *UPDATE:* we fixed the issue mentioned in https://github.com/taoyds/test-suite-sql-eval/issues/1 . We also added additional features to evaluate on a subset and cache the results to speed up evaluation.
80
+
81
+ The prior work on classical text-to-sql datasets (ATIS, Academic, Advising, Geography, IMDB, Restaurants, Scholar, Yelp) usually reports the exact string match accuracy and execution accuracy over a single database content, which either exaggerates or deflates the real semantic accuracy.
82
+
83
+ The test sets for the classical text-to-sql datasets are adopted from [this repo](https://github.com/jkkummerfeld/text2sql-data). We used the test split if one is defined, and the entire dataset otherwise. We also rewrote the SQL queries to conform with the style of the Spider dataset.
84
+
85
+ All the test datapoints are saved in `classical_test.pkl`. Each test datapoint is represented as a dictionary with the following keys and values:
86
+
87
+ - `db_id`: which of the eight original classical datasets the datapoint belongs to. `database/[db_id]/[db_id].sqlite` contains an empty database with the associated schema.
88
+ - `query`: the ground truth SQL query (or any semantically equivalent variant) the model needs to predict.
89
+ - `variables`: the constants that are used in the SQL query. We also include a field called `ancestor_of_occuring_column`, where we find all the columns that contain this value and recursively find each one's `ancestor column` (if a column refers to a parent column, i.e. has a foreign key reference). This field is especially useful if your algorithm originally uses database content to help generate model predictions.
90
+ - `testsuite`: a set of database paths on which we compare denotations.
91
+ - `texts`: the associated natural language descriptions, with the constant value extracted.
92
+ - `orig_id`: the original data id from Jonathan's repo. It is a tuple of two elements (db_id, idx), referring to the idx-th element of the list encoded by text2sql-data/data/[db_id].json.
93
+
94
+ You can evaluate your model in whatever configurations you want. For example, you may choose to plug in the values into the text and ask the model itself to figure out which constants the user has given;
95
+ or you can relax the modelling assumption and assume the model has oracle access to the ground truth constant value; or you can further relax the assumption of knowing which "ancestor column" contains the constant provided.
96
+ However, in any case, you **SHOULD NOT** change the gold query, since test suite generation is dependent on it.
97
+
98
+ The `judge` function in evaluate_classical.py contains what you need to evaluate a single model prediction.
99
+ It takes in the ground truth information of a datapoint (an element in `classical_test.pkl`, represented as a dictionary) and a model prediction (as a string) and returns True/False - whether the prediction is semantically correct.
100
+
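The idea behind `judge` (a prediction is accepted only if it matches the gold query on every database in the test suite) can be sketched with in-memory SQLite databases. This is a simplified illustration under stated assumptions, not the repo's implementation: the helper names (`run`, `passes_test_suite`, `make_db`), the toy `journal` schema, and the multiset comparison standing in for `result_eq` are all hypothetical, and timeouts are omitted.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def run(conn: sqlite3.Connection, sql: str) -> Tuple[str, list]:
    """Execute a query and report either its rows or an exception flag."""
    try:
        return "result", conn.execute(sql).fetchall()
    except sqlite3.Error:
        return "exception", []


def passes_test_suite(gold: str, pred: str, suite: List[sqlite3.Connection]) -> bool:
    """Return True only if pred agrees with gold on every database in the suite."""
    order_matters = "order by" in gold.lower()
    for conn in suite:
        flag, gold_rows = run(conn, gold)
        if flag != "result":
            continue  # the gold query failed here, so this database is skipped
        flag, pred_rows = run(conn, pred)
        if flag != "result":
            return False
        if order_matters:
            same = gold_rows == pred_rows
        else:
            # Simplification: compare rows as multisets when order is irrelevant.
            same = Counter(gold_rows) == Counter(pred_rows)
        if not same:
            return False
    return True


def make_db(rows: List[Tuple[str, int]]) -> sqlite3.Connection:
    """Build a toy in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE journal (name TEXT, year INTEGER)")
    conn.executemany("INSERT INTO journal VALUES (?, ?)", rows)
    return conn


suite = [
    make_db([("PVLDB", 2008), ("TODS", 1976)]),
    make_db([("PVLDB", 2008), ("TODS", 2010)]),
]
gold = "SELECT name FROM journal WHERE year > 2000"
right = "SELECT name FROM journal WHERE NOT year <= 2000"
wrong = "SELECT name FROM journal WHERE name = 'PVLDB'"

print(passes_test_suite(gold, right, suite))      # equivalent query passes
print(passes_test_suite(gold, wrong, suite))      # caught by the second database
print(passes_test_suite(gold, wrong, suite[:1]))  # a single database misses it
```

The last call shows why a single database content can overestimate accuracy: the wrong query happens to return the same rows on the first database and is only distinguished by the second.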
101
+ Suppose you have made a model prediction for every datapoint and written them to a `.txt` file (one prediction per line). You can then use the following example command to calculate the accuracy:
102
+
103
+ ```
104
+ python3 evaluate_classical.py --gold [gold file] --pred [predicted file] --out_file [output file] --num_processes [process number]
105
+
106
+ arguments:
107
+ [gold file] path to gold file. The default is classical_test.pkl, so this argument is optional.
108
+ [predicted file] the path to the predicted file. See an example at evaluation_examples/classical_test_gold.txt
109
+ [output file] the output file path. e.g. goldclassicaltest.pkl
110
+ [process number] number of processes to use. By default, it is set to cpu_count() // 3, and is hence optional.
111
+ [subset] which subset to evaluate on. can be one of {atis,advising,academic,imdb,restaurants,geography,scholar,yelp,full}
112
+ [disable_cache] whether to directly apply previously computed result and cache the current results. Use this flag to disable caching.
113
+ ```
114
+
115
+ Here is an example command that evaluates the gold prediction file:
116
+
117
+ ```
118
+ python3 evaluate_classical.py --pred=evaluation_examples/classical_test_gold.txt --out_file=all_eval_results.json
119
+ ```
120
+
121
+ You can also choose to evaluate only on a subset of the datapoints, for example
122
+
123
+ ```
124
+ python3 evaluate_classical.py --pred=evaluation_examples/academic_gold.txt --subset=academic --out_file=out/out_academic_test.json
125
+ ```
126
+
127
+ By default, the evaluation script saves the evaluation results in cache.pkl and reuses them in future runs (since these evaluations take a long time to run).
128
+ Use the ``disable_cache`` flag otherwise.
129
+
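The caching behaviour described above can be sketched as a pickled dictionary keyed by the database path, the gold query and the predicted query. This is a minimal illustration assuming that key layout; the helper `cached_judge` and the stand-in `slow_judge` are hypothetical, and the real script's multiprocessing details are left out.

```python
import os
import pickle
import tempfile
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (db_path, gold_query, predicted_query)


def load_cache(path: str) -> Dict[Key, bool]:
    """Load previously saved judgements, or start with an empty cache."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_cache(cache: Dict[Key, bool], path: str) -> None:
    """Write the cache to disk so later runs can reuse it."""
    with open(path, "wb") as f:
        pickle.dump(cache, f)


def cached_judge(key: Key, compute: Callable[[], bool],
                 cache: Dict[Key, bool], use_cache: bool = True) -> bool:
    """Return the cached judgement if allowed, otherwise compute and store it."""
    if use_cache and key in cache:
        return cache[key]
    result = compute()
    if use_cache:
        cache[key] = result
    return result


calls: List[int] = []


def slow_judge() -> bool:
    """Stand-in for an expensive test-suite evaluation."""
    calls.append(1)
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.pkl")
    key = ("database/academic/academic.sqlite", "SELECT 1", "SELECT 1")
    cache = load_cache(path)                       # empty on the first run
    first = cached_judge(key, slow_judge, cache)   # computed
    save_cache(cache, path)
    reloaded = load_cache(path)                    # simulates a later run
    second = cached_judge(key, slow_judge, reloaded)  # served from the cache
```

Because the key includes the predicted query, a changed prediction is always re-evaluated, while unchanged predictions are answered from the cache.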
130
+ The process through which data are transformed can be seen in classical_provenance.ipynb.
131
+
132
+
133
+ ## Citation
134
+
135
+ ```
136
+ @InProceedings{ruiqi20,
137
+ author = {Ruiqi Zhong and Tao Yu and Dan Klein},
138
+ title = {Semantic Evaluation for Text-to-SQL with Distilled Test Suites},
139
+ year = {2020},
140
+ booktitle = {The 2020 Conference on Empirical Methods in Natural Language Processing},
141
+ publisher = {Association for Computational Linguistics},
142
+ }
143
+ ```
144
+
duckdb-nsql/eval/metrics/test_suite_sql_eval/__init__.py ADDED
File without changes
duckdb-nsql/eval/metrics/test_suite_sql_eval/alter_michigan_databases.sh ADDED
@@ -0,0 +1,66 @@
1
+ #!/usr/bin/env bash
2
+ set -e
3
+
4
+ DATABASE_DIR=.
5
+
6
+ copy_databases () {
7
+ db=$1
8
+ # Copy to *_test directory
9
+ altered=$DATABASE_DIR/${db}_test
10
+ cp -r "$DATABASE_DIR/$db" "$altered"
11
+
12
+ # Rename .sqlite files
13
+ cd "$altered"
14
+ for f in ${db}*.sqlite
15
+ do
16
+ mv "$f" "${db}_test${f#${db}}"
17
+ done
18
+ cd -
19
+ }
20
+
21
+ alter_yelp () {
22
+ for f in `ls $DATABASE_DIR/yelp_test/*.sqlite`
23
+ do
24
+ echo "ALTER TABLE neighbourhood RENAME TO neighborhood" | sqlite3 "$f"
25
+ echo "ALTER TABLE neighborhood RENAME COLUMN neighbourhood_name TO neighborhood_name" | sqlite3 "$f"
26
+ done
27
+ }
28
+
29
+ alter_imdb () {
30
+ for f in `ls $DATABASE_DIR/imdb_test/*.sqlite`
31
+ do
32
+ echo "ALTER TABLE cast RENAME TO cast2" | sqlite3 "$f"
33
+ done
34
+ }
35
+
36
+ alter_academic () {
37
+ :
38
+ }
39
+
40
+ alter_geo () {
41
+ :
42
+ }
43
+
44
+ alter_scholar () {
45
+ :
46
+ }
47
+
48
+ # geo is an exception in that we want to change the name from "geography" to "geo_test"
49
+ # the easiest way to achieve this is by copying "geography" to "geo" first
50
+ if [ ! -d $DATABASE_DIR/geo ]
51
+ then
52
+ cp -r $DATABASE_DIR/geography $DATABASE_DIR/geo
53
+ mv $DATABASE_DIR/geo/geography.sqlite $DATABASE_DIR/geo/geo.sqlite
54
+ fi
55
+
56
+ for DB in imdb yelp academic geo scholar
57
+ do
58
+ echo $DB
59
+ if [ ! -d "$DATABASE_DIR/${DB}_test" ]
60
+ then
61
+ copy_databases $DB
62
+ alter_"$DB"
63
+ else
64
+ echo "$DATABASE_DIR/${DB}_test already exists"
65
+ fi
66
+ done
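The renaming rule in `copy_databases` (`"${db}_test${f#${db}}"` strips the leading database name from each file name and puts `<db>_test` in its place) can be restated in Python for readers less familiar with shell parameter expansion. This is only an illustration of the naming rule; the function name `renamed_for_test` and the example file names are hypothetical.

```python
def renamed_for_test(filename: str, db: str) -> str:
    """Mirror "${db}_test${f#${db}}": drop the db prefix, then prepend db + "_test"."""
    # ${f#${db}} leaves the name unchanged when it does not start with the prefix.
    rest = filename[len(db):] if filename.startswith(db) else filename
    return f"{db}_test{rest}"


print(renamed_for_test("imdb.sqlite", "imdb"))
print(renamed_for_test("yelp_1.sqlite", "yelp"))
```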
duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_provenance.ipynb ADDED
@@ -0,0 +1,301 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import json\n",
10
+ "import sqlparse\n",
11
+ "import pickle as pkl\n",
12
+ "dataset_names = ['academic', 'atis', 'advising', 'geography', 'imdb', 'restaurants', 'scholar', 'yelp']\n",
13
+ "\n",
14
+ "# these datasets are small, so we use the full set. \n",
15
+ "new_split_defined = {'restaurants', 'academic', 'imdb', 'yelp'} "
16
+ ]
17
+ },
18
+ {
19
+ "cell_type": "code",
20
+ "execution_count": 2,
21
+ "metadata": {},
22
+ "outputs": [],
23
+ "source": [
24
+ "# loading the original datasets from the paper:\n",
25
+ "# Improving Text-to-SQL Evaluation Methodology\n",
26
+ "\n",
27
+ "# a dataset is a list of dictionaries\n",
28
+ "# in the original dictionary, each datapoint might consist of several natural language sentences or SQL\n",
29
+ "orig_datasets = []\n",
30
+ "for dataset_name in dataset_names:\n",
31
+ " orig_dataset = json.load(open('text2sql-data/data/%s.json' % dataset_name))\n",
32
+ " for idx, d in enumerate(orig_dataset):\n",
33
+ " \n",
34
+ " d['orig_id'] = (dataset_name, idx)\n",
35
+ " \n",
36
+ " # fixing annotations here\n",
37
+ " \n",
38
+ " # change \"company_name\" to producer name, otherwise there is no variable to replace\n",
39
+ " if dataset_name == 'imdb' and idx == 27:\n",
40
+ " d['sql'][0] = 'SELECT MOVIEalias0.TITLE FROM COMPANY AS COMPANYalias0 , COPYRIGHT AS COPYRIGHTalias0 , MOVIE AS MOVIEalias0 WHERE COMPANYalias0.NAME = \"producer_name0\" AND COPYRIGHTalias0.CID = COMPANYalias0.ID AND MOVIEalias0.MID = COPYRIGHTalias0.MSID AND MOVIEalias0.RELEASE_YEAR > movie_release_year0 ;'\n",
41
+ " \n",
42
+ " # removing the extra space surrounding the variable actor_name0\n",
43
+ " if dataset_name == 'imdb' and idx == 78:\n",
44
+ " d['sql'][0] = 'SELECT MAX( DERIVED_TABLEalias0.DERIVED_FIELDalias0 ) FROM ( SELECT COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) AS DERIVED_FIELDalias0 FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , MOVIE AS MOVIEalias0 WHERE ACTORalias0.NAME = \"actor_name0\" AND CASTalias0.AID = ACTORalias0.AID AND MOVIEalias0.MID = CASTalias0.MSID GROUP BY MOVIEalias0.RELEASE_YEAR ) AS DERIVED_TABLEalias0 ;'\n",
45
+ " \n",
46
+ " # there was a scoping error; changed AUTHORalias1 to AUTHORalias0, PUBLICATIONalias1 to PUBLICATIONalias0\n",
47
+ " if dataset_name == 'academic' and idx == 182:\n",
48
+ " d['sql'][0] = 'SELECT DERIVED_FIELDalias0 FROM ( SELECT AUTHORalias0.NAME AS DERIVED_FIELDalias0 , COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) AS DERIVED_FIELDalias1 FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = \"conference_name0\" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ) AS DERIVED_TABLEalias0 , ( SELECT AUTHORalias1.NAME AS DERIVED_FIELDalias2 , COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) AS DERIVED_FIELDalias3 FROM AUTHOR AS AUTHORalias1 , CONFERENCE AS CONFERENCEalias1 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias1 WHERE CONFERENCEalias1.NAME = \"conference_name1\" AND PUBLICATIONalias1.CID = CONFERENCEalias1.CID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID GROUP BY AUTHORalias1.NAME ) AS DERIVED_TABLEalias1 WHERE DERIVED_TABLEalias0.DERIVED_FIELDalias1 > DERIVED_TABLEalias1.DERIVED_FIELDalias3 AND DERIVED_TABLEalias1.DERIVED_FIELDalias2 = DERIVED_TABLEalias0.DERIVED_FIELDalias0 ;'\n",
49
+ " \n",
50
+ " # wrong number of arguments to function COUNT(), change from \",\" to \"||\" for sqlite3 to recognize and execute\n",
51
+ " if dataset_name == 'advising' and idx == 107:\n",
52
+ " d['sql'][0] = 'SELECT COUNT( DISTINCT COURSEalias1.DEPARTMENT || COURSEalias0.NUMBER ) FROM COURSE AS COURSEalias0 , COURSE AS COURSEalias1 , COURSE_PREREQUISITE AS COURSE_PREREQUISITEalias0 , STUDENT_RECORD AS STUDENT_RECORDalias0 WHERE COURSEalias0.COURSE_ID = COURSE_PREREQUISITEalias0.PRE_COURSE_ID AND COURSEalias1.COURSE_ID = COURSE_PREREQUISITEalias0.COURSE_ID AND COURSEalias1.DEPARTMENT = \"department0\" AND COURSEalias1.NUMBER = number0 AND STUDENT_RECORDalias0.COURSE_ID = COURSEalias0.COURSE_ID AND STUDENT_RECORDalias0.STUDENT_ID = 1 ;'\n",
53
+ " \n",
54
+ " # there was no example given for level1, so replacing the variable with its value led to errors\n",
55
+ " if dataset_name == 'advising' and idx == 132:\n",
56
+ " d['variables'][0]['example'] = '300'\n",
57
+ " \n",
58
+ " # cannot use count and order without group by; added grouping by actor_id\n",
59
+ " if dataset_name == 'imdb' and idx == 79:\n",
60
+ " d['sql'][0] = 'SELECT ACTORalias0.NAME FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , MOVIE AS MOVIEalias0 WHERE CASTalias0.AID = ACTORalias0.AID AND MOVIEalias0.MID = CASTalias0.MSID GROUP BY ACTORalias0.AID ORDER BY COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) DESC LIMIT 1 ;'\n",
61
+ " \n",
62
+ " # cannot use count and order without group by; added grouping by actor_id\n",
63
+ " if dataset_name == 'imdb' and idx == 80:\n",
64
+ " d['sql'][0] = 'SELECT ACTORalias0.NAME FROM ACTOR AS ACTORalias0 , CAST AS CASTalias0 , DIRECTED_BY AS DIRECTED_BYalias0 , DIRECTOR AS DIRECTORalias0 , MOVIE AS MOVIEalias0 WHERE CASTalias0.AID = ACTORalias0.AID AND DIRECTORalias0.DID = DIRECTED_BYalias0.DID AND MOVIEalias0.MID = CASTalias0.MSID AND MOVIEalias0.MID = DIRECTED_BYalias0.MSID GROUP BY ACTORalias0.AID ORDER BY COUNT( DISTINCT ( MOVIEalias0.TITLE ) ) DESC LIMIT 1 ;'\n",
65
+ " \n",
66
+ " # table has \"u\" in the neighborhood spelling.\n",
67
+ " n_before, n_after = 'NEIGHBORHOOD', 'NEIGHBOURHOOD'\n",
68
+ " if dataset_name == 'yelp':\n",
69
+ " d['sql'][0] = d['sql'][0].replace(n_before, n_after)\n",
70
+ " \n",
71
+ " if dataset_name == 'yelp' and idx == 42:\n",
72
+ " d['sql'][0] = 'SELECT NEIGHBOURHOODalias0.NEIGHBOURHOOD_NAME FROM BUSINESS AS BUSINESSalias0 , NEIGHBOURHOOD AS NEIGHBOURHOODalias0 , REVIEW AS REVIEWalias0 , USER AS USERalias0 WHERE NEIGHBOURHOODalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND REVIEWalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND USERalias0.NAME = \"user_name0\" AND USERalias0.USER_ID = REVIEWalias0.USER_ID ;'\n",
73
+ "\n",
74
+ " orig_datasets.extend(orig_dataset)"
75
+ ]
76
+ },
77
+ {
78
+ "cell_type": "code",
79
+ "execution_count": 3,
80
+ "metadata": {},
81
+ "outputs": [
82
+ {
83
+ "name": "stdout",
84
+ "output_type": "stream",
85
+ "text": [
86
+ "There are 3509 datapoints in the new testset\n"
87
+ ]
88
+ }
89
+ ],
90
+ "source": [
91
+ "# we create the new testset here\n",
92
+ "new_testset = []\n",
93
+ "for d in orig_datasets:\n",
94
+ " orig_id = d['orig_id']\n",
95
+ " db_id, idx = orig_id\n",
96
+ " \n",
97
+ " # we only incorporate the test split if the dataset is large enough\n",
98
+ " # otherwise we incorporate the entire dataset\n",
99
+ " if d['query-split'] != 'test' and db_id not in new_split_defined:\n",
100
+ " continue\n",
101
+ " sql = d['sql'][0]\n",
102
+ " instance_variables = d['variables']\n",
103
+ " instance_name2examples = {d['name']: d['example'] for d in instance_variables}\n",
104
+ " \n",
105
+ " # we create a new datapoint for each natural language query\n",
106
+ " for sentence in d['sentences']:\n",
107
+ " new_datapoint = {\n",
108
+ " 'text': sentence['text'],\n",
109
+ " 'query': sql,\n",
110
+ " 'variables': instance_variables,\n",
111
+ " 'orig_id': orig_id,\n",
112
+ " 'db_id': db_id,\n",
113
+ " 'db_path': 'database/{db_id}/{db_id}.sqlite'.format(db_id=db_id)\n",
114
+ " }\n",
115
+ " new_testset.append(new_datapoint)\n",
116
+ "print('There are %d datapoints in the new testset' % len(new_testset))"
117
+ ]
118
+ },
119
+ {
120
+ "cell_type": "code",
121
+ "execution_count": 4,
122
+ "metadata": {},
123
+ "outputs": [],
124
+ "source": [
125
+ "import re\n",
126
+ "\n",
127
+ "# this block implements a function that extract variable names from text and sql\n",
128
+ "# later we use it to ensure that every variable is replaced\n",
129
+ "\n",
130
+ "variable_pattern = re.compile('^[a-z_]+[0-9]+$')\n",
131
+ "\n",
132
+ "def extract_variable_names(t):\n",
133
+ " tokens = t.replace('\"', '').replace('%', '').split(' ')\n",
134
+ " var_names = {v for v in tokens if variable_pattern.match(v) and 'alias' not in v}\n",
135
+ " return var_names\n",
136
+ "\n",
137
+ "test = False\n",
138
+ "if test:\n",
139
+ " sql = 'SELECT BUSINESSalias0.NAME FROM BUSINESS AS BUSINESSalias0 , REVIEW AS REVIEWalias0 WHERE REVIEWalias0.BUSINESS_ID = BUSINESSalias0.BUSINESS_ID AND REVIEWalias0.MONTH = \"review_month0\" GROUP BY BUSINESSalias0.NAME ORDER BY COUNT( DISTINCT ( REVIEWalias0.TEXT ) ) DESC LIMIT 1 ;'\n",
140
+ " print(extract_variable_names(sql))\n",
141
+ " text = 'return me the homepage of journal_name0 .'\n",
142
+ " print(extract_variable_names(text))"
143
+ ]
144
+ },
145
+ {
146
+ "cell_type": "code",
147
+ "execution_count": 5,
148
+ "metadata": {},
149
+ "outputs": [],
150
+ "source": [
151
+ "# this block removes extra space surrounding variable names\n",
152
+ "def remove_extra_space_around_variable(t):\n",
153
+ " var_names = extract_variable_names(t)\n",
154
+ " result = str(t)\n",
155
+ " for v in var_names:\n",
156
+ " result = result.replace('\" ' + v + ' \"', v)\n",
157
+ " return result"
158
+ ]
159
+ },
160
+ {
161
+ "cell_type": "code",
162
+ "execution_count": 6,
163
+ "metadata": {},
164
+ "outputs": [
165
+ {
166
+ "name": "stdout",
167
+ "output_type": "stream",
168
+ "text": [
169
+ "set()\n"
170
+ ]
171
+ }
172
+ ],
173
+ "source": [
174
+ "problematic = set()\n",
175
+ "\n",
176
+ "for datapoint in new_testset:\n",
177
+ " orig_id = datapoint['orig_id']\n",
178
+ " \n",
179
+ " # remove extra whitespace surrounding the text\n",
180
+ " datapoint['text'] = remove_extra_space_around_variable(datapoint['text'])\n",
181
+ " \n",
182
+ " # there should not be extra whitespace surrounding the sql variables\n",
183
+ " if datapoint['query'] != remove_extra_space_around_variable(datapoint['query']):\n",
184
+ " problematic.add(orig_id)\n",
185
+ "\n",
186
+ " text_vars = extract_variable_names(datapoint['text'])\n",
187
+ " sql_vars = extract_variable_names(datapoint['query'])\n",
188
+ " \n",
189
+ " instance_variables = {d['name']: d for d in datapoint['variables']}\n",
190
+ " \n",
191
+ " # we ensure that all the variables in the sql query and the text can be replaced\n",
192
+ " # by some variable in the variable dictionary\n",
193
+ " if len(text_vars - instance_variables.keys()) != 0 or len(sql_vars - instance_variables.keys()):\n",
194
+ " problematic.add(orig_id)\n",
195
+ " \n",
196
+ " # replace the variables with the examples in the variable dictionary\n",
197
+ " for text_var in text_vars:\n",
198
+ " datapoint['text'] = datapoint['text'].replace(text_var, instance_variables[text_var]['example'])\n",
199
+ " \n",
200
+ " for sql_var in sql_vars:\n",
201
+ " datapoint['query'] = datapoint['query'].replace(sql_var, instance_variables[sql_var]['example'])\n",
202
+ "\n",
203
+ "# we can trace back which datapoints do not satisfy the assumption,\n",
204
+ "# then go back and fix it manually\n",
205
+ "print(problematic)"
206
+ ]
207
+ },
208
+ {
209
+ "cell_type": "code",
210
+ "execution_count": 7,
211
+ "metadata": {},
212
+ "outputs": [
213
+ {
214
+ "name": "stdout",
215
+ "output_type": "stream",
216
+ "text": [
217
+ "[{'db_id': 'academic',\n",
218
+ " 'db_path': 'database/academic/academic.sqlite',\n",
219
+ " 'orig_id': ('academic', 0),\n",
220
+ " 'query': 'SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 WHERE '\n",
221
+ " 'JOURNALalias0.NAME = \"PVLDB\" ;',\n",
222
+ " 'text': 'return me the homepage of PVLDB .',\n",
223
+ " 'variables': [{'example': 'PVLDB',\n",
224
+ " 'location': 'both',\n",
225
+ " 'name': 'journal_name0',\n",
226
+ " 'type': 'journal_name'}]},\n",
227
+ " {'db_id': 'academic',\n",
228
+ " 'db_path': 'database/academic/academic.sqlite',\n",
229
+ " 'orig_id': ('academic', 1),\n",
230
+ " 'query': 'SELECT AUTHORalias0.HOMEPAGE FROM AUTHOR AS AUTHORalias0 WHERE '\n",
231
+ " 'AUTHORalias0.NAME = \"H. V. Jagadish\" ;',\n",
232
+ " 'text': 'return me the homepage of H. V. Jagadish .',\n",
233
+ " 'variables': [{'example': 'H. V. Jagadish',\n",
234
+ " 'location': 'both',\n",
235
+ " 'name': 'author_name0',\n",
236
+ " 'type': 'author_name'}]},\n",
237
+ " {'db_id': 'academic',\n",
238
+ " 'db_path': 'database/academic/academic.sqlite',\n",
239
+ " 'orig_id': ('academic', 2),\n",
240
+ " 'query': 'SELECT PUBLICATIONalias0.ABSTRACT FROM PUBLICATION AS '\n",
241
+ " 'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
242
+ " 'systems usable\" ;',\n",
243
+ " 'text': 'return me the abstract of Making database systems usable .',\n",
244
+ " 'variables': [{'example': 'Making database systems usable',\n",
245
+ " 'location': 'both',\n",
246
+ " 'name': 'publication_title0',\n",
247
+ " 'type': 'publication_title'}]},\n",
248
+ " {'db_id': 'academic',\n",
249
+ " 'db_path': 'database/academic/academic.sqlite',\n",
250
+ " 'orig_id': ('academic', 3),\n",
251
+ " 'query': 'SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS '\n",
252
+ " 'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
253
+ " 'systems usable\" ;',\n",
254
+ " 'text': 'return me the year of Making database systems usable',\n",
255
+ " 'variables': [{'example': 'Making database systems usable',\n",
256
+ " 'location': 'both',\n",
257
+ " 'name': 'publication_title0',\n",
258
+ " 'type': 'publication_title'}]},\n",
259
+ " {'db_id': 'academic',\n",
260
+ " 'db_path': 'database/academic/academic.sqlite',\n",
261
+ " 'orig_id': ('academic', 3),\n",
262
+ " 'query': 'SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS '\n",
263
+ " 'PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = \"Making database '\n",
264
+ " 'systems usable\" ;',\n",
265
+ " 'text': 'return me the year of Making database systems usable .',\n",
266
+ " 'variables': [{'example': 'Making database systems usable',\n",
267
+ " 'location': 'both',\n",
268
+ " 'name': 'publication_title0',\n",
269
+ " 'type': 'publication_title'}]}]\n"
270
+ ]
271
+ }
272
+ ],
273
+ "source": [
274
+ "from pprint import pprint\n",
275
+ "\n",
276
+ "pprint(new_testset[:5])"
277
+ ]
278
+ }
279
+ ],
280
+ "metadata": {
281
+ "kernelspec": {
282
+ "display_name": "Python 3",
283
+ "language": "python",
284
+ "name": "python3"
285
+ },
286
+ "language_info": {
287
+ "codemirror_mode": {
288
+ "name": "ipython",
289
+ "version": 3
290
+ },
291
+ "file_extension": ".py",
292
+ "mimetype": "text/x-python",
293
+ "name": "python",
294
+ "nbconvert_exporter": "python",
295
+ "pygments_lexer": "ipython3",
296
+ "version": "3.7.4"
297
+ }
298
+ },
299
+ "nbformat": 4,
300
+ "nbformat_minor": 2
301
+ }
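The variable substitution performed in the notebook above can be condensed into a standalone sketch. `variable_pattern` and `extract_variable_names` follow the notebook cells; `fill_variables` is a hypothetical helper that combines the notebook's replacement loop into one function, and the inputs are taken from the first `academic` datapoint shown in the notebook output.

```python
import re

# Placeholders look like "journal_name0": lowercase letters/underscores, then digits.
variable_pattern = re.compile(r"^[a-z_]+[0-9]+$")


def extract_variable_names(t: str) -> set:
    """Collect placeholder tokens from text or SQL, ignoring table aliases."""
    tokens = t.replace('"', "").replace("%", "").split(" ")
    return {v for v in tokens if variable_pattern.match(v) and "alias" not in v}


def fill_variables(t: str, variables: list) -> str:
    """Replace every placeholder in t with its example value."""
    name2example = {v["name"]: v["example"] for v in variables}
    for name in extract_variable_names(t):
        t = t.replace(name, name2example[name])
    return t


variables = [{"name": "journal_name0", "example": "PVLDB"}]
sql = ('SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 '
       'WHERE JOURNALalias0.NAME = "journal_name0" ;')
text = "return me the homepage of journal_name0 ."

print(fill_variables(sql, variables))
print(fill_variables(text, variables))
```

Aliases such as `JOURNALalias0` are never matched because the pattern only accepts lowercase tokens, and the explicit `'alias' not in v` check guards against lowercase alias names as well.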
duckdb-nsql/eval/metrics/test_suite_sql_eval/classical_test.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:256f797cea587044881ceffb408185fd2dbd70682c53c788ef02f3cf59dad1ab
3
+ size 3607809
duckdb-nsql/eval/metrics/test_suite_sql_eval/database/readme.txt ADDED
@@ -0,0 +1,2 @@
1
+ Please download the database from the Google Drive link mentioned in the repo-level readme and decompress it in this directory.
2
+ After this step, "test-suite-sql-eval/database/atis/atis.sqlite" should be a valid file path.
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluate_classical.py ADDED
@@ -0,0 +1,202 @@
1
+ import argparse
2
+ from typing import List, Dict, Any, Tuple
3
+ import pickle as pkl
4
+ import tqdm
5
+ from .exec_eval import exec_on_db, result_eq
6
+ import os
7
+ from collections import defaultdict
8
+ import time
9
+ from multiprocessing import cpu_count, Pool, Manager
10
+ from itertools import repeat
11
+
12
+ NUM_PROCESSES = cpu_count() // 3
13
+ if NUM_PROCESSES == 0:
14
+ NUM_PROCESSES = 1
15
+ MULTIPLICATIVE_OVERHEAD = 3
16
+ ADDITIVE_OVERHEAD = 30
17
+ GOLD_TIMEOUT = 100
18
+
19
+ cache_path = "cache.pkl"
20
+ m = Manager()
21
+ cache = m.dict()
22
+
23
+
24
+ def load_predictions(f_path: str) -> List[str]:
25
+ preds = []
26
+ with open(f_path, "r") as in_file:
27
+ for l in in_file:
28
+ preds.append(l.strip())
29
+ return preds
30
+
31
+
32
+ def acc(l, idxes=None):
33
+ if idxes is None:
34
+ idxes = [_ for _ in range(len(l))]
35
+ c = 0
36
+ for idx in idxes:
37
+ if l[idx]:
38
+ c += 1
39
+ return float(c) / len(idxes)
40
+
41
+
42
+ # the input is a tuple of gold_dict, model prediction and whether to use cache
43
+ # and the output is whether the model prediction passes the entire test suite
44
+ def judge(args: Tuple[Dict[str, Any], str, bool]) -> bool:
45
+ gold_dict, pred, use_cache = args
46
+
47
+ testsuite_paths = gold_dict["testsuite"]
48
+ gold_query = gold_dict["query"]
49
+ order_matters = "order by" in gold_query.lower()
50
+ db_path = gold_dict["db_path"]
51
+
52
+ # if already computed sometime before
53
+ # and cache allowed, directly return the result
54
+ k = (db_path, gold_query, pred)
55
+ if use_cache and k in cache:
56
+ return cache[k]
57
+
58
+ pass_all_testcase = True
59
+ for testcase_path in testsuite_paths:
60
+
61
+ start = time.time()
62
+ flg, gold_result = exec_on_db(testcase_path, gold_query, timeout=GOLD_TIMEOUT)
63
+ duration = time.time() - start
64
+ timeout = ADDITIVE_OVERHEAD + MULTIPLICATIVE_OVERHEAD * duration
65
+
66
+ if flg != "result":
67
+ print("Warning: executing gold query results in an exception")
68
+ continue
69
+ flg, pred_result = exec_on_db(testcase_path, pred, timeout=int(timeout))
70
+ if flg != "result":
71
+ pass_all_testcase = False
72
+ break
73
+ if not result_eq(gold_result, pred_result, order_matters):
74
+ pass_all_testcase = False
75
+ break
76
+
77
+ # save the results in the cache
78
+ if use_cache:
79
+ cache[k] = pass_all_testcase
80
+ return pass_all_testcase
81
+
82
+
83
+ # cache is a dictionary
84
+ # the key is a ternary tuple (empty_database_path, SQL1, SQL2)
85
+ # the value is whether SQL1 and SQL2 are equivalent, judged by the test suites
86
+ def load_cache() -> Dict[Tuple[str, str, str], bool]:
87
+ if os.path.exists(cache_path):
88
+ d = m.dict(pkl.load(open(cache_path, "rb")))
89
+ for k, v in d.items():
90
+ cache[k] = v
91
+ return cache
92
+
93
+
94
+ # dump the cache
95
+ def save_cache():
96
+ pkl.dump(dict(cache), open(cache_path, "wb"))
97
+
98
+
99
+ def main(
100
+ preds: List[str],
101
+ gold_file: str = "classical_test.pkl",
102
+ verbose: bool = True,
103
+ num_processes: int = NUM_PROCESSES,
104
+ subset: str = "full",
105
+ use_cache: bool = True,
106
+ ) -> List[bool]:
107
+ gold_dicts = pkl.load(open(gold_file, "rb"))
108
+ if subset != "full":
109
+ gold_dicts = [
110
+ d
111
+ for d in gold_dicts
112
+ if d["db_path"] == "database/{db_id}/{db_id}.sqlite".format(db_id=subset)
113
+ ]
114
+ assert len(gold_dicts) == len(
115
+ preds
116
+ ), "number of gold queries and predictions should be equal"
117
+ group_name2idxes = defaultdict(list)
118
+
119
+ for idx, gold_dict in enumerate(gold_dicts):
120
+ group_name2idxes[gold_dict["db_id"]].append(idx)
121
+
122
+ with Pool(num_processes) as pool:
123
+ result = list(
124
+ tqdm.tqdm(
125
+ pool.imap(judge, zip(gold_dicts, preds, repeat(use_cache, len(preds)))),
126
+ total=len(gold_dicts),
127
+ )
128
+ )
129
+
130
+ if verbose:
131
+ print("overall accuracy: ", acc(result))
132
+ for group, idxes in group_name2idxes.items():
133
+ print("accuracy for ", group, acc(result, idxes))
134
+ return result
135
+
136
+
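The per-group report printed by `main` amounts to grouping result indices by `db_id` and averaging within each group. A small self-contained sketch of that idea follows; the function name `accuracy_by_group` is hypothetical and not part of the module.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_group(results: List[bool], group_ids: List[str]) -> Dict[str, float]:
    """Fraction of passing predictions per group (for example, per db_id)."""
    group_to_idxes = defaultdict(list)
    for idx, group in enumerate(group_ids):
        group_to_idxes[group].append(idx)
    return {
        group: sum(results[i] for i in idxes) / len(idxes)
        for group, idxes in group_to_idxes.items()
    }


if __name__ == "__main__":
    print(accuracy_by_group([True, False, True, True], ["atis", "atis", "yelp", "yelp"]))
```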
137
+ if __name__ == "__main__":
138
+ start = time.time()
139
+ parser = argparse.ArgumentParser()
140
+ parser.add_argument(
141
+ "--gold",
142
+ dest="gold",
143
+ type=str,
144
+ default="classical_test.pkl",
145
+ help="the path to the pickled gold file",
146
+ )
147
+ parser.add_argument(
148
+ "--pred", dest="pred", type=str, help="the path to the predicted queries"
149
+ )
150
+ parser.add_argument(
151
+ "--out_file", type=str, required=True, help="the output file path"
152
+ )
153
+ parser.add_argument(
154
+ "--num_processes", type=int, default=NUM_PROCESSES, help="number of processes to use"
155
+ )
156
+ parser.add_argument(
157
+ "--subset",
158
+ default="full",
159
+ choices=(
160
+ "atis",
161
+ "advising",
162
+ "academic",
163
+ "imdb",
164
+ "restaurants",
165
+ "geography",
166
+ "scholar",
167
+ "yelp",
168
+ "full",
169
+ ),
170
+ help="which subset to evaluate on.",
171
+ )
172
+ parser.add_argument(
173
+ "--disable_cache",
174
+ default=False,
175
+ action="store_true",
176
+ help="by default, previously computed results are reused and new results are cached. "
177
+ "use this flag to disable caching.",
178
+ )
179
+ args = parser.parse_args()
180
+
181
+ preds = load_predictions(args.pred)
182
+ assert not os.path.exists(args.out_file), (
183
+ "output file path %s already exists" % args.out_file
184
+ )
185
+
186
+ use_cache = not args.disable_cache
187
+ if use_cache:
188
+ load_cache()
189
+
190
+ result = main(
191
+ preds=preds,
192
+ gold_file=args.gold,
193
+ verbose=True,
194
+ num_processes=args.num_processes,
195
+ subset=args.subset,
196
+ use_cache=use_cache,
197
+ )
198
+ pkl.dump(result, open(args.out_file, "wb"))
199
+ print("total time used: ", time.time() - start)
200
+
201
+ if use_cache:
202
+ save_cache()
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation.py ADDED
@@ -0,0 +1,1210 @@
1
+ ################################
2
+ # val: number(float)/string(str)/sql(dict)
3
+ # col_unit: (agg_id, col_id, isDistinct(bool))
4
+ # val_unit: (unit_op, col_unit1, col_unit2)
5
+ # table_unit: (table_type, col_unit/sql)
6
+ # cond_unit: (not_op, op_id, val_unit, val1, val2)
7
+ # condition: [cond_unit1, 'and'/'or', cond_unit2, ...]
8
+ # sql {
9
+ # 'select': (isDistinct(bool), [(agg_id, val_unit), (agg_id, val_unit), ...])
10
+ # 'from': {'table_units': [table_unit1, table_unit2, ...], 'conds': condition}
11
+ # 'where': condition
12
+ # 'groupBy': [col_unit1, col_unit2, ...]
13
+ # 'orderBy': ('asc'/'desc', [val_unit1, val_unit2, ...])
14
+ # 'having': condition
15
+ # 'limit': None/limit value
16
+ # 'intersect': None/sql
17
+ # 'except': None/sql
18
+ # 'union': None/sql
19
+ # }
20
+ ################################
21
+
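To make the grammar in the header comment concrete, here is a hand-built structure for one simple query. This is an illustrative sketch, not output captured from the parser: the column and table identifiers (`__all__`, `__singer.age__`, `__singer__`) are placeholder assumptions, and the operator tuples are copied from the constants defined further down in this file.

```python
WHERE_OPS = ("not", "between", "=", ">", "<", ">=", "<=", "!=", "in", "like", "is", "exists")
UNIT_OPS = ("none", "-", "+", "*", "/")
AGG_OPS = ("none", "max", "min", "count", "sum", "avg")

# Hand-built structure for: SELECT count(*) FROM singer WHERE age > 20
# Identifiers below are hypothetical placeholders.
star = (AGG_OPS.index("none"), "__all__", False)        # col_unit
age = (AGG_OPS.index("none"), "__singer.age__", False)  # col_unit

sql = {
    # (isDistinct, [(agg_id, val_unit), ...])
    "select": (False, [(AGG_OPS.index("count"), (UNIT_OPS.index("none"), star, None))]),
    "from": {"table_units": [("table_unit", "__singer__")], "conds": []},
    # condition: [cond_unit]; cond_unit = (not_op, op_id, val_unit, val1, val2)
    "where": [(False, WHERE_OPS.index(">"), (UNIT_OPS.index("none"), age, None), 20.0, None)],
    "groupBy": [],
    "orderBy": [],
    "having": [],
    "limit": None,
    "intersect": None,
    "except": None,
    "union": None,
}
```

Operators are stored as indices into the constant tuples, which is why helpers in this file compare against expressions such as `WHERE_OPS.index("like")`.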
22
+ import os
23
+ import json
24
+ import sqlite3
25
+ import argparse
26
+
27
+ from .process_sql import get_schema, Schema, get_sql
28
+ from .exec_eval import eval_exec_match
29
+
30
+ # Score buckets (categories, turns) and the partial-match component types
31
+ LEVELS = ["easy", "medium", "hard", "duckdb", "ddl", "all"]
32
+ TURNS = ["turn 1", "turn 2", "turn 3", "turn 4", "turn > 4"]
33
+ PARTIAL_TYPES = [
34
+ "select",
35
+ "select(no AGG)",
36
+ "where",
37
+ "where(no OP)",
38
+ "group(no Having)",
39
+ "group",
40
+ "order",
41
+ "and/or",
42
+ "IUEN",
43
+ "keywords",
44
+ ]
45
+ DISABLE_VALUE = True  # flag to disable value evaluation
46
+ # Flag to disable distinct in select evaluation
47
+ DISABLE_DISTINCT = True
48
+
49
+
50
+ CLAUSE_KEYWORDS = (
51
+ "select",
52
+ "from",
53
+ "where",
54
+ "group",
55
+ "order",
56
+ "limit",
57
+ "intersect",
58
+ "union",
59
+ "except",
60
+ )
61
+ JOIN_KEYWORDS = ("join", "on", "as")
62
+
63
+ WHERE_OPS = (
64
+ "not",
65
+ "between",
66
+ "=",
67
+ ">",
68
+ "<",
69
+ ">=",
70
+ "<=",
71
+ "!=",
72
+ "in",
73
+ "like",
74
+ "is",
75
+ "exists",
76
+ )
77
+ UNIT_OPS = ("none", "-", "+", "*", "/")
78
+ AGG_OPS = ("none", "max", "min", "count", "sum", "avg")
79
+ TABLE_TYPE = {
80
+ "sql": "sql",
81
+ "table_unit": "table_unit",
82
+ }
83
+
84
+ COND_OPS = ("and", "or")
85
+ SQL_OPS = ("intersect", "union", "except")
86
+ ORDER_OPS = ("desc", "asc")
87
+
88
+
89
+ HARDNESS = {
90
+ "component1": ("where", "group", "order", "limit", "join", "or", "like"),
91
+ "component2": ("except", "union", "intersect"),
92
+ }
93
+
94
+ def condition_has_or(conds):
95
+ return "or" in conds[1::2]
96
+
97
+
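Conditions are stored as a flat list that alternates `cond_unit` tuples with `'and'`/`'or'` connectors, so even indices hold the units and odd indices hold the connectors. The sketch below shows the slicing on a toy list; `split_condition` is a hypothetical helper, and the `val_unit` slots are simplified to plain strings for readability.

```python
WHERE_OPS = ("not", "between", "=", ">", "<", ">=", "<=", "!=", "in", "like", "is", "exists")


def split_condition(conds):
    """Separate a flat condition list into cond_units and their connectors."""
    return list(conds[::2]), list(conds[1::2])


# a = 1 AND b LIKE 'x%' OR c > 2, with simplified val_units
conds = [
    (False, WHERE_OPS.index("="), "a", 1.0, None),
    "and",
    (False, WHERE_OPS.index("like"), "b", '"x%"', None),
    "or",
    (False, WHERE_OPS.index(">"), "c", 2.0, None),
]
units, connectors = split_condition(conds)
has_or = "or" in connectors
has_like = WHERE_OPS.index("like") in [unit[1] for unit in units]
```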
98
+ def condition_has_like(conds):
99
+ return WHERE_OPS.index("like") in [cond_unit[1] for cond_unit in conds[::2]]
100
+
101
+
102
+ def condition_has_sql(conds):
103
+ for cond_unit in conds[::2]:
104
+ val1, val2 = cond_unit[3], cond_unit[4]
105
+ if val1 is not None and type(val1) is dict:
106
+ return True
107
+ if val2 is not None and type(val2) is dict:
108
+ return True
109
+ return False
110
+
111
+
112
+ def val_has_op(val_unit):
113
+ return val_unit[0] != UNIT_OPS.index("none")
114
+
115
+
116
+ def has_agg(unit):
117
+ return unit[0] != AGG_OPS.index("none")
118
+
119
+
120
+ def accuracy(count, total):
121
+ if count == total:
122
+ return 1
123
+ return 0
124
+
125
+
126
+ def recall(count, total):
127
+ if count == total:
128
+ return 1
129
+ return 0
130
+
131
+
132
+ def F1(acc, rec):
133
+ if (acc + rec) == 0:
134
+ return 0
135
+ return (2.0 * acc * rec) / (acc + rec)
136
+
137
+
138
+ def get_scores(count, pred_total, label_total):
139
+ if pred_total != label_total:
140
+ return 0, 0, 0
141
+ elif count == pred_total:
142
+ return 1, 1, 1
143
+ return 0, 0, 0
144
+
145
+
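Note that `get_scores` is all-or-nothing: a component scores 1 only when the prediction and the label have the same number of items and every item matches. The sketch below restates that rule and, purely for contrast, shows the conventional graded precision/recall/F1; both function names are hypothetical and neither is part of the module.

```python
def all_or_nothing_scores(count, pred_total, label_total):
    """Same rule as get_scores above: full credit or none."""
    hit = int(pred_total == label_total and count == pred_total)
    return hit, hit, hit


def graded_scores(count, pred_total, label_total):
    """Conventional precision/recall/F1, shown only for contrast."""
    precision = count / pred_total if pred_total else 0.0
    recall = count / label_total if label_total else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With two of three items matched, the first function returns zeros while the second returns two thirds for each measure.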
146
+ def eval_sel(pred, label):
147
+ pred_sel = pred["select"][1]
148
+ label_sel = label["select"][1]
149
+ label_wo_agg = [unit[1] for unit in label_sel]
150
+ pred_total = len(pred_sel)
151
+ label_total = len(label_sel)
152
+ cnt = 0
153
+ cnt_wo_agg = 0
154
+
155
+ for unit in pred_sel:
156
+ if unit in label_sel:
157
+ cnt += 1
158
+ label_sel.remove(unit)
159
+ if unit[1] in label_wo_agg:
160
+ cnt_wo_agg += 1
161
+ label_wo_agg.remove(unit[1])
162
+
163
+ return label_total, pred_total, cnt, cnt_wo_agg
164
+
165
+
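`eval_sel` and the similar functions below count matches as a multiset: each label item can be paired with at most one predicted item, which is why a matched item is removed from the label list. A minimal sketch of that counting follows. The name `count_multiset_matches` is hypothetical, and unlike the code above it works on a copy so the caller's list is not modified.

```python
from typing import Any, List


def count_multiset_matches(pred: List[Any], label: List[Any]) -> int:
    """Count predicted items that can be paired one-to-one with label items."""
    remaining = list(label)  # copy so the caller's list is left untouched
    matched = 0
    for item in pred:
        if item in remaining:
            matched += 1
            remaining.remove(item)  # each label item can be used only once
    return matched
```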
166
+ def eval_where(pred, label):
167
+ pred_conds = [unit for unit in pred["where"][::2]]
168
+ label_conds = [unit for unit in label["where"][::2]]
169
+ label_wo_agg = [unit[2] for unit in label_conds]
170
+ pred_total = len(pred_conds)
171
+ label_total = len(label_conds)
172
+ cnt = 0
173
+ cnt_wo_agg = 0
174
+
175
+ for unit in pred_conds:
176
+ if unit in label_conds:
177
+ cnt += 1
178
+ label_conds.remove(unit)
179
+ if unit[2] in label_wo_agg:
180
+ cnt_wo_agg += 1
181
+ label_wo_agg.remove(unit[2])
182
+
183
+ return label_total, pred_total, cnt, cnt_wo_agg
184
+
185
+
186
+ def eval_group(pred, label):
187
+ pred_cols = [unit[1] for unit in pred["groupBy"]]
188
+ label_cols = [unit[1] for unit in label["groupBy"]]
189
+ pred_total = len(pred_cols)
190
+ label_total = len(label_cols)
191
+ cnt = 0
192
+ pred_cols = [pred.split(".")[1] if "." in pred else pred for pred in pred_cols]
193
+ label_cols = [
194
+ label.split(".")[1] if "." in label else label for label in label_cols
195
+ ]
196
+ for col in pred_cols:
197
+ if col in label_cols:
198
+ cnt += 1
199
+ label_cols.remove(col)
200
+ return label_total, pred_total, cnt
201
+
202
+
203
+ def eval_having(pred, label):
204
+ pred_total = label_total = cnt = 0
205
+ if len(pred["groupBy"]) > 0:
206
+ pred_total = 1
207
+ if len(label["groupBy"]) > 0:
208
+ label_total = 1
209
+
210
+ pred_cols = [unit[1] for unit in pred["groupBy"]]
211
+ label_cols = [unit[1] for unit in label["groupBy"]]
212
+ if (
213
+ pred_total == label_total == 1
214
+ and pred_cols == label_cols
215
+ and pred["having"] == label["having"]
216
+ ):
217
+ cnt = 1
218
+
219
+ return label_total, pred_total, cnt
220
+
221
+
222
+ def eval_order(pred, label):
223
+ pred_total = label_total = cnt = 0
224
+ if len(pred["orderBy"]) > 0:
225
+ pred_total = 1
226
+ if len(label["orderBy"]) > 0:
227
+ label_total = 1
228
+ if (
229
+ len(label["orderBy"]) > 0
230
+ and pred["orderBy"] == label["orderBy"]
231
+ and (
232
+ (pred["limit"] is None and label["limit"] is None)
233
+ or (pred["limit"] is not None and label["limit"] is not None)
234
+ )
235
+ ):
236
+ cnt = 1
237
+ return label_total, pred_total, cnt
238
+
239
+
240
+ def eval_and_or(pred, label):
241
+ pred_ao = pred["where"][1::2]
242
+ label_ao = label["where"][1::2]
243
+ pred_ao = set(pred_ao)
244
+ label_ao = set(label_ao)
245
+
246
+ if pred_ao == label_ao:
247
+ return 1, 1, 1
248
+ return len(pred_ao), len(label_ao), 0
249
+
250
+
251
+ def get_nestedSQL(sql):
252
+ nested = []
253
+ for cond_unit in sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]:
254
+ if type(cond_unit[3]) is dict:
255
+ nested.append(cond_unit[3])
256
+ if type(cond_unit[4]) is dict:
257
+ nested.append(cond_unit[4])
258
+ if sql["intersect"] is not None:
259
+ nested.append(sql["intersect"])
260
+ if sql["except"] is not None:
261
+ nested.append(sql["except"])
262
+ if sql["union"] is not None:
263
+ nested.append(sql["union"])
264
+ return nested
265
+
266
+
267
+ def eval_nested(pred, label):
268
+ label_total = 0
269
+ pred_total = 0
270
+ cnt = 0
271
+ if pred is not None:
272
+ pred_total += 1
273
+ if label is not None:
274
+ label_total += 1
275
+ if pred is not None and label is not None:
276
+ partial_scores = Evaluator.eval_partial_match(pred, label)
277
+ cnt += Evaluator.eval_exact_match(pred, label, partial_scores)
278
+ return label_total, pred_total, cnt
279
+
280
+
281
+ def eval_IUEN(pred, label):
282
+ lt1, pt1, cnt1 = eval_nested(pred["intersect"], label["intersect"])
283
+ lt2, pt2, cnt2 = eval_nested(pred["except"], label["except"])
284
+ lt3, pt3, cnt3 = eval_nested(pred["union"], label["union"])
285
+ label_total = lt1 + lt2 + lt3
286
+ pred_total = pt1 + pt2 + pt3
287
+ cnt = cnt1 + cnt2 + cnt3
288
+ return label_total, pred_total, cnt
289
+
290
+
291
+ def get_keywords(sql):
292
+ res = set()
293
+ if len(sql["where"]) > 0:
294
+ res.add("where")
295
+ if len(sql["groupBy"]) > 0:
296
+ res.add("group")
297
+ if len(sql["having"]) > 0:
298
+ res.add("having")
299
+ if len(sql["orderBy"]) > 0:
300
+ res.add(sql["orderBy"][0])
301
+ res.add("order")
302
+ if sql["limit"] is not None:
303
+ res.add("limit")
304
+ if sql["except"] is not None:
305
+ res.add("except")
306
+ if sql["union"] is not None:
307
+ res.add("union")
308
+ if sql["intersect"] is not None:
309
+ res.add("intersect")
310
+
311
+ # or keyword
312
+ ao = sql["from"]["conds"][1::2] + sql["where"][1::2] + sql["having"][1::2]
313
+ if len([token for token in ao if token == "or"]) > 0:
314
+ res.add("or")
315
+
316
+ cond_units = sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]
317
+ # not keyword
318
+ if len([cond_unit for cond_unit in cond_units if cond_unit[0]]) > 0:
319
+ res.add("not")
320
+
321
+ # in keyword
322
+ if (
323
+ len(
324
+ [
325
+ cond_unit
326
+ for cond_unit in cond_units
327
+ if cond_unit[1] == WHERE_OPS.index("in")
328
+ ]
329
+ )
330
+ > 0
331
+ ):
332
+ res.add("in")
333
+
334
+ # like keyword
335
+ if (
336
+ len(
337
+ [
338
+ cond_unit
339
+ for cond_unit in cond_units
340
+ if cond_unit[1] == WHERE_OPS.index("like")
341
+ ]
342
+ )
343
+ > 0
344
+ ):
345
+ res.add("like")
346
+
347
+ return res
348
+
349
+
350
+ def eval_keywords(pred, label):
351
+ pred_keywords = get_keywords(pred)
352
+ label_keywords = get_keywords(label)
353
+ pred_total = len(pred_keywords)
354
+ label_total = len(label_keywords)
355
+ cnt = 0
356
+
357
+ for k in pred_keywords:
358
+ if k in label_keywords:
359
+ cnt += 1
360
+ return label_total, pred_total, cnt
361
+
362
+
363
+ def count_agg(units):
364
+ return len([unit for unit in units if has_agg(unit)])
365
+
366
+
367
+ def count_component1(sql):
368
+ count = 0
369
+ if len(sql["where"]) > 0:
370
+ count += 1
371
+ if len(sql["groupBy"]) > 0:
372
+ count += 1
373
+ if len(sql["orderBy"]) > 0:
374
+ count += 1
375
+ if sql["limit"] is not None:
376
+ count += 1
377
+ if len(sql["from"]["table_units"]) > 0: # JOIN
378
+ count += len(sql["from"]["table_units"]) - 1
379
+
380
+ ao = sql["from"]["conds"][1::2] + sql["where"][1::2] + sql["having"][1::2]
381
+ count += len([token for token in ao if token == "or"])
382
+ cond_units = sql["from"]["conds"][::2] + sql["where"][::2] + sql["having"][::2]
383
+ count += len(
384
+ [
385
+ cond_unit
386
+ for cond_unit in cond_units
387
+ if cond_unit[1] == WHERE_OPS.index("like")
388
+ ]
389
+ )
390
+
391
+ return count
392
+
393
+
394
+ def count_component2(sql):
395
+ nested = get_nestedSQL(sql)
396
+ return len(nested)
397
+
398
+
399
+ def count_others(sql):
400
+ count = 0
401
+ # number of aggregations
402
+ agg_count = count_agg(sql["select"][1])
403
+ agg_count += count_agg(sql["where"][::2])
404
+ agg_count += count_agg(sql["groupBy"])
405
+ if len(sql["orderBy"]) > 0:
406
+ agg_count += count_agg(
407
+ [unit[1] for unit in sql["orderBy"][1] if unit[1]]
408
+ + [unit[2] for unit in sql["orderBy"][1] if unit[2]]
409
+ )
410
+ agg_count += count_agg(sql["having"])
411
+ if agg_count > 1:
412
+ count += 1
413
+
414
+ # number of select columns
415
+ if len(sql["select"][1]) > 1:
416
+ count += 1
417
+
418
+ # number of where conditions
419
+ if len(sql["where"]) > 1:
420
+ count += 1
421
+
422
+ # number of group by clauses
423
+ if len(sql["groupBy"]) > 1:
424
+ count += 1
425
+
426
+ return count
427
+
428
+
429
+ class Evaluator:
430
+ """A simple evaluator"""
431
+
432
+ def __init__(
433
+ self,
434
+ db_dir,
435
+ kmaps,
436
+ etype,
437
+ plug_value,
438
+ keep_distinct,
439
+ progress_bar_for_each_datapoint
440
+ ):
441
+ self.db_dir = db_dir
442
+ self.kmaps = kmaps
443
+ self.etype = etype
444
+ self.plug_value = plug_value
445
+ self.keep_distinct = keep_distinct
446
+ self.progress_bar_for_each_datapoint = progress_bar_for_each_datapoint
447
+
448
+ self.db_paths = {}
449
+ self.schemas = {}
450
+
451
+ self.scores = {}
452
+
453
+ for turn in TURNS:
454
+ self.scores[turn] = {"count": 0, "exact": 0.0}
455
+ self.scores[turn]["exec"] = 0
456
+
457
+ for level in LEVELS:
458
+ self.scores[level] = {"count": 0, "partial": {}, "exact": 0.0}
459
+ self.scores[level]["exec"] = 0
460
+ for type_ in PARTIAL_TYPES:
461
+ self.scores[level]["partial"][type_] = {
462
+ "acc": 0.0,
463
+ "rec": 0.0,
464
+ "f1": 0.0,
465
+ "acc_count": 0,
466
+ "rec_count": 0,
467
+ }
468
+
469
+ def eval_hardness(self, sql):
470
+ count_comp1_ = count_component1(sql)
471
+ count_comp2_ = count_component2(sql)
472
+ count_others_ = count_others(sql)
473
+
474
+ if count_comp1_ <= 1 and count_others_ == 0 and count_comp2_ == 0:
475
+ return "easy"
476
+ elif (count_others_ <= 2 and count_comp1_ <= 1 and count_comp2_ == 0) or (
477
+ count_comp1_ <= 2 and count_others_ < 2 and count_comp2_ == 0
478
+ ):
479
+ return "medium"
480
+ elif (
481
+ (count_others_ > 2 and count_comp1_ <= 2 and count_comp2_ == 0)
482
+ or (2 < count_comp1_ <= 3 and count_others_ <= 2 and count_comp2_ == 0)
483
+ or (count_comp1_ <= 1 and count_others_ == 0 and count_comp2_ <= 1)
484
+ ):
485
+ return "hard"
486
+ else:
487
+ return "extra"
488
+
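The thresholds in `eval_hardness` are easier to check as a standalone function of the three counts. The sketch below copies the conditions from the method above; the function name `classify_hardness` and its argument order are assumptions for illustration.

```python
def classify_hardness(comp1: int, comp2: int, others: int) -> str:
    """Map the three component counts to a difficulty label."""
    if comp1 <= 1 and others == 0 and comp2 == 0:
        return "easy"
    if (others <= 2 and comp1 <= 1 and comp2 == 0) or (
        comp1 <= 2 and others < 2 and comp2 == 0
    ):
        return "medium"
    if (
        (others > 2 and comp1 <= 2 and comp2 == 0)
        or (2 < comp1 <= 3 and others <= 2 and comp2 == 0)
        or (comp1 <= 1 and others == 0 and comp2 <= 1)
    ):
        return "hard"
    return "extra"
```

For example, a query with one nested subquery and nothing else (`comp1=1, comp2=1, others=0`) falls through the first two branches and is labeled hard.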
489
+ @classmethod
490
+ def eval_exact_match(cls, pred, label, partial_scores):
491
+ for key, score in partial_scores.items():
492
+ if score["f1"] != 1:
493
+ return 0
494
+
495
+ if len(label["from"]["table_units"]) > 0:
496
+ label_tables = sorted(label["from"]["table_units"])
497
+ pred_tables = sorted(pred["from"]["table_units"])
498
+ return label_tables == pred_tables
499
+ return 1
500
+
501
+ @classmethod
502
+ def eval_partial_match(cls, pred, label):
503
+ res = {}
504
+
505
+ label_total, pred_total, cnt, cnt_wo_agg = eval_sel(pred, label)
506
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
507
+ res["select"] = {
508
+ "acc": acc,
509
+ "rec": rec,
510
+ "f1": f1,
511
+ "label_total": label_total,
512
+ "pred_total": pred_total,
513
+ }
514
+ acc, rec, f1 = get_scores(cnt_wo_agg, pred_total, label_total)
515
+ res["select(no AGG)"] = {
516
+ "acc": acc,
517
+ "rec": rec,
518
+ "f1": f1,
519
+ "label_total": label_total,
520
+ "pred_total": pred_total,
521
+ }
522
+
523
+ label_total, pred_total, cnt, cnt_wo_agg = eval_where(pred, label)
524
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
525
+ res["where"] = {
526
+ "acc": acc,
527
+ "rec": rec,
528
+ "f1": f1,
529
+ "label_total": label_total,
530
+ "pred_total": pred_total,
531
+ }
532
+ acc, rec, f1 = get_scores(cnt_wo_agg, pred_total, label_total)
533
+ res["where(no OP)"] = {
534
+ "acc": acc,
535
+ "rec": rec,
536
+ "f1": f1,
537
+ "label_total": label_total,
538
+ "pred_total": pred_total,
539
+ }
540
+
541
+ label_total, pred_total, cnt = eval_group(pred, label)
542
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
543
+ res["group(no Having)"] = {
544
+ "acc": acc,
545
+ "rec": rec,
546
+ "f1": f1,
547
+ "label_total": label_total,
548
+ "pred_total": pred_total,
549
+ }
550
+
551
+ label_total, pred_total, cnt = eval_having(pred, label)
552
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
553
+ res["group"] = {
554
+ "acc": acc,
555
+ "rec": rec,
556
+ "f1": f1,
557
+ "label_total": label_total,
558
+ "pred_total": pred_total,
559
+ }
560
+
561
+ label_total, pred_total, cnt = eval_order(pred, label)
562
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
563
+ res["order"] = {
564
+ "acc": acc,
565
+ "rec": rec,
566
+ "f1": f1,
567
+ "label_total": label_total,
568
+ "pred_total": pred_total,
569
+ }
570
+
571
+ label_total, pred_total, cnt = eval_and_or(pred, label)
572
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
573
+ res["and/or"] = {
574
+ "acc": acc,
575
+ "rec": rec,
576
+ "f1": f1,
577
+ "label_total": label_total,
578
+ "pred_total": pred_total,
579
+ }
580
+
581
+ label_total, pred_total, cnt = eval_IUEN(pred, label)
582
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
583
+ res["IUEN"] = {
584
+ "acc": acc,
585
+ "rec": rec,
586
+ "f1": f1,
587
+ "label_total": label_total,
588
+ "pred_total": pred_total,
589
+ }
590
+
591
+ label_total, pred_total, cnt = eval_keywords(pred, label)
592
+ acc, rec, f1 = get_scores(cnt, pred_total, label_total)
593
+ res["keywords"] = {
594
+ "acc": acc,
595
+ "rec": rec,
596
+ "f1": f1,
597
+ "label_total": label_total,
598
+ "pred_total": pred_total,
599
+ }
600
+
601
+ return res
602
+
603
+ def evaluate_one(self, db_name, gold, predicted, setup_sql,
604
+ validate_sql, turn_scores, idx, category):
605
+ if db_name not in self.db_paths:
606
+ db_path = os.path.join(self.db_dir, db_name, db_name + ".duckdb")
607
+ self.db_paths[db_name] = db_path
608
+ self.schemas[db_name] = Schema(get_schema(db_path))
609
+
610
+ if idx > 3:
611
+ idx = "> 4"
612
+ else:
613
+ idx += 1
614
+ turn_id = "turn " + str(idx)
615
+
616
+ hardness = category
617
+
618
+ self.scores[turn_id]["count"] += 1
619
+ self.scores[hardness]["count"] += 1
620
+ self.scores["all"]["count"] += 1
621
+ if self.etype in ['all', 'match']:
622
+ schema = self.schemas[db_name]
623
+ g_sql = get_sql(schema, gold)
624
+ # (the count for this category was already incremented above)
625
+
626
+ try:
627
+ p_sql = get_sql(schema, predicted)
628
+ except Exception:
629
+ # If the predicted SQL cannot be parsed, evaluate an empty SQL structure against the gold SQL
630
+ p_sql = {
631
+ "except": None,
632
+ "from": {"conds": [], "table_units": []},
633
+ "groupBy": [],
634
+ "having": [],
635
+ "intersect": None,
636
+ "limit": None,
637
+ "orderBy": [],
638
+ "select": [False, []],
639
+ "union": None,
640
+ "where": [],
641
+ }
642
+
643
+ if self.etype in ["all", "exec"]:
644
+ exec_score = eval_exec_match(
645
+ db=self.db_paths[db_name],
646
+ p_str=predicted,
647
+ g_str=gold,
648
+ setup_sql=setup_sql,
649
+ validate_sql=validate_sql,
650
+ plug_value=self.plug_value,
651
+ keep_distinct=self.keep_distinct,
652
+ progress_bar_for_each_datapoint=self.progress_bar_for_each_datapoint,
653
+ )
654
+ if exec_score:
655
+ self.scores[hardness]["exec"] += 1
656
+ self.scores[turn_id]["exec"] += 1
657
+ self.scores["all"]["exec"] += 1
658
+ turn_scores["exec"].append(1)
659
+ else:
660
+ turn_scores["exec"].append(0)
661
+
662
+ if self.etype in ["all", "match"]:
663
+ # rebuild sql for value evaluation
664
+ kmap = self.kmaps[db_name]
665
+ g_valid_col_units = build_valid_col_units(
666
+ g_sql["from"]["table_units"], schema
667
+ )
668
+ g_sql = rebuild_sql_val(g_sql)
669
+ g_sql = rebuild_sql_col(g_valid_col_units, g_sql, kmap)
670
+ p_valid_col_units = build_valid_col_units(
671
+ p_sql["from"]["table_units"], schema
672
+ )
673
+ p_sql = rebuild_sql_val(p_sql)
674
+ p_sql = rebuild_sql_col(p_valid_col_units, p_sql, kmap)
675
+ partial_scores = self.eval_partial_match(p_sql, g_sql)
676
+ exact_score = self.eval_exact_match(p_sql, g_sql, partial_scores)
677
+ if exact_score == 0:
678
+ turn_scores["exact"].append(0)
679
+ print("{} pred: {}".format(hardness, predicted))
680
+ print("{} gold: {}".format(hardness, gold))
681
+ print("")
682
+ else:
683
+ turn_scores["exact"].append(1)
684
+ self.scores[turn_id]["exact"] += exact_score
685
+ self.scores[hardness]["exact"] += exact_score
686
+ self.scores["all"]["exact"] += exact_score
687
+ for type_ in PARTIAL_TYPES:
688
+ if partial_scores[type_]["pred_total"] > 0:
689
+ self.scores[hardness]["partial"][type_]["acc"] += partial_scores[
690
+ type_
691
+ ]["acc"]
692
+ self.scores[hardness]["partial"][type_]["acc_count"] += 1
693
+ if partial_scores[type_]["label_total"] > 0:
694
+ self.scores[hardness]["partial"][type_]["rec"] += partial_scores[
695
+ type_
696
+ ]["rec"]
697
+ self.scores[hardness]["partial"][type_]["rec_count"] += 1
698
+ self.scores[hardness]["partial"][type_]["f1"] += partial_scores[type_][
699
+ "f1"
700
+ ]
701
+ if partial_scores[type_]["pred_total"] > 0:
702
+ self.scores["all"]["partial"][type_]["acc"] += partial_scores[type_][
703
+ "acc"
704
+ ]
705
+ self.scores["all"]["partial"][type_]["acc_count"] += 1
706
+ if partial_scores[type_]["label_total"] > 0:
707
+ self.scores["all"]["partial"][type_]["rec"] += partial_scores[type_][
708
+ "rec"
709
+ ]
710
+ self.scores["all"]["partial"][type_]["rec_count"] += 1
711
+ self.scores["all"]["partial"][type_]["f1"] += partial_scores[type_]["f1"]
712
+
713
+ result = {
714
+ "predictSQL": predicted,
715
+ "goldSQL": gold,
716
+ }
717
+ if self.etype in ['all', 'match']:
718
+ result.update({
719
+ "hardness": hardness,
720
+ "exact": exact_score,
721
+ "partial": partial_scores,
722
+ })
723
+ if self.etype in ['all', 'exec']:
724
+ result['exec'] = exec_score
725
+ return result
726
+
727
+ def finalize(self):
728
+ scores = self.scores
729
+ for turn in TURNS:
730
+ if scores[turn]["count"] == 0:
731
+ continue
732
+ if self.etype in ["all", "exec"]:
733
+ scores[turn]["exec"] /= scores[turn]["count"]
734
+
735
+ if self.etype in ["all", "match"]:
736
+ scores[turn]["exact"] /= scores[turn]["count"]
737
+
738
+ for level in LEVELS:
739
+ if scores[level]["count"] == 0:
740
+ continue
741
+ if self.etype in ["all", "exec"]:
742
+ scores[level]["exec"] /= scores[level]["count"]
743
+
744
+ if self.etype in ["all", "match"]:
745
+ scores[level]["exact"] /= scores[level]["count"]
746
+ for type_ in PARTIAL_TYPES:
747
+ if scores[level]["partial"][type_]["acc_count"] == 0:
748
+ scores[level]["partial"][type_]["acc"] = 0
749
+ else:
750
+ scores[level]["partial"][type_]["acc"] = (
751
+ scores[level]["partial"][type_]["acc"]
752
+ / scores[level]["partial"][type_]["acc_count"]
753
+ * 1.0
754
+ )
755
+ if scores[level]["partial"][type_]["rec_count"] == 0:
756
+ scores[level]["partial"][type_]["rec"] = 0
757
+ else:
758
+ scores[level]["partial"][type_]["rec"] = (
759
+ scores[level]["partial"][type_]["rec"]
760
+ / scores[level]["partial"][type_]["rec_count"]
761
+ * 1.0
762
+ )
763
+ if (
764
+ scores[level]["partial"][type_]["acc"] == 0
765
+ and scores[level]["partial"][type_]["rec"] == 0
766
+ ):
767
+ scores[level]["partial"][type_]["f1"] = 1
768
+ else:
769
+ scores[level]["partial"][type_]["f1"] = (
770
+ 2.0
771
+ * scores[level]["partial"][type_]["acc"]
772
+ * scores[level]["partial"][type_]["rec"]
773
+ / (
774
+ scores[level]["partial"][type_]["rec"]
775
+ + scores[level]["partial"][type_]["acc"]
776
+ )
777
+ )
778
+
779
+
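The partial-score averaging in `finalize` uses separate denominators for accuracy and recall, then recomputes F1 from the two averages. A compact sketch of that arithmetic for a single component type is shown below; `average_partial` is a hypothetical name, and the fallback of F1 to 1 when both averages are zero mirrors the code above rather than a design recommendation.

```python
def average_partial(acc_sum, acc_count, rec_sum, rec_count):
    """Average accuracy and recall over their own counts, then derive F1."""
    acc = acc_sum / acc_count if acc_count else 0
    rec = rec_sum / rec_count if rec_count else 0
    if acc == 0 and rec == 0:
        f1 = 1  # as in finalize: F1 is set to 1 when both averages are zero
    else:
        f1 = 2.0 * acc * rec / (acc + rec)
    return acc, rec, f1
```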
780
+ def isValidSQL(sql, db):
781
+ conn = sqlite3.connect(db)
782
+ cursor = conn.cursor()
783
+ try:
784
+ cursor.execute(sql)
785
+ except Exception:
786
+ return False
787
+ return True
788
+
789
+
790
+ def print_formated_s(row_name, l, element_format):
791
+ template = "{:20} " + " ".join([element_format] * len(l))
792
+ print(template.format(row_name, *l))
793
+
794
+
795
+ def print_scores(scores, etype, include_turn_acc=True):
796
+ turns = TURNS
797
+ levels = ["easy", "medium", "hard", "duckdb", "ddl", "all"]
798
+ if include_turn_acc:
799
+ levels.append("joint_all")
800
+ partial_types = PARTIAL_TYPES
801
+
802
+ print_formated_s("", levels, "{:20}")
803
+ counts = [scores[level]["count"] for level in levels]
804
+ print_formated_s("count", counts, "{:<20d}")
805
+
806
+ if etype in ["all", "exec"]:
807
+ print("===================== EXECUTION ACCURACY =====================")
808
+ exec_scores = [scores[level]["exec"] for level in levels]
809
+ print_formated_s("execution", exec_scores, "{:<20.3f}")
810
+
811
+ if etype in ["all", "match"]:
812
+ print("\n====================== EXACT MATCHING ACCURACY =====================")
813
+ exact_scores = [scores[level]["exact"] for level in levels]
814
+ print_formated_s("exact match", exact_scores, "{:<20.3f}")
815
+ print("\n---------------------PARTIAL MATCHING ACCURACY----------------------")
816
+ for type_ in partial_types:
817
+ this_scores = [scores[level]["partial"][type_]["acc"] for level in levels]
818
+ print_formated_s(type_, this_scores, "{:<20.3f}")
819
+
820
+ print("---------------------- PARTIAL MATCHING RECALL ----------------------")
821
+ for type_ in partial_types:
822
+ this_scores = [scores[level]["partial"][type_]["rec"] for level in levels]
823
+ print_formated_s(type_, this_scores, "{:<20.3f}")
824
+
825
+ print("---------------------- PARTIAL MATCHING F1 --------------------------")
826
+ for type_ in partial_types:
827
+ this_scores = [scores[level]["partial"][type_]["f1"] for level in levels]
828
+ print_formated_s(type_, this_scores, "{:<20.3f}")
829
+
830
+ if include_turn_acc:
831
+ print()
832
+ print()
833
+ print_formated_s("", turns, "{:20}")
834
+ counts = [scores[turn]["count"] for turn in turns]
835
+ print_formated_s("count", counts, "{:<20d}")
836
+
837
+ if etype in ["all", "exec"]:
838
+ print(
839
+ "===================== TURN EXECUTION ACCURACY ====================="
840
+ )
841
+ exec_scores = [scores[turn]["exec"] for turn in turns]
842
+ print_formated_s("execution", exec_scores, "{:<20.3f}")
843
+
844
+ if etype in ["all", "match"]:
845
+ print(
846
+ "\n====================== TURN EXACT MATCHING ACCURACY ====================="
847
+ )
848
+ exact_scores = [scores[turn]["exact"] for turn in turns]
849
+ print_formated_s("exact match", exact_scores, "{:<20.3f}")
850
+
851
+
852
+ def evaluate(
853
+ gold,
854
+ predict,
855
+ db_dir,
856
+ etype,
857
+ kmaps,
858
+ plug_value,
859
+ keep_distinct,
860
+ progress_bar_for_each_datapoint,
861
+ ):
862
+ with open(gold) as f:
863
+ glist = []
864
+ gseq_one = []
865
+ for l in f.readlines():
866
+ if len(l.strip()) == 0:
867
+ glist.append(gseq_one)
868
+ gseq_one = []
869
+ else:
870
+ lstrip = l.strip().split("\t")
871
+ gseq_one.append(lstrip)
872
+
873
+ # include the last session
874
+ # this was previously ignored in the SParC evaluation script
875
+ # which might lead to slight differences in scores
876
+ if len(gseq_one) != 0:
877
+ glist.append(gseq_one)
878
+
879
+ # spider formatting indicates that there is only one "single turn"
880
+ # do not report "turn accuracy" for SPIDER
881
+ include_turn_acc = len(glist) > 1
882
+
883
+ with open(predict) as f:
884
+ plist = []
885
+ pseq_one = []
886
+ for l in f.readlines():
887
+ if len(l.strip()) == 0:
888
+ plist.append(pseq_one)
889
+ pseq_one = []
890
+ else:
891
+ pseq_one.append(l.strip().split("\t"))
892
+
893
+ if len(pseq_one) != 0:
894
+ plist.append(pseq_one)
895
+
896
+ assert len(plist) == len(glist), "number of sessions must be equal"
897
+
898
+ evaluator = Evaluator(db_dir, kmaps, etype, plug_value, keep_distinct, progress_bar_for_each_datapoint)
899
+ results = []
900
+
901
+ for i, (p, g) in enumerate(zip(plist, glist)):
902
+ if (i + 1) % 10 == 0:
903
+ print("Evaluating %dth prediction" % (i + 1))
904
+ evaluator.scores["joint_all"]["count"] += 1
905
+ turn_scores = {"exec": [], "exact": []}
906
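+         for idx, (pred_turn, gold_turn) in enumerate(zip(p, g)):
907
+             # a predicted line holds the SQL string; a gold line holds the SQL string and the db name
908
+             p_str = pred_turn[0]
909
+             p_str = p_str.replace("value", "1")
910
+             g_str, db_name = gold_turn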
911
+
912
+ results.append(evaluator.evaluate_one(db_name, g_str, p_str, "", "", turn_scores, idx, ""))
913
+
914
+ if all(v == 1 for v in turn_scores["exec"]):
915
+ evaluator.scores["joint_all"]["exec"] += 1
916
+
917
+ if all(v == 1 for v in turn_scores["exact"]):
918
+ evaluator.scores["joint_all"]["exact"] += 1
919
+
920
+ evaluator.finalize()
921
+ print_scores(evaluator.scores, etype, include_turn_acc=include_turn_acc)
922
+ return {
923
+ "per_item": results,
924
+ "total_scores": evaluator.scores
925
+ }
926
+
927
+
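The session parsing inside `evaluate` can be illustrated in isolation. The sketch below is not part of the commit; `parse_sessions` and the sample lines are hypothetical names and data chosen for illustration. It mirrors the loop above: blank lines close a session, each remaining line is split on tabs, and a trailing session without a closing blank line is still kept.

```python
def parse_sessions(lines):
    """Group tab-separated lines into sessions separated by blank lines."""
    sessions, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # a blank line closes the current session
            sessions.append(current)
            current = []
        else:
            current.append(stripped.split("\t"))
    # keep the last session even if the file does not end with a blank line
    if current:
        sessions.append(current)
    return sessions


# Hypothetical gold lines: "SQL<TAB>db_name", two sessions.
gold_lines = [
    "SELECT name FROM author\tacademic\n",
    "SELECT count(*) FROM author\tacademic\n",
    "\n",
    "SELECT title FROM publication\tacademic\n",
]
sessions = parse_sessions(gold_lines)
print(len(sessions), [len(s) for s in sessions])
```

With more than one session parsed, `include_turn_acc` in `evaluate` would be true and per-turn scores would be printed.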
928
+ # Rebuild SQL functions for value evaluation
929
+ def rebuild_cond_unit_val(cond_unit):
930
+ if cond_unit is None or not DISABLE_VALUE:
931
+ return cond_unit
932
+
933
+ not_op, op_id, val_unit, val1, val2 = cond_unit
934
+ if type(val1) is not dict:
935
+ val1 = None
936
+ else:
937
+ val1 = rebuild_sql_val(val1)
938
+ if type(val2) is not dict:
939
+ val2 = None
940
+ else:
941
+ val2 = rebuild_sql_val(val2)
942
+ return not_op, op_id, val_unit, val1, val2
943
+
944
+
945
+ def rebuild_condition_val(condition):
946
+ if condition is None or not DISABLE_VALUE:
947
+ return condition
948
+
949
+ res = []
950
+ for idx, it in enumerate(condition):
951
+ if idx % 2 == 0:
952
+ res.append(rebuild_cond_unit_val(it))
953
+ else:
954
+ res.append(it)
955
+ return res
956
+
957
+
958
+ def rebuild_sql_val(sql):
959
+ if sql is None or not DISABLE_VALUE:
960
+ return sql
961
+
962
+ sql["from"]["conds"] = rebuild_condition_val(sql["from"]["conds"])
963
+ sql["having"] = rebuild_condition_val(sql["having"])
964
+ sql["where"] = rebuild_condition_val(sql["where"])
965
+ sql["intersect"] = rebuild_sql_val(sql["intersect"])
966
+ sql["except"] = rebuild_sql_val(sql["except"])
967
+ sql["union"] = rebuild_sql_val(sql["union"])
968
+
969
+ return sql
970
+
971
+
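To make the value rebuilding concrete, here is a minimal sketch of what happens to one condition list when values are disabled. It is an illustration only: `mask_condition_values`, the operator ids, and the column names are assumptions, and it does not recurse into nested queries the way `rebuild_sql_val` does. It relies on the layout used above, where even positions hold condition units and odd positions hold the connectives.

```python
def mask_condition_values(condition):
    """Replace literal values in each condition unit with None.

    Even indices hold (not_op, op_id, val_unit, val1, val2) tuples,
    odd indices hold the "and" / "or" connectives.
    """
    masked = []
    for idx, item in enumerate(condition):
        if idx % 2 == 0:
            not_op, op_id, val_unit, val1, val2 = item
            # nested queries (dicts) are kept as-is in this simplified sketch
            val1 = val1 if isinstance(val1, dict) else None
            val2 = val2 if isinstance(val2, dict) else None
            masked.append((not_op, op_id, val_unit, val1, val2))
        else:
            masked.append(item)
    return masked


# Hypothetical parsed WHERE clause; the op ids 2 and 3 are placeholders.
condition = [
    (False, 2, (0, (0, "__author.name__", False), None), '"H. V. Jagadish"', None),
    "and",
    (False, 3, (0, (0, "__publication.year__", False), None), 2000.0, None),
]
masked = mask_condition_values(condition)
print(masked)
```

After masking, two queries that differ only in their literal values compare as equal on these clauses.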
972
+ # Rebuild SQL functions for foreign key evaluation
973
+ def build_valid_col_units(table_units, schema):
974
+ col_ids = [
975
+ table_unit[1]
976
+ for table_unit in table_units
977
+ if table_unit[0] == TABLE_TYPE["table_unit"]
978
+ ]
979
+ prefixs = [col_id[:-2] for col_id in col_ids]
980
+ valid_col_units = []
981
+ for value in schema.idMap.values():
982
+ if "." in value and value[: value.index(".")] in prefixs:
983
+ valid_col_units.append(value)
984
+ return valid_col_units
985
+
986
+
987
+ def rebuild_col_unit_col(valid_col_units, col_unit, kmap):
988
+ if col_unit is None:
989
+ return col_unit
990
+
991
+ agg_id, col_id, distinct = col_unit
992
+ if col_id in kmap and col_id in valid_col_units:
993
+ col_id = kmap[col_id]
994
+ if DISABLE_DISTINCT:
995
+ distinct = None
996
+ return agg_id, col_id, distinct
997
+
998
+
999
+ def rebuild_val_unit_col(valid_col_units, val_unit, kmap):
1000
+ if val_unit is None:
1001
+ return val_unit
1002
+
1003
+ unit_op, col_unit1, col_unit2 = val_unit
1004
+ col_unit1 = rebuild_col_unit_col(valid_col_units, col_unit1, kmap)
1005
+ col_unit2 = rebuild_col_unit_col(valid_col_units, col_unit2, kmap)
1006
+ return unit_op, col_unit1, col_unit2
1007
+
1008
+
1009
+ def rebuild_table_unit_col(valid_col_units, table_unit, kmap):
1010
+ if table_unit is None:
1011
+ return table_unit
1012
+
1013
+ table_type, col_unit_or_sql = table_unit
1014
+ if isinstance(col_unit_or_sql, tuple):
1015
+ col_unit_or_sql = rebuild_col_unit_col(valid_col_units, col_unit_or_sql, kmap)
1016
+ return table_type, col_unit_or_sql
1017
+
1018
+
1019
+ def rebuild_cond_unit_col(valid_col_units, cond_unit, kmap):
1020
+ if cond_unit is None:
1021
+ return cond_unit
1022
+
1023
+ not_op, op_id, val_unit, val1, val2 = cond_unit
1024
+ val_unit = rebuild_val_unit_col(valid_col_units, val_unit, kmap)
1025
+ return not_op, op_id, val_unit, val1, val2
1026
+
1027
+
1028
+ def rebuild_condition_col(valid_col_units, condition, kmap):
1029
+ for idx in range(len(condition)):
1030
+ if idx % 2 == 0:
1031
+ condition[idx] = rebuild_cond_unit_col(
1032
+ valid_col_units, condition[idx], kmap
1033
+ )
1034
+ return condition
1035
+
1036
+
1037
+ def rebuild_select_col(valid_col_units, sel, kmap):
1038
+ if sel is None:
1039
+ return sel
1040
+ distinct, _list = sel
1041
+ new_list = []
1042
+ for it in _list:
1043
+ agg_id, val_unit = it
1044
+ new_list.append((agg_id, rebuild_val_unit_col(valid_col_units, val_unit, kmap)))
1045
+ if DISABLE_DISTINCT:
1046
+ distinct = None
1047
+ return distinct, new_list
1048
+
1049
+
1050
+ def rebuild_from_col(valid_col_units, from_, kmap):
1051
+ if from_ is None:
1052
+ return from_
1053
+
1054
+ from_["table_units"] = [
1055
+ rebuild_table_unit_col(valid_col_units, table_unit, kmap)
1056
+ for table_unit in from_["table_units"]
1057
+ ]
1058
+ from_["conds"] = rebuild_condition_col(valid_col_units, from_["conds"], kmap)
1059
+ return from_
1060
+
1061
+
1062
+ def rebuild_group_by_col(valid_col_units, group_by, kmap):
1063
+ if group_by is None:
1064
+ return group_by
1065
+
1066
+ return [
1067
+ rebuild_col_unit_col(valid_col_units, col_unit, kmap) for col_unit in group_by
1068
+ ]
1069
+
1070
+
1071
+ def rebuild_order_by_col(valid_col_units, order_by, kmap):
1072
+ if order_by is None or len(order_by) == 0:
1073
+ return order_by
1074
+
1075
+ direction, val_units = order_by
1076
+ new_val_units = [
1077
+ rebuild_val_unit_col(valid_col_units, val_unit, kmap) for val_unit in val_units
1078
+ ]
1079
+ return direction, new_val_units
1080
+
1081
+
1082
+ def rebuild_sql_col(valid_col_units, sql, kmap):
1083
+ if sql is None:
1084
+ return sql
1085
+
1086
+ sql["select"] = rebuild_select_col(valid_col_units, sql["select"], kmap)
1087
+ sql["from"] = rebuild_from_col(valid_col_units, sql["from"], kmap)
1088
+ sql["where"] = rebuild_condition_col(valid_col_units, sql["where"], kmap)
1089
+ sql["groupBy"] = rebuild_group_by_col(valid_col_units, sql["groupBy"], kmap)
1090
+ sql["orderBy"] = rebuild_order_by_col(valid_col_units, sql["orderBy"], kmap)
1091
+ sql["having"] = rebuild_condition_col(valid_col_units, sql["having"], kmap)
1092
+ sql["intersect"] = rebuild_sql_col(valid_col_units, sql["intersect"], kmap)
1093
+ sql["except"] = rebuild_sql_col(valid_col_units, sql["except"], kmap)
1094
+ sql["union"] = rebuild_sql_col(valid_col_units, sql["union"], kmap)
1095
+
1096
+ return sql
1097
+
1098
+
1099
+ def build_foreign_key_map(entry):
1100
+ cols_orig = entry["column_names_original"]
1101
+ tables_orig = entry["table_names_original"]
1102
+
1103
+ # rebuild cols corresponding to idmap in Schema
1104
+ cols = []
1105
+ for col_orig in cols_orig:
1106
+ if col_orig[0] >= 0:
1107
+ t = tables_orig[col_orig[0]]
1108
+ c = col_orig[1]
1109
+ cols.append("__" + t.lower() + "." + c.lower() + "__")
1110
+ else:
1111
+ cols.append("__all__")
1112
+
1113
+ def keyset_in_list(k1, k2, k_list):
1114
+ for k_set in k_list:
1115
+ if k1 in k_set or k2 in k_set:
1116
+ return k_set
1117
+ new_k_set = set()
1118
+ k_list.append(new_k_set)
1119
+ return new_k_set
1120
+
1121
+ foreign_key_list = []
1122
+ foreign_keys = entry["foreign_keys"]
1123
+ for fkey in foreign_keys:
1124
+ key1, key2 = fkey
1125
+ key_set = keyset_in_list(key1, key2, foreign_key_list)
1126
+ key_set.add(key1)
1127
+ key_set.add(key2)
1128
+
1129
+ foreign_key_map = {}
1130
+ for key_set in foreign_key_list:
1131
+ sorted_list = sorted(list(key_set))
1132
+ midx = sorted_list[0]
1133
+ for idx in sorted_list:
1134
+ foreign_key_map[cols[idx]] = cols[midx]
1135
+
1136
+ return foreign_key_map
1137
+
1138
+
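A toy example may help show what the foreign key map contains. The sketch below reimplements the same grouping idea in compact form; `toy_foreign_key_map` and `toy_entry` are hypothetical and only follow the `tables.json` fields read by `build_foreign_key_map`. Columns linked by a foreign key are placed in one group, and every column in the group maps to the one with the smallest column index.

```python
def toy_foreign_key_map(entry):
    """Map each foreign-key column name to a representative column name."""
    cols = []
    for table_idx, col_name in entry["column_names_original"]:
        if table_idx >= 0:
            table = entry["table_names_original"][table_idx]
            cols.append("__" + table.lower() + "." + col_name.lower() + "__")
        else:
            cols.append("__all__")

    groups = []
    for key1, key2 in entry["foreign_keys"]:
        for group in groups:
            if key1 in group or key2 in group:
                break  # reuse the first group that already contains a key
        else:
            group = set()
            groups.append(group)
        group.update((key1, key2))

    fk_map = {}
    for group in groups:
        representative = min(group)
        for idx in group:
            fk_map[cols[idx]] = cols[representative]
    return fk_map


# Hypothetical schema entry: writes.aid references author.aid.
toy_entry = {
    "table_names_original": ["author", "writes"],
    "column_names_original": [
        [-1, "*"], [0, "aid"], [0, "name"], [1, "aid"], [1, "pid"],
    ],
    "foreign_keys": [[3, 1]],
}
fk_map = toy_foreign_key_map(toy_entry)
print(fk_map)
```

Here both `__writes.aid__` and `__author.aid__` map to `__author.aid__`, so exact set match treats a reference to either column as the same column.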
1139
+ def build_foreign_key_map_from_json(table):
1140
+ with open(table) as f:
1141
+ data = json.load(f)
1142
+ tables = {}
1143
+ for entry in data:
1144
+ tables[entry["db_id"]] = build_foreign_key_map(entry)
1145
+ return tables
1146
+
1147
+
1148
+ if __name__ == "__main__":
1149
+ parser = argparse.ArgumentParser()
1150
+ parser.add_argument(
1151
+ "--gold", dest="gold", type=str, help="the path to the gold queries"
1152
+ )
1153
+ parser.add_argument(
1154
+ "--pred", dest="pred", type=str, help="the path to the predicted queries"
1155
+ )
1156
+ parser.add_argument(
1157
+ "--db",
1158
+ dest="db",
1159
+ type=str,
1160
+ help="the directory that contains all the databases and test suites",
1161
+ )
1162
+ parser.add_argument(
1163
+ "--table", dest="table", type=str, help="the tables.json schema file"
1164
+ )
1165
+ parser.add_argument(
1166
+ "--etype",
1167
+ dest="etype",
1168
+ type=str,
1169
+ default="exec",
1170
+         help="evaluation type: exec for test suite accuracy, match for the original exact set match accuracy, all for both",
1171
+ choices=("all", "exec", "match"),
1172
+ )
1173
+ parser.add_argument(
1174
+ "--plug_value",
1175
+ default=False,
1176
+ action="store_true",
1177
+         help="whether to plug the gold values into the predicted query; suitable if your model does not predict values.",
1178
+ )
1179
+ parser.add_argument(
1180
+ "--keep_distinct",
1181
+ default=False,
1182
+ action="store_true",
1183
+         help="whether to keep the DISTINCT keyword during evaluation; default is false.",
1184
+ )
1185
+ parser.add_argument(
1186
+ "--progress_bar_for_each_datapoint",
1187
+ default=False,
1188
+ action="store_true",
1189
+         help="whether to print a progress bar while running the test inputs for each datapoint",
1190
+ )
1191
+ args = parser.parse_args()
1192
+
1193
+     # only evaluating exact match needs this argument
1194
+ kmaps = None
1195
+ if args.etype in ["all", "match"]:
1196
+ assert (
1197
+ args.table is not None
1198
+ ), "table argument must be non-None if exact set match is evaluated"
1199
+ kmaps = build_foreign_key_map_from_json(args.table)
1200
+
1201
+ evaluate(
1202
+ args.gold,
1203
+ args.pred,
1204
+ args.db,
1205
+ args.etype,
1206
+ kmaps,
1207
+ args.plug_value,
1208
+ args.keep_distinct,
1209
+ args.progress_bar_for_each_datapoint,
1210
+ )
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/academic_gold.txt ADDED
@@ -0,0 +1,196 @@
1
+ SELECT JOURNALalias0.HOMEPAGE FROM JOURNAL AS JOURNALalias0 WHERE JOURNALalias0.NAME = "PVLDB" ;
2
+ SELECT AUTHORalias0.HOMEPAGE FROM AUTHOR AS AUTHORalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" ;
3
+ SELECT PUBLICATIONalias0.ABSTRACT FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
4
+ SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
5
+ SELECT PUBLICATIONalias0.YEAR FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
6
+ SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ;
7
+ SELECT CONFERENCEalias0.HOMEPAGE FROM CONFERENCE AS CONFERENCEalias0 WHERE CONFERENCEalias0.NAME = "VLDB" ;
8
+ SELECT KEYWORDalias0.KEYWORD FROM KEYWORD AS KEYWORDalias0 ;
9
+ SELECT ORGANIZATIONalias0.NAME FROM ORGANIZATION AS ORGANIZATIONalias0 ;
10
+ SELECT ORGANIZATIONalias0.NAME FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.CONTINENT = "North America" ;
11
+ SELECT ORGANIZATIONalias0.HOMEPAGE FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" ;
12
+ SELECT PUBLICATIONalias0.REFERENCE_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
13
+ SELECT PUBLICATIONalias0.REFERENCE_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
14
+ SELECT PUBLICATIONalias0.CITATION_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
15
+ SELECT PUBLICATIONalias0.CITATION_NUM FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" ;
16
+ SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CITATION_NUM > 200 ;
17
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR = 2010 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
18
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2010 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
19
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR = 2002 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
20
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2002 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
21
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2002 AND PUBLICATIONalias0.YEAR > 1995 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
22
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ( PUBLICATIONalias0.YEAR < 1995 OR PUBLICATIONalias0.YEAR > 2002 ) AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
23
+ SELECT DOMAINalias0.NAME FROM DOMAIN AS DOMAINalias0 , DOMAIN_JOURNAL AS DOMAIN_JOURNALalias0 , JOURNAL AS JOURNALalias0 WHERE DOMAINalias0.DID = DOMAIN_JOURNALalias0.DID AND JOURNALalias0.JID = DOMAIN_JOURNALalias0.JID AND JOURNALalias0.NAME = "PVLDB" ;
24
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
25
+ SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
26
+ SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
27
+ SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
28
+ SELECT DOMAINalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID ;
29
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
30
+ SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
31
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
32
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
33
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
34
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
35
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ;
36
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
37
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
38
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
39
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
40
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
41
+ SELECT DOMAINalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_CONFERENCE AS DOMAIN_CONFERENCEalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND DOMAIN_CONFERENCEalias0.CID = CONFERENCEalias0.CID AND DOMAINalias0.DID = DOMAIN_CONFERENCEalias0.DID ;
42
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
43
+ SELECT KEYWORDalias0.KEYWORD FROM DOMAIN AS DOMAINalias0 , DOMAIN_KEYWORD AS DOMAIN_KEYWORDalias0 , KEYWORD AS KEYWORDalias0 WHERE DOMAINalias0.DID = DOMAIN_KEYWORDalias0.DID AND DOMAINalias0.NAME = "Databases" AND KEYWORDalias0.KID = DOMAIN_KEYWORDalias0.KID ;
44
+ SELECT PUBLICATIONalias0.TITLE FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
45
+ SELECT KEYWORDalias0.KEYWORD FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
46
+ SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
47
+ SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
48
+ SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
49
+ SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
50
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND KEYWORDalias0.KEYWORD = "User Study" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
51
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND KEYWORDalias0.KEYWORD = "Keyword search" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
52
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND KEYWORDalias0.KEYWORD = "Information Retrieval" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
53
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
54
+ SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
55
+ SELECT ORGANIZATIONalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.CONTINENT = "North America" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
56
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
57
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
58
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
59
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
60
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
61
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
62
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
63
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
64
+ SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ;
65
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
66
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 ;
67
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
68
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
69
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
70
+ SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.YEAR > 2000 ;
71
+ SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND PUBLICATIONalias0.YEAR > 2000 ;
72
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
73
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.CITATION_NUM > 200 AND PUBLICATIONalias0.YEAR > 2000 ;
74
+ SELECT COUNT( DISTINCT ( CONFERENCEalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( JOURNALalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.TITLE = "Making database systems usable" GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) FROM CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 WHERE PUBLICATIONalias0.PID = CITEalias0.CITED AND PUBLICATIONalias0.TITLE = "Making database systems usable" AND PUBLICATIONalias1.PID = CITEalias0.CITING AND PUBLICATIONalias1.YEAR < 2010 ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM KEYWORD AS KEYWORDalias0 ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM DOMAIN AS DOMAINalias0 , DOMAIN_KEYWORD AS DOMAIN_KEYWORDalias0 , KEYWORD AS KEYWORDalias0 WHERE DOMAINalias0.DID = DOMAIN_KEYWORDalias0.DID AND DOMAINalias0.NAME = "Databases" AND KEYWORDalias0.KID = DOMAIN_KEYWORDalias0.KID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND PUBLICATIONalias0.TITLE = "Making database systems usable" ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( KEYWORDalias0.KEYWORD ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND KEYWORDalias0.KEYWORD = "User Study" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND KEYWORDalias0.KEYWORD = "Keyword search" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND KEYWORDalias0.KEYWORD = "Information Retrieval" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Natural Language" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID ;
+ SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM ORGANIZATION AS ORGANIZATIONalias0 ;
+ SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.CONTINENT = "North America" ;
+ SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+ SELECT COUNT( DISTINCT ( ORGANIZATIONalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.CONTINENT = "North America" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR < 2000 ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2000 ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+ SELECT PUBLICATIONalias0.CITATION_NUM FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR = 2005 ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR < 2005 ;
+ SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+ SELECT PUBLICATIONalias0.CITATION_NUM FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR = 2005 ;
+ SELECT SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR < 2005 ;
+ SELECT PUBLICATIONalias0.YEAR , SUM( PUBLICATIONalias0.CITATION_NUM ) FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) , PUBLICATIONalias0.YEAR FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID GROUP BY PUBLICATIONalias0.YEAR ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , AUTHOR AS AUTHORalias2 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 , WRITES AS WRITESalias2 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND AUTHORalias2.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID AND WRITESalias2.AID = AUTHORalias2.AID AND WRITESalias2.PID = PUBLICATIONalias0.PID ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , ORGANIZATION AS ORGANIZATIONalias0 , ORGANIZATION AS ORGANIZATIONalias1 WHERE ( AUTHORalias1.NAME = "H. V. Jagadish" OR AUTHORalias1.NAME = "Divesh Srivastava" ) AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND ORGANIZATIONalias0.OID = AUTHORalias1.OID ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.YEAR > 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND PUBLICATIONalias0.YEAR > 2005 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Yunyao Li" AND JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2005 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.YEAR < 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.PID = CITEalias0.CITING AND PUBLICATIONalias1.PID = CITEalias0.CITED AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.YEAR < 2000 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , AUTHOR AS AUTHORalias2 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 , WRITES AS WRITESalias2 WHERE AUTHORalias0.NAME = "Cong Yu" AND AUTHORalias1.NAME = "H. V. Jagadish" AND AUTHORalias2.NAME = "Yunyao Li" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID AND WRITESalias2.AID = AUTHORalias2.AID AND WRITESalias2.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT COUNT( DISTINCT ( AUTHORalias0.NAME ) ) FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , CITE AS CITEalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias1.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.PID = CITEalias0.CITING AND PUBLICATIONalias1.PID = CITEalias0.CITED AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID ;
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND AUTHORalias1.NAME = "Divesh Srivastava" AND PUBLICATIONalias0.CITATION_NUM > 200 AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
+ SELECT COUNT( * ) FROM ( SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
+ SELECT COUNT( * ) FROM ( SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
+ SELECT COUNT( * ) FROM ( SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
+ SELECT COUNT( * ) FROM ( SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
160
+ SELECT COUNT( * ) FROM ( SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
161
+ SELECT COUNT( * ) FROM ( SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ) AS DERIVED_TABLEalias0 ;
162
+ SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ) AS DERIVED_TABLEalias0 ;
163
+ SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ) AS DERIVED_TABLEalias0 ;
164
+ SELECT COUNT( * ) FROM ( SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ) AS DERIVED_TABLEalias0 ;
165
+ SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
166
+ SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
167
+ SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
168
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY SUM( PUBLICATIONalias0.CITATION_NUM ) DESC LIMIT 1 ;
169
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY SUM( PUBLICATIONalias0.CITATION_NUM ) DESC LIMIT 1 ;
170
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , AUTHOR AS AUTHORalias1 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 , WRITES AS WRITESalias1 WHERE AUTHORalias0.NAME = "Divesh Srivastava" AND AUTHORalias1.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
171
+ SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
172
+ SELECT CONFERENCEalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY CONFERENCEalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
173
+ SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
174
+ SELECT JOURNALalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY JOURNALalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
175
+ SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
176
+ SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
177
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
178
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
179
+ SELECT PUBLICATIONalias0.TITLE FROM AUTHOR AS AUTHORalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
180
+ SELECT PUBLICATIONalias0.TITLE FROM PUBLICATION AS PUBLICATIONalias0 WHERE PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
181
+ SELECT PUBLICATIONalias0.TITLE FROM DOMAIN AS DOMAINalias0 , DOMAIN_PUBLICATION AS DOMAIN_PUBLICATIONalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE DOMAINalias0.DID = DOMAIN_PUBLICATIONalias0.DID AND DOMAINalias0.NAME = "Databases" AND PUBLICATIONalias0.PID = DOMAIN_PUBLICATIONalias0.PID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
182
+ SELECT PUBLICATIONalias0.TITLE FROM JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
183
+ SELECT PUBLICATIONalias0.TITLE FROM CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.YEAR > 2000 ORDER BY PUBLICATIONalias0.CITATION_NUM DESC LIMIT 1 ;
184
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
185
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , JOURNAL AS JOURNALalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
186
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
187
+ SELECT CONFERENCEalias0.NAME FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY CONFERENCEalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ;
188
+ SELECT JOURNALalias0.NAME FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE KEYWORDalias0.KEYWORD = "Relational Database" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY JOURNALalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 60 ;
189
+ SELECT KEYWORDalias0.KEYWORD FROM CONFERENCE AS CONFERENCEalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ;
190
+ SELECT KEYWORDalias0.KEYWORD FROM JOURNAL AS JOURNALalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 WHERE JOURNALalias0.NAME = "PVLDB" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.JID = JOURNALalias0.JID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 100 ;
191
+ SELECT KEYWORDalias0.KEYWORD FROM AUTHOR AS AUTHORalias0 , KEYWORD AS KEYWORDalias0 , PUBLICATION AS PUBLICATIONalias0 , PUBLICATION_KEYWORD AS PUBLICATION_KEYWORDalias0 , WRITES AS WRITESalias0 WHERE AUTHORalias0.NAME = "H. V. Jagadish" AND PUBLICATION_KEYWORDalias0.KID = KEYWORDalias0.KID AND PUBLICATIONalias0.PID = PUBLICATION_KEYWORDalias0.PID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY KEYWORDalias0.KEYWORD HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
192
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) > 10 ;
193
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ORDER BY COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) DESC LIMIT 1 ;
194
+ SELECT DERIVED_FIELDalias0 FROM ( SELECT AUTHORalias0.NAME AS DERIVED_FIELDalias0 , COUNT( DISTINCT ( PUBLICATIONalias0.TITLE ) ) AS DERIVED_FIELDalias1 FROM AUTHOR AS AUTHORalias0 , CONFERENCE AS CONFERENCEalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE CONFERENCEalias0.NAME = "VLDB" AND PUBLICATIONalias0.CID = CONFERENCEalias0.CID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME ) AS DERIVED_TABLEalias0 , ( SELECT AUTHORalias1.NAME AS DERIVED_FIELDalias2 , COUNT( DISTINCT ( PUBLICATIONalias1.TITLE ) ) AS DERIVED_FIELDalias3 FROM AUTHOR AS AUTHORalias1 , CONFERENCE AS CONFERENCEalias1 , PUBLICATION AS PUBLICATIONalias1 , WRITES AS WRITESalias1 WHERE CONFERENCEalias1.NAME = "ICDE" AND PUBLICATIONalias1.CID = CONFERENCEalias1.CID AND WRITESalias1.AID = AUTHORalias1.AID AND WRITESalias1.PID = PUBLICATIONalias1.PID GROUP BY AUTHORalias1.NAME ) AS DERIVED_TABLEalias1 WHERE DERIVED_TABLEalias0.DERIVED_FIELDalias1 > DERIVED_TABLEalias1.DERIVED_FIELDalias3 AND DERIVED_TABLEalias1.DERIVED_FIELDalias2 = DERIVED_TABLEalias0.DERIVED_FIELDalias0 ;
195
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING SUM( PUBLICATIONalias0.CITATION_NUM ) > 5000 ;
196
+ SELECT AUTHORalias0.NAME FROM AUTHOR AS AUTHORalias0 , DOMAIN AS DOMAINalias0 , DOMAIN_AUTHOR AS DOMAIN_AUTHORalias0 , ORGANIZATION AS ORGANIZATIONalias0 , PUBLICATION AS PUBLICATIONalias0 , WRITES AS WRITESalias0 WHERE DOMAIN_AUTHORalias0.AID = AUTHORalias0.AID AND DOMAINalias0.DID = DOMAIN_AUTHORalias0.DID AND DOMAINalias0.NAME = "Databases" AND ORGANIZATIONalias0.NAME = "University of Michigan" AND ORGANIZATIONalias0.OID = AUTHORalias0.OID AND WRITESalias0.AID = AUTHORalias0.AID AND WRITESalias0.PID = PUBLICATIONalias0.PID GROUP BY AUTHORalias0.NAME HAVING SUM( PUBLICATIONalias0.CITATION_NUM ) > 5000 ;
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/classical_test_gold.txt ADDED
The diff for this file is too large to render. See raw diff
 
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/gold.txt ADDED
@@ -0,0 +1,453 @@
1
+ SELECT * FROM AIRLINES flight_2
2
+ SELECT * FROM AIRLINES WHERE Airline = "JetBlue Airways" flight_2
3
+ SELECT Country FROM AIRLINES WHERE Airline = "JetBlue Airways" flight_2
4
+
5
+ SELECT Abbreviation FROM AIRLINES flight_2
6
+ SELECT Abbreviation FROM AIRLINES WHERE Airline = "JetBlue Airways" flight_2
7
+
8
+ SELECT Airline , Abbreviation FROM AIRLINES flight_2
9
+ SELECT Airline , Abbreviation FROM AIRLINES WHERE Country = "USA" flight_2
10
+
11
+ SELECT * FROM AIRPORTS WHERE city = "Anthony" flight_2
12
+ SELECT AirportCode , AirportName FROM AIRPORTS WHERE city = "Anthony" flight_2
13
+
14
+ SELECT * FROM AIRLINES flight_2
15
+ SELECT count(*) FROM AIRLINES flight_2
16
+
17
+ SELECT * FROM AIRPORTS flight_2
18
+ SELECT count(*) FROM AIRPORTS flight_2
19
+
20
+ SELECT * FROM FLIGHTS flight_2
21
+ SELECT count(*) FROM FLIGHTS flight_2
22
+
23
+ SELECT Airline FROM AIRLINES flight_2
24
+ SELECT Airline FROM AIRLINES WHERE Abbreviation = "UAL" flight_2
25
+
26
+ SELECT airline FROM AIRLINES WHERE Country = "USA" flight_2
27
+ SELECT count(*) FROM AIRLINES WHERE Country = "USA" flight_2
28
+
29
+ SELECT City , Country FROM AIRPORTS flight_2
30
+ SELECT City , Country FROM AIRPORTS WHERE AirportName = "Alton" flight_2
31
+
32
+ SELECT AirportName FROM AIRPORTS flight_2
33
+ SELECT AirportName FROM AIRPORTS WHERE AirportCode = "AKO" flight_2
34
+
35
+ SELECT AirportName FROM AIRPORTS flight_2
36
+ SELECT AirportName FROM AIRPORTS WHERE City = "Aberdeen" flight_2
37
+
38
+ SELECT * FROM FLIGHTS WHERE SourceAirport = "APG" flight_2
39
+ SELECT count(*) FROM FLIGHTS WHERE SourceAirport = "APG" flight_2
40
+
41
+ SELECT * FROM FLIGHTS WHERE DestAirport = "ATO" flight_2
42
+ SELECT count(*) FROM FLIGHTS WHERE DestAirport = "ATO" flight_2
43
+
44
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
45
+ SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
46
+
47
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
48
+ SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
49
+
50
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
51
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode JOIN AIRPORTS AS T3 ON T1.SourceAirport = T3.AirportCode WHERE T2.City = "Ashley" AND T3.City = "Aberdeen" flight_2
52
+ SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode JOIN AIRPORTS AS T3 ON T1.SourceAirport = T3.AirportCode WHERE T2.City = "Ashley" AND T3.City = "Aberdeen" flight_2
53
+
54
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T1.Airline = T2.uid WHERE T2.Airline = "JetBlue Airways" flight_2
55
+ SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T1.Airline = T2.uid WHERE T2.Airline = "JetBlue Airways" flight_2
56
+
57
+ SELECT * FROM AIRLINES WHERE Airline = "United Airlines" flight_2
58
+ SELECT count(*) FROM AIRLINES WHERE Airline = "United Airlines" flight_2
59
+ SELECT count(*) FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline = T1.uid WHERE T1.Airline = "United Airlines" AND T2.DestAirport = "ASY" flight_2
60
+
61
+ SELECT * FROM AIRLINES WHERE Airline = "United Airlines" flight_2
62
+ SELECT * FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline = T1.uid WHERE T1.Airline = "United Airlines" AND T2.SourceAirport = "AHD" flight_2
63
+ SELECT count(*) FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T2.Airline = T1.uid WHERE T1.Airline = "United Airlines" AND T2.SourceAirport = "AHD" flight_2
64
+
65
+ SELECT * FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode JOIN AIRLINES AS T3 ON T3.uid = T1.Airline WHERE T2.City = "Aberdeen" AND T3.Airline = "United Airlines" flight_2
66
+ SELECT count(*) FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode JOIN AIRLINES AS T3 ON T3.uid = T1.Airline WHERE T2.City = "Aberdeen" AND T3.Airline = "United Airlines" flight_2
67
+
68
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport flight_2
69
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport GROUP BY T1.City ORDER BY count(*) DESC flight_2
70
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport GROUP BY T1.City ORDER BY count(*) DESC LIMIT 1 flight_2
71
+
72
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.SourceAirport flight_2
73
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.SourceAirport GROUP BY T1.City ORDER BY count(*) DESC flight_2
74
+ SELECT T1.City FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.SourceAirport GROUP BY T1.City ORDER BY count(*) DESC LIMIT 1 flight_2
75
+
76
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport flight_2
77
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport OR T1.AirportCode = T2.SourceAirport flight_2
78
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport OR T1.AirportCode = T2.SourceAirport GROUP BY T1.AirportCode ORDER BY count(*) DESC LIMIT 1 flight_2
79
+
80
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport flight_2
81
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport OR T1.AirportCode = T2.SourceAirport flight_2
82
+ SELECT T1.AirportCode FROM AIRPORTS AS T1 JOIN FLIGHTS AS T2 ON T1.AirportCode = T2.DestAirport OR T1.AirportCode = T2.SourceAirport GROUP BY T1.AirportCode ORDER BY count(*) LIMIT 1 flight_2
83
+
84
+ SELECT count(*) , T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline flight_2
85
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline ORDER BY count(*) DESC LIMIT 1 flight_2
86
+
87
+ SELECT Abbreviation , Country FROM AIRLINES flight_2
88
+ SELECT T1.Abbreviation , T1.Country FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline ORDER BY count(*) flight_2
89
+ SELECT T1.Abbreviation , T1.Country FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline ORDER BY count(*) LIMIT 1 flight_2
90
+
91
+ SELECT * FROM FLIGHTS WHERE SourceAirport = "AHD" flight_2
92
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "AHD" flight_2
93
+
94
+ SELECT * FROM FLIGHTS WHERE DestAirport = "AHD" flight_2
95
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.DestAirport = "AHD" flight_2
96
+
97
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "APG" flight_2
98
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "APG" INTERSECT SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "CVO" flight_2
99
+
100
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "CVO" flight_2
101
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "CVO" EXCEPT SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline WHERE T2.SourceAirport = "APG" flight_2
102
+
103
+ SELECT DISTINCT Airline FROM AIRLINES flight_2
104
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline HAVING count(*) > 10 flight_2
105
+
106
+ SELECT DISTINCT Airline FROM AIRLINES flight_2
107
+ SELECT T1.Airline FROM AIRLINES AS T1 JOIN FLIGHTS AS T2 ON T1.uid = T2.Airline GROUP BY T1.Airline HAVING count(*) < 200 flight_2
108
+
109
+ SELECT FlightNo FROM FLIGHTS flight_2
110
+ SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRLINES AS T2 ON T2.uid = T1.Airline WHERE T2.Airline = "United Airlines" flight_2
111
+
112
+ SELECT FlightNo FROM FLIGHTS flight_2
113
+ SELECT FlightNo FROM FLIGHTS WHERE SourceAirport = "APG" flight_2
114
+
115
+ SELECT FlightNo FROM FLIGHTS flight_2
116
+ SELECT FlightNo FROM FLIGHTS WHERE DestAirport = "APG" flight_2
117
+
118
+ SELECT FlightNo FROM FLIGHTS flight_2
119
+ SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport = T2.AirportCode flight_2
120
+ SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.SourceAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
121
+
122
+ SELECT FlightNo FROM FLIGHTS flight_2
123
+ SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode flight_2
124
+ SELECT T1.FlightNo FROM FLIGHTS AS T1 JOIN AIRPORTS AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.City = "Aberdeen" flight_2
125
+
126
+ SELECT * FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.city = "Aberdeen" flight_2
127
+ SELECT * FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.city = "Aberdeen" OR T2.city = "Abilene" flight_2
128
+ SELECT count(*) FROM Flights AS T1 JOIN Airports AS T2 ON T1.DestAirport = T2.AirportCode WHERE T2.city = "Aberdeen" OR T2.city = "Abilene" flight_2
129
+
130
+ SELECT SourceAirport FROM Flights flight_2
131
+ SELECT SourceAirport FROM Flights UNION SELECT DestAirport FROM Flights flight_2
132
+ SELECT AirportName FROM Airports WHERE AirportCode NOT IN (SELECT SourceAirport FROM Flights UNION SELECT DestAirport FROM Flights) flight_2
133
+
134
+ SELECT * FROM pets pets_1
135
+ SELECT * FROM pets WHERE weight > 10 pets_1
136
+ SELECT count(*) FROM pets WHERE weight > 10 pets_1
137
+
138
+ SELECT * FROM pets ORDER BY pet_age pets_1
139
+ SELECT weight FROM pets ORDER BY pet_age pets_1
140
+ SELECT weight FROM pets ORDER BY pet_age LIMIT 1 pets_1
141
+
142
+ SELECT DISTINCT petType FROM pets pets_1
143
+ SELECT max(weight) , petType FROM pets GROUP BY petType pets_1
144
+
145
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid WHERE T1.age > 20 pets_1
146
+ SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid WHERE T1.age > 20 pets_1
147
+
148
+ SELECT * FROM student WHERE sex = 'F' pets_1
149
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid WHERE T1.sex = 'F' pets_1
150
+ SELECT count(*) FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T2.petid = T3.petid WHERE T1.sex = 'F' AND T3.pettype = 'dog' pets_1
151
+
152
+ SELECT DISTINCT pettype FROM pets pets_1
153
+ SELECT count(DISTINCT pettype) FROM pets pets_1
154
+
155
+ SELECT DISTINCT T1.Fname FROM student AS T1 pets_1
156
+ SELECT DISTINCT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' pets_1
157
+ SELECT DISTINCT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' OR T3.pettype = 'dog' pets_1
158
+
159
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' pets_1
160
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' INTERSECT SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' pets_1
161
+ SELECT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' INTERSECT SELECT T1.Fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' pets_1
162
+
163
+ SELECT * FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat') pets_1
164
+ SELECT major FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat') pets_1
165
+ SELECT major , age FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat') pets_1
166
+
167
+ SELECT stuid FROM student pets_1
168
+ SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' pets_1
169
+ SELECT stuid FROM student EXCEPT SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' pets_1
170
+
171
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' pets_1
172
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' EXCEPT SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' pets_1
173
+ SELECT T1.fname , T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'dog' EXCEPT SELECT T1.fname , T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat' pets_1
174
+
175
+ SELECT * FROM pets ORDER BY pet_age LIMIT 1 pets_1
176
+ SELECT pettype FROM pets ORDER BY pet_age LIMIT 1 pets_1
177
+ SELECT pettype , weight FROM pets ORDER BY pet_age LIMIT 1 pets_1
178
+
179
+ SELECT petid FROM pets pets_1
180
+ SELECT petid FROM pets WHERE pet_age > 1 pets_1
181
+ SELECT petid , weight FROM pets WHERE pet_age > 1 pets_1
182
+
183
+ SELECT DISTINCT pettype FROM pets pets_1
184
+ SELECT max(pet_age) , pettype FROM pets GROUP BY pettype pets_1
185
+ SELECT avg(pet_age) , pettype FROM pets GROUP BY pettype pets_1
186
+
187
+ SELECT * FROM pets pets_1
188
+ SELECT avg(weight) , pettype FROM pets GROUP BY pettype pets_1
189
+
190
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid pets_1
191
+ SELECT DISTINCT T1.fname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid pets_1
192
+ SELECT DISTINCT T1.fname , T1.age FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid pets_1
193
+
194
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid pets_1
195
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid WHERE T1.Lname = 'Smith' pets_1
196
+ SELECT T2.petid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid WHERE T1.Lname = 'Smith' pets_1
197
+
198
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid pets_1
199
+ SELECT count(*) , T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid GROUP BY T1.stuid pets_1
200
+
201
+ SELECT T1.fname , T1.sex FROM student AS T1 pets_1
202
+ SELECT T1.fname , T1.sex FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid GROUP BY T1.stuid HAVING count(*) > 1 pets_1
203
+
204
+ SELECT petid FROM pets WHERE pet_age = 3 AND pettype = 'cat' pets_1
205
+ SELECT * FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pet_age = 3 AND T3.pettype = 'cat' pets_1
206
+ SELECT T1.lname FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pet_age = 3 AND T3.pettype = 'cat' pets_1
207
+
208
+ SELECT * FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid) pets_1
209
+ SELECT avg(age) FROM student WHERE stuid NOT IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid) pets_1
210
+
211
+ SELECT Name FROM country world_1
212
+ SELECT Name FROM country WHERE IndepYear > 1950 world_1
213
+
214
+ SELECT count(*) FROM country world_1
215
+ SELECT count(*) FROM country WHERE GovernmentForm = "Republic" world_1
216
+
217
+ SELECT * FROM country WHERE Region = "Caribbean" world_1
218
+ SELECT SurfaceArea FROM country WHERE Region = "Caribbean" world_1
219
+ SELECT sum(SurfaceArea) FROM country WHERE Region = "Caribbean" world_1
220
+
221
+ SELECT Continent FROM country world_1
222
+ SELECT Continent FROM country WHERE Name = "Anguilla" world_1
223
+
224
+ SELECT Region FROM country world_1
225
+ SELECT Region FROM country AS T1 JOIN city AS T2 ON T1.Code = T2.CountryCode WHERE T2.Name = "Kabul" world_1
226
+
227
+ SELECT LANGUAGE FROM countrylanguage world_1
228
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Aruba" world_1
229
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Aruba" ORDER BY Percentage DESC LIMIT 1 world_1
230
+
231
+ SELECT Population , LifeExpectancy FROM country world_1
232
+ SELECT Population , LifeExpectancy FROM country WHERE Name = "Brazil" world_1
233
+
234
+ SELECT Region FROM country WHERE Name = "Angola" world_1
235
+ SELECT Population FROM country WHERE Name = "Angola" world_1
236
+
237
+ SELECT LifeExpectancy FROM country world_1
238
+ SELECT LifeExpectancy FROM country WHERE Region = "Central Africa" world_1
239
+ SELECT avg(LifeExpectancy) FROM country WHERE Region = "Central Africa" world_1
240
+
241
+ SELECT Name FROM country WHERE Continent = "Asia" world_1
242
+ SELECT Name FROM country WHERE Continent = "Asia" ORDER BY LifeExpectancy LIMIT 1 world_1
243
+
244
+ SELECT sum(Population) FROM country WHERE Continent = "Asia" world_1
245
+ SELECT max(GNP) FROM country WHERE Continent = "Asia" world_1
246
+
247
+ SELECT * FROM country WHERE Continent = "Africa" world_1
248
+ SELECT * FROM country WHERE Continent = "Africa" AND GovernmentForm = "Republic" world_1
249
+ SELECT avg(LifeExpectancy) FROM country WHERE Continent = "Africa" AND GovernmentForm = "Republic" world_1
250
+
251
+ SELECT * FROM country WHERE Continent = "Asia" OR Continent = "Europe" world_1
252
+ SELECT SurfaceArea FROM country WHERE Continent = "Asia" OR Continent = "Europe" world_1
253
+ SELECT sum(SurfaceArea) FROM country WHERE Continent = "Asia" OR Continent = "Europe" world_1
254
+
255
+ SELECT Population FROM city WHERE District = "Gelderland" world_1
256
+ SELECT sum(Population) FROM city WHERE District = "Gelderland" world_1
257
+
258
+ SELECT * FROM country world_1
259
+ SELECT * FROM country WHERE GovernmentForm = "US Territory" world_1
260
+ SELECT avg(GNP) , sum(population) FROM country WHERE GovernmentForm = "US Territory" world_1
261
+
262
+ SELECT DISTINCT LANGUAGE FROM countrylanguage world_1
263
+ SELECT count(DISTINCT LANGUAGE) FROM countrylanguage world_1
264
+
265
+ SELECT DISTINCT GovernmentForm FROM country WHERE Continent = "Africa" world_1
266
+ SELECT count(DISTINCT GovernmentForm) FROM country WHERE Continent = "Africa" world_1
267
+
268
+ SELECT * FROM country WHERE Name = "Aruba" world_1
269
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Aruba" world_1
270
+ SELECT COUNT(T2.Language) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Aruba" world_1
271
+
272
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Afghanistan" world_1
273
+ SELECT COUNT(*) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Name = "Afghanistan" AND IsOfficial = "T" world_1
274
+
275
+ SELECT count(*) , T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Name world_1
276
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Name ORDER BY COUNT(*) DESC LIMIT 1 world_1
277
+
278
+ SELECT COUNT(*) , T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Continent world_1
279
+ SELECT T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Continent ORDER BY COUNT(*) DESC LIMIT 1 world_1
280
+
281
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" world_1
282
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Dutch" world_1
283
+ SELECT COUNT(*) FROM (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Dutch") world_1
284
+
285
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" world_1
286
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "French" world_1
287
+
288
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.IsOfficial = "T" world_1
289
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" AND T2.IsOfficial = "T" world_1
290
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" AND T2.IsOfficial = "T" INTERSECT SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "French" AND T2.IsOfficial = "T" world_1
291
+
292
+ SELECT T1.name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Chinese" world_1
293
+ SELECT DISTINCT T1.Continent FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Chinese" world_1
294
+ SELECT COUNT( DISTINCT Continent) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Chinese" world_1
295
+
296
+ SELECT DISTINCT Region FROM country world_1
297
+ SELECT DISTINCT T1.Region FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" OR T2.Language = "Dutch" world_1
298
+
299
+ SELECT T2.Language , T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE IsOfficial = "T" world_1
300
+ SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" AND IsOfficial = "T" UNION SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "Dutch" AND IsOfficial = "T" world_1
301
+
302
+ SELECT DISTINCT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Continent = "Asia" world_1
303
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.Continent = "Asia" GROUP BY T2.Language ORDER BY COUNT (*) DESC LIMIT 1 world_1
304
+
305
+ SELECT * FROM country WHERE GovernmentForm = "Republic" world_1
306
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.GovernmentForm = "Republic" GROUP BY T2.Language HAVING COUNT(*) = 1 world_1
307
+
308
+ SELECT T1.Name FROM city AS T1 JOIN countrylanguage AS T2 ON T1.CountryCode = T2.CountryCode WHERE T2.Language = "English" world_1
309
+ SELECT T1.Name , T1.Population FROM city AS T1 JOIN countrylanguage AS T2 ON T1.CountryCode = T2.CountryCode WHERE T2.Language = "English" ORDER BY T1.Population DESC LIMIT 1 world_1
310
+
311
+ SELECT Name , Population , LifeExpectancy FROM country WHERE Continent = "Asia" world_1
312
+ SELECT Name , Population , LifeExpectancy FROM country WHERE Continent = "Asia" ORDER BY SurfaceArea DESC LIMIT 1 world_1
313
+
314
+ SELECT T2.Language , T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.IsOfficial = "T" world_1
315
+ SELECT * FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" AND T2.IsOfficial = "T") world_1
316
+ SELECT avg(LifeExpectancy) FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English" AND T2.IsOfficial = "T") world_1
317
+
318
+ SELECT Name FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English") world_1
319
+ SELECT sum(Population) FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = "English") world_1
320
+
321
+ SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.HeadOfState = "Beatrix" world_1
322
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.HeadOfState = "Beatrix" world_1
323
+ SELECT T2.Language FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T1.HeadOfState = "Beatrix" AND T2.IsOfficial = "T" world_1
324
+
325
+ SELECT T1.Name FROM country AS t1 world_1
326
+ SELECT T1.Name FROM country AS t1 WHERE IndepYear < 1930 world_1
327
+ SELECT count(DISTINCT T2.Language) FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE IndepYear < 1930 AND T2.IsOfficial = "T" world_1
328
+
329
+ SELECT * FROM country WHERE Continent = "Europe" world_1
330
+ SELECT min(SurfaceArea) FROM country WHERE Continent = "Europe" world_1
331
+ SELECT Name FROM country WHERE SurfaceArea > (SELECT min(SurfaceArea) FROM country WHERE Continent = "Europe") world_1
332
+
333
+ SELECT min(population) FROM country WHERE Continent = "Asia" world_1
334
+ SELECT Name FROM country WHERE Continent = "Africa" AND population < (SELECT max(population) FROM country WHERE Continent = "Asia") world_1
335
+
336
+ SELECT min(population) FROM country WHERE Continent = "Africa" world_1
337
+ SELECT Name FROM country WHERE Continent = "Asia" AND population > (SELECT min(population) FROM country WHERE Continent = "Africa") world_1
338
+
339
+ SELECT CountryCode FROM countrylanguage world_1
340
+ SELECT CountryCode FROM countrylanguage EXCEPT SELECT CountryCode FROM countrylanguage WHERE LANGUAGE = "English" world_1
341
+
342
+ SELECT DISTINCT CountryCode FROM countrylanguage world_1
343
+ SELECT DISTINCT CountryCode FROM countrylanguage WHERE LANGUAGE ! = "English" world_1
344
+
345
+ SELECT Code FROM country WHERE GovernmentForm ! = "Republic" world_1
346
+ SELECT Code FROM country WHERE GovernmentForm ! = "Republic" EXCEPT SELECT CountryCode FROM countrylanguage WHERE LANGUAGE = "English" world_1
347
+
348
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.IsOfficial = 'T' AND T2.Language = 'English' world_1
349
+ SELECT Name FROM country WHERE Continent = 'Europe' AND Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.IsOfficial = 'T' AND T2.Language = 'English') world_1
350
+ SELECT DISTINCT T2.Name FROM country AS T1 JOIN city AS T2 ON T2.CountryCode = T1.Code WHERE T1.Continent = 'Europe' AND T1.Name NOT IN (SELECT T3.Name FROM country AS T3 JOIN countrylanguage AS T4 ON T3.Code = T4.CountryCode WHERE T4.IsOfficial = 'T' AND T4.Language = 'English') world_1
351
+
352
+ SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.Language = 'Chinese' AND T1.Continent = "Asia" world_1
353
+ SELECT * FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode WHERE T2.IsOfficial = 'T' AND T2.Language = 'Chinese' AND T1.Continent = "Asia" world_1
354
+ SELECT DISTINCT T3.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode JOIN city AS T3 ON T1.Code = T3.CountryCode WHERE T2.IsOfficial = 'T' AND T2.Language = 'Chinese' AND T1.Continent = "Asia" world_1
355
+
356
+ SELECT * FROM country ORDER BY Population LIMIT 1 world_1
357
+ SELECT Name , SurfaceArea , IndepYear FROM country ORDER BY Population LIMIT 1 world_1
358
+
359
+ SELECT * FROM country ORDER BY SurfaceArea DESC LIMIT 1 world_1
360
+ SELECT Name , population , HeadOfState FROM country ORDER BY SurfaceArea DESC LIMIT 1 world_1
361
+
362
+ SELECT Name FROM country world_1
363
+ SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Name HAVING COUNT(*) > 2 world_1
364
+ SELECT COUNT(T2.Language) , T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code = T2.CountryCode GROUP BY T1.Name HAVING COUNT(*) > 2 world_1
365
+
366
+ SELECT avg(Population) FROM city world_1
367
+ SELECT count(*) , District FROM city WHERE Population > (SELECT avg(Population) FROM city) GROUP BY District world_1
368
+
369
+ SELECT * FROM country GROUP BY GovernmentForm HAVING avg(LifeExpectancy) > 72 world_1
370
+ SELECT sum(Population) , GovernmentForm FROM country GROUP BY GovernmentForm HAVING avg(LifeExpectancy) > 72 world_1
371
+
372
+ SELECT Continent FROM country GROUP BY Continent HAVING avg(LifeExpectancy) < 72 world_1
373
+ SELECT sum(Population) , avg(LifeExpectancy) , Continent FROM country GROUP BY Continent HAVING avg(LifeExpectancy) < 72 world_1
374
+
375
+ SELECT * FROM country ORDER BY SurfaceArea DESC LIMIT 5 world_1
376
+ SELECT Name , SurfaceArea FROM country ORDER BY SurfaceArea DESC LIMIT 5 world_1
377
+
378
+ SELECT * FROM country ORDER BY Population DESC world_1
379
+ SELECT Name FROM country ORDER BY Population DESC LIMIT 3 world_1
380
+
381
+ SELECT * FROM country ORDER BY Population world_1
382
+ SELECT Name FROM country ORDER BY Population DESC LIMIT 3 world_1
383
+
384
+ SELECT * FROM country WHERE continent = "Asia" world_1
385
+ SELECT count(*) FROM country WHERE continent = "Asia" world_1
386
+
387
+ SELECT * FROM country WHERE continent = "Europe" world_1
388
+ SELECT Name FROM country WHERE continent = "Europe" AND Population = "80000" world_1
389
+
390
+ SELECT * FROM country WHERE Continent = "North America" world_1
391
+ SELECT * FROM country WHERE Continent = "North America" AND SurfaceArea > 3000 world_1
392
+ SELECT sum(Population) , avg(SurfaceArea) FROM country WHERE Continent = "North America" AND SurfaceArea > 3000 world_1
393
+
394
+ SELECT name FROM city world_1
395
+ SELECT name FROM city WHERE Population BETWEEN 160000 AND 90000 world_1
396
+
397
+ SELECT LANGUAGE FROM countrylanguage world_1
398
+ SELECT LANGUAGE FROM countrylanguage GROUP BY LANGUAGE ORDER BY count(*) DESC LIMIT 1 world_1
399
+
400
+ SELECT Directed_by FROM Cartoon WHERE Title = "Day of the Dark Knight!" tvshow
401
+ SELECT Channel FROM Cartoon WHERE Title = "Day of the Dark Knight!" tvshow
402
+ SELECT Title FROM Cartoon WHERE Directed_by = "Ben Jones" OR Directed_by = "Brandon Vietti" tvshow
403
+
404
+ SELECT * FROM TV_Channel WHERE Country = "Italy" tvshow
405
+ SELECT * FROM TV_Channel WHERE Country = "Poland" tvshow
406
+ SELECT Country , count(*) FROM TV_Channel GROUP BY Country ORDER BY count(*) DESC LIMIT 1 tvshow
407
+
408
+ SELECT Channel FROM Cartoon WHERE Title = "The Eyes of Despero!" tvshow
409
+ SELECT series_name FROM TV_Channel WHERE id IN (SELECT Channel FROM Cartoon WHERE Title = "The Eyes of Despero!") tvshow
410
+ SELECT count(DISTINCT series_name) , count(DISTINCT content) FROM TV_Channel tvshow
411
+
412
+ SELECT Package_Option FROM TV_Channel WHERE series_name = "Rock TV" tvshow
413
+ SELECT Language FROM TV_Channel WHERE series_name = "Rock TV" tvshow
414
+ SELECT LANGUAGE , count(*) FROM TV_Channel GROUP BY LANGUAGE ORDER BY count(*) ASC LIMIT 1 tvshow
415
+
416
+ SELECT Written_by FROM Cartoon WHERE Title = "The Rise of the Blue Beetle!" tvshow
417
+ SELECT Directed_by FROM Cartoon WHERE Title = "The Rise of the Blue Beetle!" tvshow
418
+ SELECT T1.series_name FROM TV_Channel AS T1 JOIN Cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Title = "The Rise of the Blue Beetle!" tvshow
419
+
420
+ SELECT Country FROM TV_Channel WHERE series_name = "Sky Radio" tvshow
421
+ SELECT Content FROM TV_Channel WHERE series_name = "Sky Radio" tvshow
422
+ SELECT T2.Title FROM TV_Channel AS T1 JOIN Cartoon AS T2 ON T1.id = T2.Channel WHERE T1.series_name = "Sky Radio" tvshow
423
+
424
+ SELECT Rating FROM TV_series WHERE Episode = "Double Down" tvshow
425
+ SELECT Rating FROM TV_series WHERE Episode = "Keepers" tvshow
426
+ SELECT Episode , Rating FROM TV_series ORDER BY Rating DESC LIMIT 3 tvshow
427
+
428
+ SELECT Weekly_Rank FROM TV_series WHERE Episode = "Emily" tvshow
429
+ SELECT Share FROM TV_series WHERE Episode = "Emily" tvshow
430
+ SELECT max(SHARE) , min(SHARE) FROM TV_series tvshow
431
+
432
+ SELECT Rating FROM TV_series WHERE Episode = "A Love of a Lifetime" tvshow
433
+ SELECT Weekly_Rank FROM TV_series WHERE Episode = "A Love of a Lifetime" tvshow
434
+ SELECT T1.series_name FROM TV_Channel AS T1 JOIN TV_series AS T2 ON T1.id = T2.Channel WHERE T2.Episode = "A Love of a Lifetime" tvshow
435
+
436
+ SELECT Content FROM TV_Channel WHERE series_name = "Sky Radio" tvshow
437
+ SELECT Language FROM TV_Channel WHERE series_name = "Sky Radio" tvshow
438
+ SELECT T2.Episode FROM TV_Channel AS T1 JOIN TV_series AS T2 ON T1.id = T2.Channel WHERE T1.series_name = "Sky Radio" tvshow
439
+
440
+ SELECT Original_air_date FROM Cartoon WHERE Title = "Fall of the Blue Beetle!" tvshow
441
+ SELECT Production_code FROM Cartoon WHERE Title = "Fall of the Blue Beetle!" tvshow
442
+ SELECT production_code , channel FROM cartoon ORDER BY original_air_date LIMIT 1 tvshow
443
+
444
+ SELECT Title FROM Cartoon WHERE Directed_by = "Ben Jones" tvshow
445
+ SELECT Title FROM Cartoon WHERE Written_by = "Todd Casey" tvshow
446
+ SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Written_by = 'Todd Casey' tvshow
447
+
448
+ SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.Written_by = 'Steven Melching' tvshow
449
+ SELECT country FROM TV_Channel EXCEPT SELECT T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.written_by = 'Todd Casey' tvshow
450
+
451
+ SELECT Directed_by FROM Cartoon WHERE Title = "Deep Cover for Batman!" tvshow
452
+ SELECT Production_code FROM Cartoon WHERE Title = "Deep Cover for Batman!" tvshow
453
+ SELECT T1.series_name , T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.directed_by = 'Michael Chang' INTERSECT SELECT T1.series_name , T1.country FROM TV_Channel AS T1 JOIN cartoon AS T2 ON T1.id = T2.Channel WHERE T2.directed_by = 'Ben Jones' tvshow
duckdb-nsql/eval/metrics/test_suite_sql_eval/evaluation_examples/predict.txt ADDED
@@ -0,0 +1,453 @@
1
+ select * from airlines
2
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
3
+ select Country from airports where Country = 'terminal'
4
+
5
+ select * from airlines
6
+ select AirportName from airports where AirportName = 'terminal'
7
+
8
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2
9
+ select * from airlines as T1 join airports as T2 where T1.Airline = 'terminal' and T2.AirportName = 'terminal'
10
+
11
+ select * from airports where AirportName = 'terminal'
12
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2
13
+
14
+ select * from airlines
15
+ select count(*) from airports where Country = 'terminal'
16
+
17
+ select * from airlines
18
+ select count(*) from airports where Country = 'terminal'
19
+
20
+ select * from airlines
21
+ select count(*) from airports where Country = 'terminal'
22
+
23
+ select Airline from airlines
24
+ select AirportName from airports where AirportName = 'terminal'
25
+
26
+ select T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
27
+ select count(*) from airports where Country = 'terminal'
28
+
29
+ select Country,City from airports
30
+ select AirportCode,AirportName,City from airports where Country = 'terminal'
31
+
32
+ select AirportName from airports
33
+ select AirportName,AirportCode from airports where AirportName = 'terminal'
34
+
35
+ select AirportName from airports
36
+ select AirportName from airports where AirportName = 'terminal'
37
+
38
+ select * from flights where DestAirport = 'terminal'
39
+ select count(*) from airports where Country = 'terminal'
40
+
41
+ select City from airports where Country = 'terminal'
42
+ select count(*) from airports where Country = 'terminal'
43
+
44
+ select * from flights where SourceAirport = 'terminal'
45
+ select count(*) from airports where Country = 'terminal'
46
+
47
+ select * from airlines
48
+ select count(*) from airports where Country = 'terminal'
49
+
50
+ select * from flights where DestAirport = 'terminal'
51
+ select City,AirportName from airports where Country = 'terminal'
52
+ select count(*) from airports where Country = 'terminal'
53
+
54
+ select * from airlines
55
+ select count(*) from airports where Country = 'terminal'
56
+
57
+ select * from airlines
58
+ select count(*) from airports where Country = 'terminal'
59
+ select count(*) from airports where Country = 'terminal'
60
+
61
+ select * from airlines
62
+ select AirportName from airports where Country = 'terminal'
63
+ select count(*) from airports where Country = 'terminal'
64
+
65
+ select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
66
+ select count(*) from airports where Country = 'terminal'
67
+
68
+ select City from airports
69
+ select count(*) from airlines group by uid
70
+ select Country from airports group by Country order by count(*) desc limit 1
71
+
72
+ select City from airports
73
+ select count(*) from airlines group by uid
74
+ select Country from airports group by Country order by count(*) desc limit 1
75
+
76
+ select AirportCode from airports where AirportName = 'terminal'
77
+ select AirportCode from airports
78
+ select FlightNo,count(*) from flights group by DestAirport order by count(*) desc limit 1
79
+
80
+ select AirportCode from airports where AirportName = 'terminal'
81
+ select AirportCode from airports
82
+ select * from airlines group by uid order by count(*) asc limit 1
83
+
84
+ select count(*) from airlines group by uid
85
+ select Airline from airlines group by uid order by count(*) desc limit 1
86
+
87
+ select T2.Country,T1.Country from airlines as T1 join airports as T2
88
+ select T1.CountryAbbrev,count(*) from airports as T1 join flights as T2 on T1.AirportCode = T2.DestAirport order by T2.FlightNo asc
89
+ select Country from airports group by Country order by count(*) asc limit 1
90
+
91
+ select City from airports where Country = 'terminal'
92
+ select Airline from airlines where Airline = 'terminal'
93
+
94
+ select AirportName from airports where AirportName = 'terminal'
95
+ select Airline from airlines where Airline = 'terminal'
96
+
97
+ select * from airports where AirportName = 'terminal'
98
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
99
+
100
+ select * from airports where AirportName = 'terminal'
101
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
102
+
103
+ select * from airlines
104
+ select Country from airports group by Country having count(*) > 'terminal'
105
+
106
+ select * from airlines
107
+ select Country from airlines where Airline = 'terminal'
108
+
109
+ select FlightNo from flights
110
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
111
+
112
+ select FlightNo from flights
113
+ select T1.Airline,T2.AirportName from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
114
+
115
+ select FlightNo from flights
116
+ select T3.FlightNo,T1.Airline from airlines as T1 join airports as T2 join flights as T3 where T2.AirportName = 'terminal'
117
+
118
+ select FlightNo from flights
119
+ select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
120
+ select AirportName from airports where AirportName = 'terminal'
121
+
122
+ select FlightNo from flights
123
+ select T2.AirportName,T1.Airline from airlines as T1 join airports as T2 where T2.AirportName = 'terminal'
124
+ select City from airports where Country = 'terminal'
125
+
126
+ select City from airports where Country = 'terminal'
127
+ select AirportName from airports where AirportName = 'terminal'
128
+ select count(*) from airports where Country = 'terminal'
129
+
130
+ select AirportName from airports
131
+ select AirportName from airports
132
+ select * from airports where AirportName like 'terminal'
133
+
134
+ select * from Pets where weight = 'terminal'
135
+ select * from Pets group by PetID having count(*) > 'terminal'
136
+ select count(*) from Student where LName = 'terminal'
137
+
138
+ select * from Student where Age = 'terminal'
139
+ select T3.weight,T1.Age from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID group by T1.Sex
140
+ select * from Student order by Age asc limit 1
141
+
142
+ select PetType from Pets
143
+ select PetType,count(*) from Pets group by PetType
144
+
145
+ select * from Student where Age > 'terminal'
146
+ select count(*) from Student where LName = 'terminal'
147
+
148
+ select * from Student where Age = 'terminal'
149
+ select T1.Fname,* from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.pet_age = 'terminal'
150
+ select count(*) from Student where Age > 'terminal'
151
+
152
+ select PetType from Pets
153
+ select count(*) from Student
154
+
155
+ select Fname from Student group by StuID
156
+ select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.weight = 'terminal'
157
+ select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
158
+
159
+ select * from Pets where PetType = 'terminal'
160
+ select T1.LName,T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
161
+ select Fname from Student where Sex = 'terminal'
162
+
163
+ select * from Student where Fname = 'terminal'
164
+ select * from Student where Fname = 'terminal'
165
+ select T1.Age,count(*) from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.pet_age = 'terminal'
166
+
167
+ select T1.StuID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID group by T2.StuID
168
+ select T3.PetID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T1.Fname = 'terminal'
169
+ select StuID from Student
170
+
171
+ select * from Student
172
+ select T1.Fname,* from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetType = 'terminal'
173
+ select T1.Fname,T1.Age from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.PetID = 'terminal'
174
+
175
+ select * from Student order by Age desc limit 1
176
+ select pet_age,PetType from Pets
177
+ select T2.weight,count(*) from Has_Pet as T1 join Pets as T2 on T1.PetID = T2.PetID group by T1.PetID
178
+
179
+ select T3.PetID from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T1.Fname = 'terminal'
180
+ select weight from Pets where pet_age > 'terminal'
181
+ select T2.weight,count(*) from Has_Pet as T1 join Pets as T2 on T1.PetID = T2.PetID group by T1.PetID
182
+
183
+ select PetType from Pets
184
+ select Age,count(*) from Student group by Sex
185
+ select avg(Age) from Student
186
+
187
+ select * from Pets where weight = 'terminal'
188
+ select avg(weight),PetType from Pets group by PetType
189
+
190
+ select * from Student
191
+ select Fname from Student
192
+ select Fname,Age from Student
193
+
194
+ select * from Student
195
+ select Fname from Student where LName = 'terminal' and Fname = 'terminal'
196
+ select Sex from Student where LName = 'terminal'
197
+
198
+ select * from Student
199
+ select LName,count(*) from Student group by StuID
200
+
201
+ select LName,Fname from Student
202
+ select T1.Fname from Student as T1 join Has_Pet as T2 on T1.StuID = T2.StuID join Pets as T3 on T2.PetID = T3.PetID where T3.weight > 'terminal'
203
+
204
+ select StuID from Student where Sex = 'terminal'
205
+ select * from Student where Fname = 'terminal'
206
+ select LName from Student where Sex = 'terminal'
207
+
208
+ select * from Pets where PetID not in (select PetID from Pets)
209
+ select avg(Age) from Student where LName = 'terminal' and Fname = 'terminal'
210
+
211
+ select Name from country
212
+ select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.IndepYear > 'terminal'
213
+
214
+ select count(*) from city
215
+ select Code2,count(*) from country where Population > 'terminal'
216
+
217
+ select Region from country where Region = 'terminal'
218
+ select T2.Region,T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code group by T2.Region
219
+ select SurfaceArea from country where SurfaceArea > (select avg(SurfaceArea) from country)
220
+
221
+ select Region,Continent from country group by Region
222
+ select T1.Name,T2.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
223
+
224
+ select Name from country
225
+ select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
226
+
227
+ select Language from countrylanguage
228
+ select Language from countrylanguage where Language = 'terminal'
229
+ select GNPOld from country group by GNP order by count(*) desc limit 1
230
+
231
+ select LifeExpectancy,Population from country
232
+ select Name from country where Region = 'terminal'
233
+
234
+ select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
235
+ select T1.Population,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
236
+
237
+ select Population from country
238
+ select GNP,Continent from country where Name = 'terminal'
239
+ select avg(T3.Percentage) from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T1.District = 'terminal' and T2.GovernmentForm = 'terminal'
240
+
241
+ select Name from country where Region = 'terminal'
242
+ select Population from country order by Population asc limit 1
243
+
244
+ select count(*) from city where District = 'terminal'
245
+ select count(*) from city group by ID order by count(*) desc limit 1
246
+
247
+ select Continent from country where Region = 'terminal'
248
+ select HeadOfState from country where Region = 'terminal' intersect select Region from country where Region = 'terminal'
249
+ select avg(T3.Percentage) from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T2.Name = 'terminal' and T1.Name = 'terminal'
250
+
251
+ select Region from country where Population > 'terminal' and Population >= 'terminal'
252
+ select SurfaceArea,Region from country group by Region
253
+ select sum(SurfaceArea) from country where SurfaceArea = 'terminal'
254
+
255
+ select T2.HeadOfState,T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
256
+ select sum(T1.Population) from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Name = 'terminal'
257
+
258
+ select Language from countrylanguage
259
+ select * from country where Name = 'terminal'
260
+ select avg(Population) from country where Name = 'terminal'
261
+
262
+ select Language from countrylanguage
263
+ select count(*) from country where Population > 'terminal'
264
+
265
+ select HeadOfState from country where Region = 'terminal'
266
+ select count(*) from country where Population > 'terminal'
267
+
268
+ select * from countrylanguage where Language = 'terminal'
269
+ select Language from countrylanguage where Language = 'terminal'
270
+ select count(*) from country where Population > 'terminal'
271
+
272
+ select Language from countrylanguage where Language = 'terminal'
273
+ select count(*),count(T2.CountryCode) from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode where T1.IndepYear = 'terminal' and T1.Population > 'terminal'
274
+
275
+ select count(CountryCode),count(*) from countrylanguage group by CountryCode
276
+ select Continent from country group by Region order by count(*) desc limit 1
277
+
278
+ select count(CountryCode),count(*) from countrylanguage group by CountryCode
279
+ select T2.Language,T1.GNPOld from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode group by T1.GNP order by count(*) desc limit 1
280
+
281
+ select Continent from country where Name = 'terminal'
282
+ select T1.Name,T2.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
283
+ select count(*) from country where Population > 'terminal'
284
+
285
+ select Name from country where Region = 'terminal'
286
+ select Name from country where Region = 'terminal' intersect select Name from country where Region = 'terminal'
287
+
288
+ select Language from countrylanguage group by Language
289
+ select Name from country where Region = 'terminal'
290
+ select name from sqlite_sequence where name = 'terminal' intersect select name from sqlite_sequence where name = 'terminal'
291
+
292
+ select Name from country where Region = 'terminal'
293
+ select CountryCode from city
294
+ select count(*) from country where Population > 'terminal'
295
+
296
+ select Region from country
297
+ select Name from city where Population = 'terminal'
298
+
299
+ select T1.Name,T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode group by T1.Name
300
+ select Name from country where Code like 'terminal' and Code = 'terminal'
301
+
302
+ select Language from countrylanguage where Language = 'terminal'
303
+ select Continent from country group by Region order by count(*) asc limit 1
304
+
305
+ select T2.HeadOfState from city as T1 join country as T2 on T1.CountryCode = T2.Code where T1.District = 'terminal'
306
+ select Continent from country group by Region
307
+
308
+ select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
309
+ select Name from country order by Population desc limit 1
310
+
311
+ select T1.Population,T2.LifeExpectancy,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
312
+ select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
313
+
314
+ select Language from countrylanguage group by Language
315
+ select count(*) from country where HeadOfState != 'terminal'
316
+ select Continent,avg(Population) from country where Region = 'terminal'
317
+
318
+ select Name from country where Region != 'terminal'
319
+ select count(*) from country where Population > 'terminal'
320
+
321
+ select T1.District from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
322
+ select T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode where T1.Population = 'terminal'
323
+ select T1.name from sqlite_sequence as T1 join country as T2 where T2.Population = 'terminal'
324
+
325
+ select Name from country
326
+ select Name from country where Capital > 'terminal'
327
+ select count(T3.CountryCode),T1.CountryCode from city as T1 join country as T2 on T1.CountryCode = T2.Code join countrylanguage as T3 on T2.Code = T3.CountryCode where T2.IndepYear = 'terminal'
328
+
329
+ select * from city
330
+ select SurfaceArea from country where SurfaceArea > (select min(SurfaceArea) from country)
331
+ select Continent from country where SurfaceArea > (select avg(Population) from country)
332
+
333
+ select max(Population) from country where Name = 'terminal'
334
+ select Continent from country where Population > (select avg(Population) from country)
335
+
336
+ select Population from country where Name = 'terminal'
337
+ select Continent from country where Population > (select avg(Population) from country)
338
+
339
+ select Region from country
340
+ select Code from country where Name = 'terminal' except select Code from country where Name = 'terminal'
341
+
342
+ select Region from country
343
+ select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
344
+
345
+ select CountryCode from city
346
+ select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
347
+
348
+ select Name from country where Region = 'terminal'
349
+ select Name from country except select Name from country
350
+ select T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Population > 'terminal'
351
+
352
+ select Continent from country where Name = 'terminal'
353
+ select T2.Name,T1.name from sqlite_sequence as T1 join country as T2 where T2.Region = 'terminal'
354
+ select District from city
355
+
356
+ select Continent from country order by Population asc limit 1
357
+ select T2.name,T3.Population,T1.Population from city as T1 join sqlite_sequence as T2 join country as T3 on T1.CountryCode = T3.Code where T3.Region = 'terminal'
358
+
359
+ select Population from country order by Population desc limit 1
360
+ select T1.Population,T2.Population from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
361
+
362
+ select Name from country
363
+ select LocalName from country group by Name having count(*) >= 'terminal'
364
+ select count(CountryCode),count(*) from countrylanguage group by CountryCode
365
+
366
+ select avg(Population),District from city group by District
367
+ select count(CountryCode),count(*) from countrylanguage group by CountryCode
368
+
369
+ select Continent from country where SurfaceArea > 'terminal' intersect select Continent from country where LifeExpectancy > 'terminal'
370
+ select count(*),Code2 from country where Population > 'terminal'
371
+
372
+ select Continent from country where Population > 'terminal'
373
+ select avg(Population) from country
374
+
375
+ select HeadOfState from country order by SurfaceArea desc limit 1
376
+ select T2.Name,T1.Name from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.SurfaceArea > 'terminal'
377
+
378
+ select Continent from country order by Population desc
379
+ select Name from country group by Name order by count(*) desc limit 1
380
+
381
+ select Continent from country order by Population desc
382
+ select T1.Name from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode order by T2.Percentage asc limit 1
383
+
384
+ select Region from country where Population = 'terminal'
385
+ select count(*) from country where Population > 'terminal'
386
+
387
+ select Region from country where Population = 'terminal'
388
+ select Name from country where Population > 'terminal'
389
+
390
+ select Continent from country where Name = 'terminal'
391
+ select T1.name,T2.Name from sqlite_sequence as T1 join country as T2 where T2.Capital > 'terminal'
392
+ select avg(T1.Population),avg(T2.Population) from city as T1 join country as T2 on T1.CountryCode = T2.Code where T2.Region = 'terminal'
393
+
394
+ select Name from city
395
+ select * from city where Population > 'terminal'
396
+
397
+ select T1.GNPOld,T2.Language from country as T1 join countrylanguage as T2 on T1.Code = T2.CountryCode
398
+ select Continent from country group by Region order by count(*) desc limit 1
399
+
400
+ select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T2.Written_by = 'terminal' and T1.series_name = 'terminal'
401
+ select Episode from TV_series where Episode = 'terminal'
402
+ select Episode from TV_series where Episode like 'terminal' and Episode = 'terminal'
403
+
404
+ select * from Cartoon where Directed_by = 'terminal'
405
+ select T3.Written_by,T3.Directed_by from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T1.Share = 'terminal'
406
+ select Episode,count(*) from TV_series group by Episode order by count(*) desc limit 1
407
+
408
+ select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T2.Title = 'terminal' and T1.series_name = 'terminal'
409
+ select Episode from TV_series group by Episode
410
+ select T3.Title,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel join Cartoon as T3 on T1.id = T3.Channel where T1.Language = 'terminal'
411
+
412
+ select Package_Option from TV_Channel where Content = 'terminal'
413
+ select Episode from TV_series where Share = 'terminal'
414
+ select Title,count(*) from Cartoon where id not in (select id from Cartoon) group by id
415
+
416
+ select Title from Cartoon where Written_by = 'terminal'
417
+ select Directed_by from Cartoon where Directed_by = 'terminal'
418
+ select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
419
+
420
+ select count(*) from Cartoon where Title = 'terminal'
421
+ select T1.Content from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
422
+ select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
423
+
424
+ select Rating from TV_series where Episode = 'terminal'
425
+ select Rating from TV_series where Episode = 'terminal'
426
+ select T1.Hight_definition_TV,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel order by T2.Rating desc limit 1
427
+
428
+ select Rating from TV_series where Episode = 'terminal'
429
+ select Share from TV_series where Share = 'terminal'
430
+ select max(Share),min(Share),18_49_Rating_Share from TV_series
431
+
432
+ select Episode from TV_series where Episode = 'terminal'
433
+ select T1.Rating,T3.Title from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel order by T1.Rating desc limit 1
434
+ select T2.Title from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
435
+
436
+ select T1.Content from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
437
+ select T1.Language from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
438
+ select T1.Language from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
439
+
440
+ select Title from Cartoon where Title = 'terminal'
441
+ select T2.Production_code from TV_Channel as T1 join Cartoon as T2 on T1.id = T2.Channel where T1.series_name = 'terminal'
442
+ select Rating,Episode from TV_series group by Episode
443
+
444
+ select T3.Directed_by from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T1.Episode = 'terminal'
445
+ select Directed_by from Cartoon where Title = 'terminal'
446
+ select * from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T1.series_name = 'terminal' and T2.Episode = 'terminal'
447
+
448
+ select * from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T3.Title = 'terminal' and T1.Episode = 'terminal'
449
+ select Episode from TV_series where Episode like 'terminal'
450
+
451
+ select Title from Cartoon where Title = 'terminal'
452
+ select T3.Production_code,T1.Episode from TV_series as T1 join TV_Channel as T2 on T1.Channel = T2.id join Cartoon as T3 on T2.id = T3.Channel where T3.Title = 'terminal'
453
+ select T1.series_name,T2.Episode from TV_Channel as T1 join TV_series as T2 on T1.id = T2.Channel where T2.Episode = 'terminal'
duckdb-nsql/eval/metrics/test_suite_sql_eval/exec_eval.py ADDED
@@ -0,0 +1,313 @@
1
+ import os
2
+ import re
3
+ import duckdb
4
+ import asyncio
5
+ import threading
6
+ from typing import Tuple, Any, List, Set
7
+ from itertools import product
8
+ from collections import defaultdict
9
+ import tqdm
10
+ import random
11
+ import time
12
+ import pickle as pkl
13
+ import subprocess
14
+ from itertools import chain
15
+ import shutil
16
+ from pathlib import Path
17
+ from .parse import get_all_preds_for_execution, remove_distinct
18
+
19
+
20
+ threadLock = threading.Lock()
21
+ TIMEOUT = 60
22
+ TMP_DIR = "_tmp"
23
+ EXEC_TMP_DIR = os.path.join(os.path.dirname(__file__), "tmp")
24
+
25
+
26
+ def permute_tuple(element: Tuple, perm: Tuple) -> Tuple:
27
+ assert len(element) == len(perm)
28
+ return tuple([element[i] for i in perm])
29
+
30
+
31
+ def unorder_row(row: Tuple) -> Tuple:
32
+ return tuple(sorted(row, key=lambda x: str(x) + str(type(x))))
33
+
34
+
35
+ def tuple_sublists(row: Tuple) -> Tuple:
36
+ new_row = []
37
+ for item in row:
38
+ if isinstance(item, list):
39
+ new_row.append(tuple(item))
40
+ elif isinstance(item, dict):
41
+ new_row.append(tuple(sorted(item.items(), key=lambda x: x[0])))
42
+ print(new_row[-1])
43
+ else:
44
+ new_row.append(item)
45
+ new_row = tuple(new_row)
46
+ return new_row
47
+
48
+
49
+ # unorder each row in the table
50
+ # [result_1 and result_2 have the same bag of unordered rows]
51
+ # is a necessary condition of
52
+ # [result_1 and result_2 are equivalent in denotation]
53
+ def quick_rej(result1: List[Tuple], result2: List[Tuple], order_matters: bool) -> bool:
54
+ s1 = [unorder_row(row) for row in result1]
55
+ s2 = [unorder_row(row) for row in result2]
56
+ if order_matters:
57
+ return s1 == s2
58
+ else:
59
+ return set(s1) == set(s2)
60
+
61
+
62
+ # return whether two bags of relations are equivalent
63
+ def multiset_eq(l1: List, l2: List) -> bool:
64
+ if len(l1) != len(l2):
65
+ return False
66
+ d = defaultdict(int)
67
+ for e in l1:
68
+ d[e] = d[e] + 1
69
+ for e in l2:
70
+ d[e] = d[e] - 1
71
+ if d[e] < 0:
72
+ return False
73
+ return True
74
+
75
+
76
+ def get_constraint_permutation(tab1_sets_by_columns: List[Set], result2: List[Tuple]):
77
+ num_cols = len(result2[0])
78
+ perm_constraints = [{i for i in range(num_cols)} for _ in range(num_cols)]
79
+ if num_cols <= 3:
80
+ return product(*perm_constraints)
81
+
82
+ # we sample 20 rows and constrain the space of permutations
83
+ for _ in range(20):
84
+ random_tab2_row = random.choice(result2)
85
+
86
+ for tab1_col in range(num_cols):
87
+ for tab2_col in set(perm_constraints[tab1_col]):
88
+ if random_tab2_row[tab2_col] not in tab1_sets_by_columns[tab1_col]:
89
+ perm_constraints[tab1_col].remove(tab2_col)
90
+ return product(*perm_constraints)
91
+
92
+
93
+ # check whether two denotations are equivalent
94
+ def result_eq(result1: List[Tuple], result2: List[Tuple], order_matters: bool) -> bool:
95
+ if len(result1) == 0 and len(result2) == 0:
96
+ return True
97
+
98
+ # if the lengths differ, then they are definitely different bags of rows
99
+ if len(result1) != len(result2):
100
+ return False
101
+
102
+ num_cols = len(result1[0])
103
+
104
+ # if the results do not have the same number of columns, they are different
105
+ if len(result2[0]) != num_cols:
106
+ return False
107
+
108
+ result1 = [tuple_sublists(row) for row in result1]
109
+ result2 = [tuple_sublists(row) for row in result2]
110
+
111
+ # unorder each row and compare whether the denotation is the same
112
+ # this can already find most pairs of denotations that are different
113
+ if not quick_rej(result1, result2, order_matters):
114
+ return False
115
+
116
+ # the rest of the problem is in fact more complicated than one might think
117
+ # we want to find a permutation of column order and a permutation of row order,
118
+ # s.t. result_1 is the same as result_2
119
+ # we return true if we can find such column & row permutations
120
+ # and false if we cannot
121
+ tab1_sets_by_columns = [{row[i] for row in result1} for i in range(num_cols)]
122
+
123
+ # on a high level, we enumerate all possible column permutations that might make result_1 == result_2
124
+ # we decrease the size of the column permutation space by the function get_constraint_permutation
125
+ # if one of the permutations makes result_1 and result_2 equal, then they are equivalent
126
+ for perm in get_constraint_permutation(tab1_sets_by_columns, result2):
127
+ if len(perm) != len(set(perm)):
128
+ continue
129
+ if num_cols == 1:
130
+ result2_perm = result2
131
+ else:
132
+ result2_perm = [permute_tuple(element, perm) for element in result2]
133
+ if order_matters:
134
+ if result1 == result2_perm:
135
+ return True
136
+ else:
137
+ # in fact the first condition must hold if the second condition holds
138
+ # but the first is way more efficient implementation-wise
139
+ # and we use it to quickly reject impossible candidates
140
+ if set(result1) == set(result2_perm) and multiset_eq(result1, result2_perm):
141
+ return True
142
+ return False
143
+
144
+
145
+ def replace_cur_year(query: str) -> str:
146
+ return re.sub(
147
+ r"YEAR\s*\(\s*CURDATE\s*\(\s*\)\s*\)\s*", "2020", query, flags=re.IGNORECASE
148
+ )
149
+
150
+
151
+ class WithDuckDBConnectionInTmpDir(object):
152
+ def __init__(self, databases_file, tmp_dir):
153
+ if not os.path.exists(databases_file):
154
+ raise Exception("Database not found: %s" % databases_file)
155
+ os.makedirs(tmp_dir)
156
+ shutil.copy(databases_file, tmp_dir)
157
+ self.tmp_dbfile = Path(databases_file).name
158
+ self.tmp_dir = tmp_dir
159
+ self.original_wd = os.getcwd()
160
+
161
+ def __enter__(self):
162
+ os.chdir(self.tmp_dir)
163
+ self.con = duckdb.connect(self.tmp_dbfile)
164
+ return self.con
165
+
166
+ def __exit__(self, *args):
167
+ self.con.close()
168
+ os.chdir(self.original_wd)
169
+ shutil.rmtree(self.tmp_dir)
170
+
171
+
172
+ async def exec_on_db_(
173
+ duckdb_path: str, query: str, setup_sql: str, validate_sql: str
174
+ ) -> Tuple[str, Any]:
175
+ # query = replace_cur_year(query)
176
+ try:
177
+ with WithDuckDBConnectionInTmpDir(duckdb_path, TMP_DIR) as connection:
178
+ if setup_sql is not None:
179
+ print("Running Setup SQL:" + setup_sql)
180
+ connection.execute(setup_sql)
181
+ ddb_benchmark_result_rel = connection.sql(query)
182
+ if ddb_benchmark_result_rel is not None:
183
+ connection.execute(
184
+ "CREATE TABLE ddb_benchmark_result AS SELECT * FROM ddb_benchmark_result_rel"
185
+ )
186
+ else:
187
+ connection.execute("CREATE TABLE ddb_benchmark_result(empty TEXT)")
188
+ print("Running Validation SQL:" + validate_sql)
189
+ result = connection.execute(validate_sql).fetchall()
190
+ return "result", result
191
+ except Exception as e:
192
+ return "exception", e
193
+
194
+
195
+ async def exec_on_db(
196
+ duckdb_path: str,
197
+ query: str,
198
+ setup_sql: str,
199
+ validate_sql: str,
200
+ timeout: int = TIMEOUT,
201
+ ) -> Tuple[str, Any]:
202
+ try:
203
+ return await asyncio.wait_for(
204
+ exec_on_db_(duckdb_path, query, setup_sql, validate_sql), timeout
205
+ )
206
+ except asyncio.TimeoutError:
207
+ return ("exception", TimeoutError)
208
+ except Exception as e:
209
+ return ("exception", e)
210
+
211
+
212
+ # postprocess the model predictions to avoid execution errors
213
+ # e.g. removing spaces between ">" and "="
214
+ def postprocess(query: str) -> str:
215
+ query = query.replace("> =", ">=").replace("< =", "<=").replace("! =", "!=")
216
+ return query
217
+
218
+
219
+ # approximate whether p_str and g_str are semantically equivalent
220
+ # db is the database path
221
+ # we are going to evaluate whether they are equivalent in all the databases
222
+ # that are in the same directory as db
223
+ # returns 1 if denotationally equivalent
224
+ # 0 otherwise
225
+ # the meaning of each auxiliary argument can be seen in the parser definition in evaluation.py
226
+ def eval_exec_match(
227
+ db: str,
228
+ p_str: str,
229
+ g_str: str,
230
+ setup_sql: str,
231
+ validate_sql: str,
232
+ plug_value: bool,
233
+ keep_distinct: bool,
234
+ progress_bar_for_each_datapoint: bool,
235
+ ) -> int:
236
+ # post-process the prediction.
237
+ # e.g. removing spaces between ">" and "="
238
+ p_str, g_str = postprocess(p_str), postprocess(g_str)
239
+ if not keep_distinct:
240
+ try:
241
+ # if sqlparse can't parse p_str, we should not even try to execute it
242
+ p_str = remove_distinct(p_str)
243
+ except Exception as e:
244
+ return 0
245
+ g_str = remove_distinct(g_str)
246
+
247
+ # we decide whether two denotations are equivalent based on "bag semantics"
248
+ # https://courses.cs.washington.edu/courses/cse444/10sp/lectures/lecture16.pdf
249
+ # if there is an order by in the gold query, then we assume the order of the rows matters
250
+ # order by might also be used to find the max/min instead of sorting,
251
+ # but in that case the result mostly only contains one row and hence order_matters does not make a difference
252
+ order_matters = "order by" in g_str.lower()
253
+
254
+ # find all databases in the same directory
255
+ db_dir = os.path.dirname(db)
256
+ db_paths = [
257
+ os.path.join(db_dir, basename)
258
+ for basename in os.listdir(db_dir)
259
+ if ".duckdb" in basename
260
+ ]
261
+
262
+ preds = [p_str]
263
+ # if plug in value (i.e. we do not consider value prediction correctness)
264
+ # enumerate all ways to plug in values in the gold query to the model predictions
265
+ # otherwise, we only evaluate the predicted query with its own value prediction
266
+ if plug_value:
267
+ _, preds = get_all_preds_for_execution(g_str, p_str)
268
+ # we did not add this line in our EMNLP work
269
+ # this reduces "false negatives" when value is substituted
270
+ preds = chain([p_str], preds)
271
+
272
+ for pred in preds:
273
+ pred_passes = 1
274
+ # compare the gold and predicted denotations on each database in the directory
275
+ # wrap with progress bar if required
276
+ if progress_bar_for_each_datapoint:
277
+ ranger = tqdm.tqdm(db_paths)
278
+ else:
279
+ ranger = db_paths
280
+
281
+ for db_path in ranger:
282
+ g_flag, g_denotation = asyncio.run(
283
+ exec_on_db(
284
+ db_path, g_str, setup_sql=setup_sql, validate_sql=validate_sql
285
+ )
286
+ )
287
+ p_flag, p_denotation = asyncio.run(
288
+ exec_on_db(
289
+ db_path, pred, setup_sql=setup_sql, validate_sql=validate_sql
290
+ )
291
+ )
292
+
293
+ # we should expect the gold to be successfully executed on the database
294
+ assert (
295
+ g_flag != "exception"
296
+ ), f"gold query {g_str} has error {g_denotation} on database file {db_path}"
297
+
298
+ # wrong if execution fails
299
+ if p_flag == "exception":
300
+ pred_passes = 0
301
+
302
+ # if denotations are not equivalent, the prediction must be wrong
303
+ elif not result_eq(g_denotation, p_denotation, order_matters=order_matters):
304
+ pred_passes = 0
305
+ if pred_passes == 0:
306
+ break
307
+
308
+ # the model prediction has the same denotation as the gold for all databases
309
+ if pred_passes == 1:
310
+ return 1
311
+
312
+ # none of the predictions passed
313
+ return 0
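The comparison performed by `result_eq` above can be illustrated with a much smaller, brute-force sketch. This is not the code from the commit: `bag_equal` and `results_match` are hypothetical names, the sketch tries every column permutation instead of pruning them the way `get_constraint_permutation` does, and it assumes every cell value is hashable.

```python
from collections import Counter
from itertools import permutations
from typing import List, Tuple


def bag_equal(rows_a: List[Tuple], rows_b: List[Tuple]) -> bool:
    """Multiset (bag) equality: the same rows with the same multiplicities."""
    return Counter(rows_a) == Counter(rows_b)


def results_match(gold: List[Tuple], pred: List[Tuple], order_matters: bool) -> bool:
    """Return True if some column permutation of `pred` equals `gold`."""
    if len(gold) != len(pred):
        return False
    if not gold:
        # both results are empty
        return True
    num_cols = len(gold[0])
    if len(pred[0]) != num_cols:
        return False
    # brute force over all column orders (the original prunes this space)
    for perm in permutations(range(num_cols)):
        permuted = [tuple(row[i] for i in perm) for row in pred]
        if order_matters:
            if permuted == gold:
                return True
        elif bag_equal(gold, permuted):
            return True
    return False


gold = [(1, "a"), (2, "b"), (2, "b")]
pred = [("b", 2), ("a", 1), ("b", 2)]
print(results_match(gold, pred, order_matters=False))  # True
print(results_match(gold, pred, order_matters=True))   # False
```

With `order_matters=False` the swapped columns and shuffled rows of `pred` still match `gold`, including the duplicated row. With `order_matters=True` the row order differs, so the match fails.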
duckdb-nsql/eval/metrics/test_suite_sql_eval/parse.py ADDED
@@ -0,0 +1,252 @@
1
+ import re
2
+ import sqlparse
3
+ from typing import List, Tuple, Set, Iterator, Dict, Any, Union
4
+ from sqlparse.sql import Comparison, Identifier
5
+ from sqlparse.tokens import Whitespace
6
+ import itertools
7
+ from collections import namedtuple
8
+
9
+ Token = namedtuple("Token", ["ttype", "value"])
10
+ VALUE_NUM_SYMBOL = "VALUERARE"
11
+ QUOTE_CHARS = {"`", "'", '"'}
12
+
13
+
14
+ def tokenize(query: str) -> List[Token]:
15
+ tokens = list([Token(t.ttype, t.value) for t in sqlparse.parse(query)[0].flatten()])
16
+ return tokens
17
+
18
+
19
+ def join_tokens(tokens: List[Token]) -> str:
20
+ return "".join([x.value for x in tokens]).strip().replace("  ", " ")
21
+
22
+
23
+ def round_trip_test(query: str) -> None:
24
+ tokens = tokenize(query)
25
+ reconstructed = "".join([token.value for token in tokens])
26
+ assert query == reconstructed, "Round trip test fails for string %s" % query
27
+
28
+
29
+ def postprocess(query: str) -> str:
30
+ query = query.replace("> =", ">=").replace("< =", "<=").replace("! =", "!=")
31
+ return query
32
+
33
+
34
+ # strip_query, reformat_query and replace_values
35
+ # were implemented by Yu Tao for processing CoSQL
36
+ def strip_query(query: str) -> Tuple[List[str], List[str]]:
37
+ query_keywords, all_values = [], []
38
+
39
+ # then replace all stuff enclosed by "" with a numerical value to get it marked as {VALUE}
40
+
41
+ # Tao's implementation is commented out here.
42
+ """
43
+ str_1 = re.findall("\"[^\"]*\"", query)
44
+ str_2 = re.findall("\'[^\']*\'", query)
45
+ values = str_1 + str_2
46
+ """
47
+
48
+ toks = sqlparse.parse(query)[0].flatten()
49
+ values = [
50
+ t.value
51
+ for t in toks
52
+ if t.ttype == sqlparse.tokens.Literal.String.Single
53
+ or t.ttype == sqlparse.tokens.Literal.String.Symbol
54
+ ]
55
+
56
+ for val in values:
57
+ all_values.append(val)
58
+ query = query.replace(val.strip(), VALUE_NUM_SYMBOL)
59
+
60
+ query_tokenized = query.split()
61
+ float_nums = re.findall(r"[-+]?\d*\.\d+", query)
62
+ all_values += [qt for qt in query_tokenized if qt in float_nums]
63
+ query_tokenized = [
64
+ VALUE_NUM_SYMBOL if qt in float_nums else qt for qt in query_tokenized
65
+ ]
66
+
67
+ query = " ".join(query_tokenized)
68
+ int_nums = [i.strip() for i in re.findall(r"[^tT]\d+", query)]
69
+
70
+ all_values += [qt for qt in query_tokenized if qt in int_nums]
71
+ query_tokenized = [
72
+ VALUE_NUM_SYMBOL if qt in int_nums else qt for qt in query_tokenized
73
+ ]
74
+ # print int_nums, query, query_tokenized
75
+
76
+ for tok in query_tokenized:
77
+ if "." in tok:
78
+ table = re.findall(r"[Tt]\d+\.", tok)
79
+ if len(table) > 0:
80
+ to = tok.replace(".", " . ").split()
81
+ to = [t.lower() for t in to if len(t) > 0]
82
+ query_keywords.extend(to)
83
+ else:
84
+ query_keywords.append(tok.lower())
85
+
86
+ elif len(tok) > 0:
87
+ query_keywords.append(tok.lower())
88
+ return query_keywords, all_values
89
+
90
+
91
+ def reformat_query(query: str) -> str:
92
+ query = query.strip().replace(";", "").replace("\t", "")
93
+ query = " ".join(
94
+ [t.value for t in tokenize(query) if t.ttype != sqlparse.tokens.Whitespace]
95
+ )
96
+ t_stars = ["t1.*", "t2.*", "t3.*", "T1.*", "T2.*", "T3.*"]
97
+ for ts in t_stars:
98
+ query = query.replace(ts, "*")
99
+ return query
100
+
101
+
102
+ def replace_values(sql: str) -> Tuple[List[str], Set[str]]:
103
+ sql = sqlparse.format(sql, reindent=False, keyword_case="upper")
104
+ # sql = re.sub(r"(<=|>=|!=|=|<|>|,)", r" \1 ", sql)
105
+ sql = re.sub(r"(T\d+\.)\s", r"\1", sql)
106
+ query_toks_no_value, values = strip_query(sql)
107
+ return query_toks_no_value, set(values)
108
+
109
+
110
+ # extract the non-value tokens and the set of values
111
+ # from a sql query
112
+ def extract_query_values(sql: str) -> Tuple[List[str], Set[str]]:
113
+ reformated = reformat_query(query=sql)
114
+ query_value_replaced, values = replace_values(reformated)
115
+ return query_value_replaced, values
116
+
117
+
118
+ # plug in the values into query with value slots
119
+ def plugin(query_value_replaced: List[str], values_in_order: List[str]) -> str:
120
+ q_length = len(query_value_replaced)
121
+ query_w_values = query_value_replaced[:]
122
+ value_idx = [
123
+ idx
124
+ for idx in range(q_length)
125
+ if query_value_replaced[idx] == VALUE_NUM_SYMBOL.lower()
126
+ ]
127
+ assert len(value_idx) == len(values_in_order)
128
+
129
+ for idx, value in zip(value_idx, values_in_order):
130
+ query_w_values[idx] = value
131
+ return " ".join(query_w_values)
132
+
133
+
134
+ # a generator generating all possible ways of
135
+ # filling values into predicted query
136
+ def plugin_all_permutations(
137
+ query_value_replaced: List[str], values: Set[str]
138
+ ) -> Iterator[str]:
139
+ num_slots = len([v for v in query_value_replaced if v == VALUE_NUM_SYMBOL.lower()])
140
+ for combo in itertools.product(list(values), repeat=num_slots):
141
+ yield plugin(query_value_replaced, list(combo))
142
+
143
+
144
+ # given the gold query and the model prediction
145
+ # extract values from the gold, extract predicted sql with value slots
146
+ # return 1) number of possible ways to plug in gold values and 2) an iterator of predictions with value plugged in
147
+ def get_all_preds_for_execution(gold: str, pred: str) -> Tuple[int, Iterator[str]]:
148
+ _, gold_values = extract_query_values(gold)
149
+ pred_query_value_replaced, _ = extract_query_values(pred)
150
+ num_slots = len(
151
+ [v for v in pred_query_value_replaced if v == VALUE_NUM_SYMBOL.lower()]
152
+ )
153
+ num_alternatives = len(gold_values) ** num_slots
154
+ return (
155
+ num_alternatives,
156
+ plugin_all_permutations(pred_query_value_replaced, gold_values),
157
+ )
158
+
159
+
160
+ def remove_distinct(s):
161
+ toks = [t.value for t in list(sqlparse.parse(s)[0].flatten())]
162
+ return "".join([t for t in toks if t.lower() != "distinct"])
163
+
164
+
165
+ def extract_all_comparison_from_node(node: Token) -> List[Comparison]:
166
+ comparison_list = []
167
+ if hasattr(node, "tokens"):
168
+ for t in node.tokens:
169
+ comparison_list.extend(extract_all_comparison_from_node(t))
170
+ if type(node) == Comparison:
171
+ comparison_list.append(node)
172
+ return comparison_list
173
+
174
+
175
+ def extract_all_comparison(query: str) -> List[Comparison]:
176
+ tree = sqlparse.parse(query)[0]
177
+ comparison_list = extract_all_comparison_from_node(tree)
178
+ return comparison_list
179
+
180
+
181
+ def extract_toks_from_comparison(comparison_node: Comparison) -> List[Token]:
182
+ tokens = [t for t in comparison_node.tokens if t.ttype != Whitespace]
183
+ return tokens
184
+
185
+
186
+ def extract_info_from_comparison(comparison_node: Comparison) -> Dict[str, Any]:
187
+ tokens = extract_toks_from_comparison(comparison_node)
188
+ left, op, right = tokens
189
+
190
+ returned_dict = {"left": left, "op": op.value, "right": right}
191
+
192
+ if type(left) != Identifier:
193
+ return returned_dict
194
+
195
+ table = None
196
+ if len(left.tokens) == 3 and re.match("^[tT][0-9]$", left.tokens[0].value) is None:
197
+ table = left.tokens[0].value.lower()
198
+ col = left.tokens[-1].value
199
+
200
+ if type(right) == Identifier:
201
+ if len(right.tokens) == 1 and type(right.tokens[0]) == sqlparse.sql.Token:
202
+ right_val = right.tokens[0].value
203
+ else:
204
+ return returned_dict
205
+ elif type(right) == sqlparse.sql.Token:
206
+ right_val = right.value
207
+ else:
208
+ return returned_dict
209
+
210
+ returned_dict["table_col"], returned_dict["val"] = (
211
+ (table, col.upper()),
212
+ process_str_value(right_val),
213
+ )
214
+
215
+ return returned_dict
216
+
217
+
218
+ def extract_all_comparison_from_query(query: str) -> List[Dict[str, Any]]:
219
+ comparison_list = extract_all_comparison(query)
220
+ return [extract_info_from_comparison(c) for c in comparison_list]
221
+
222
+
223
+ def extract_typed_value_in_comparison_from_query(
224
+ query: str,
225
+ ) -> List[Tuple[Tuple[Union[str, None], str], str]]:
226
+ cmps = extract_all_comparison_from_query(query)
227
+ typed_values = [
228
+ (cmp["table_col"], cmp["val"]) for cmp in cmps if "table_col" in cmp
229
+ ]
230
+ for table, col, val1, val2 in re.findall(
231
+ r"(?:([^\.\s]*)\.)?([^\.\s]+) between ([^\s;]+) and ([^\s;]+)",
232
+ query,
233
+ re.IGNORECASE,
234
+ ):
235
+ if table == "":
236
+ table = None
237
+ else:
238
+ table = table.lower()
239
+ col = col.upper()
240
+ for v in [val1, val2]:
241
+ typed_values.append(((table, col), v))
242
+ return typed_values
243
+
244
+
245
+ def process_str_value(v: str) -> str:
246
+ if len(v) > 0 and v[0] in QUOTE_CHARS:
247
+ v = v[1:]
248
+ if len(v) > 0 and v[-1] in QUOTE_CHARS:
249
+ v = v[:-1]
250
+ for c in QUOTE_CHARS:
251
+ v = v.replace(c + c, c)
252
+ return v
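Review note: the slot-filling logic in `plugin` and `plugin_all_permutations` above can be sketched in isolation as follows. This is a minimal illustration, not the code from this commit: `SLOT`, `fill_slots`, and `all_fillings` are hypothetical names, and the literal placeholder string is an assumption because the definition of `VALUE_NUM_SYMBOL` is not visible in this part of the diff.

```python
import itertools
from typing import Iterator, List, Set

# Hypothetical placeholder token; stands in for VALUE_NUM_SYMBOL.lower().
SLOT = "value"


def fill_slots(tokens: List[str], values_in_order: List[str]) -> str:
    """Put one value into each slot, left to right, and join the tokens."""
    slot_positions = [i for i, tok in enumerate(tokens) if tok == SLOT]
    assert len(slot_positions) == len(values_in_order)
    filled = tokens[:]
    for pos, value in zip(slot_positions, values_in_order):
        filled[pos] = value
    return " ".join(filled)


def all_fillings(tokens: List[str], values: Set[str]) -> Iterator[str]:
    """Yield every assignment of values to slots (with repetition)."""
    num_slots = sum(1 for tok in tokens if tok == SLOT)
    # sorted() only makes the enumeration order deterministic for this sketch.
    for combo in itertools.product(sorted(values), repeat=num_slots):
        yield fill_slots(tokens, list(combo))


tokens = ["select", "name", "from", "t", "where",
          "a", "=", SLOT, "and", "b", ">", SLOT]
candidates = list(all_fillings(tokens, {"1", "'x'"}))
```

With two gold values and two slots the generator yields `len(values) ** num_slots` candidates, which matches the `num_alternatives` count computed in `get_all_preds_for_execution`.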
duckdb-nsql/eval/metrics/test_suite_sql_eval/process_sql.py ADDED
@@ -0,0 +1,644 @@
1
+ ################################
2
+ # Assumptions:
3
+ # 1. sql is correct
4
+ # 2. only table name has alias
5
+ # 3. only one intersect/union/except
6
+ #
7
+ # val: number(float)/string(str)/sql(dict)
8
+ # col_unit: (agg_id, col_id, isDistinct(bool))
9
+ # val_unit: (unit_op, col_unit1, col_unit2)
10
+ # table_unit: (table_type, col_unit/sql)
11
+ # cond_unit: (not_op, op_id, val_unit, val1, val2)
12
+ # condition: [cond_unit1, 'and'/'or', cond_unit2, ...]
13
+ # sql {
14
+ # 'select': (isDistinct(bool), [(agg_id, val_unit), (agg_id, val_unit), ...])
15
+ # 'from': {'table_units': [table_unit1, table_unit2, ...], 'conds': condition}
16
+ # 'where': condition
17
+ # 'groupBy': [col_unit1, col_unit2, ...]
18
+ # 'orderBy': ('asc'/'desc', [val_unit1, val_unit2, ...])
19
+ # 'having': condition
20
+ # 'limit': None/limit value
21
+ # 'intersect': None/sql
22
+ # 'except': None/sql
23
+ # 'union': None/sql
24
+ # }
25
+ ################################
26
+
27
+ import json
28
+ import duckdb
29
+ from nltk import word_tokenize
30
+
31
+ CLAUSE_KEYWORDS = (
32
+ "select",
33
+ "from",
34
+ "where",
35
+ "group",
36
+ "order",
37
+ "limit",
38
+ "intersect",
39
+ "union",
40
+ "except",
41
+ )
42
+ JOIN_KEYWORDS = ("join", "on", "as")
43
+
44
+ WHERE_OPS = (
45
+ "not",
46
+ "between",
47
+ "=",
48
+ ">",
49
+ "<",
50
+ ">=",
51
+ "<=",
52
+ "!=",
53
+ "in",
54
+ "like",
55
+ "is",
56
+ "exists",
57
+ )
58
+ UNIT_OPS = ("none", "-", "+", "*", "/")
59
+ AGG_OPS = ("none", "max", "min", "count", "sum", "avg")
60
+ TABLE_TYPE = {
61
+ "sql": "sql",
62
+ "table_unit": "table_unit",
63
+ }
64
+
65
+ COND_OPS = ("and", "or")
66
+ SQL_OPS = ("intersect", "union", "except")
67
+ ORDER_OPS = ("desc", "asc")
68
+
69
+
70
+ class Schema:
71
+ """
72
+ Simple schema that maps each table and column to a unique identifier
73
+ """
74
+
75
+ def __init__(self, schema):
76
+ self._schema = schema
77
+ self._idMap = self._map(self._schema)
78
+
79
+ @property
80
+ def schema(self):
81
+ return self._schema
82
+
83
+ @property
84
+ def idMap(self):
85
+ return self._idMap
86
+
87
+ def _map(self, schema):
88
+ idMap = {"*": "__all__"}
89
+ id = 1
90
+ for key, vals in schema.items():
91
+ for val in vals:
92
+ idMap[key.lower() + "." + val.lower()] = (
93
+ "__" + key.lower() + "." + val.lower() + "__"
94
+ )
95
+ id += 1
96
+
97
+ for key in schema:
98
+ idMap[key.lower()] = "__" + key.lower() + "__"
99
+ id += 1
100
+
101
+ return idMap
102
+
103
+
104
+ def get_schema(db):
105
+ """
106
+ Get database's schema, which is a dict with table name as key
107
+ and list of column names as value
108
+ :param db: database path
109
+ :return: schema dict
110
+ """
111
+
112
+ schema = {}
113
+ conn = duckdb.connect(db)
114
+
115
+
116
+
117
+ # fetch table names
118
+ res = conn.execute("show tables").fetchall()
119
+ tables = [r[0] for r in res]
120
+
121
+ # fetch table info
122
+ for table in tables:
123
+ res = conn.execute("PRAGMA table_info({})".format(table))
124
+ schema[table] = [str(col[1].lower()) for col in res.fetchall()]
125
+
126
+ return schema
127
+
128
+
129
+ def get_schema_from_json(fpath):
130
+ with open(fpath) as f:
131
+ data = json.load(f)
132
+
133
+ schema = {}
134
+ for entry in data:
135
+ table = str(entry["table"].lower())
136
+ cols = [str(col["column_name"].lower()) for col in entry["col_data"]]
137
+ schema[table] = cols
138
+
139
+ return schema
140
+
141
+
142
+ def tokenize(string):
143
+ string = str(string)
144
+ string = string.replace(
145
+ "'", '"'
146
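+ ) # normalize quotes so every string value is wrapped in ""; caveat: a value containing an apostrophe may trip the quote-count assertion below or be split incorrectly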
147
+ quote_idxs = [idx for idx, char in enumerate(string) if char == '"']
148
+ assert len(quote_idxs) % 2 == 0, "Unexpected quote"
149
+
150
+ # keep string value as token
151
+ vals = {}
152
+ for i in range(len(quote_idxs) - 1, -1, -2):
153
+ qidx1 = quote_idxs[i - 1]
154
+ qidx2 = quote_idxs[i]
155
+ val = string[qidx1 : qidx2 + 1]
156
+ key = "__val_{}_{}__".format(qidx1, qidx2)
157
+ string = string[:qidx1] + key + string[qidx2 + 1 :]
158
+ vals[key] = val
159
+
160
+ toks = [word.lower() for word in word_tokenize(string)]
161
+ # replace with string value token
162
+ for i in range(len(toks)):
163
+ if toks[i] in vals:
164
+ toks[i] = vals[toks[i]]
165
+
166
+ # find if there exists !=, >=, <=
167
+ eq_idxs = [idx for idx, tok in enumerate(toks) if tok == "="]
168
+ eq_idxs.reverse()
169
+ prefix = ("!", ">", "<")
170
+ for eq_idx in eq_idxs:
171
+ pre_tok = toks[eq_idx - 1]
172
+ if pre_tok in prefix:
173
+ toks = toks[: eq_idx - 1] + [pre_tok + "="] + toks[eq_idx + 1 :]
174
+
175
+ return toks
176
+
177
+
178
+ def scan_alias(toks):
179
+ """Scan the index of 'as' and build the map for all alias"""
180
+ as_idxs = [idx for idx, tok in enumerate(toks) if tok == "as"]
181
+ alias = {}
182
+ for idx in as_idxs:
183
+ alias[toks[idx + 1]] = toks[idx - 1]
184
+ return alias
185
+
186
+
187
+ def get_tables_with_alias(schema, toks):
188
+ tables = scan_alias(toks)
189
+ for key in schema:
190
+ assert key not in tables, "Alias {} has the same name in table".format(key)
191
+ tables[key] = key
192
+ return tables
193
+
194
+
195
+ def parse_col(toks, start_idx, tables_with_alias, schema, default_tables=None):
196
+ """
197
+ :returns next idx, column id
198
+ """
199
+ tok = toks[start_idx]
200
+ if tok == "*":
201
+ return start_idx + 1, schema.idMap[tok]
202
+
203
+ if "." in tok: # if token is a composite
204
+ alias, col = tok.split(".")
205
+ key = tables_with_alias[alias] + "." + col
206
+ return start_idx + 1, schema.idMap[key]
207
+
208
+ assert (
209
+ default_tables is not None and len(default_tables) > 0
210
+ ), "Default tables should not be None or empty"
211
+
212
+ for alias in default_tables:
213
+ table = tables_with_alias[alias]
214
+ if tok in schema.schema[table]:
215
+ key = table + "." + tok
216
+ return start_idx + 1, schema.idMap[key]
217
+
218
+ assert False, "Error col: {}".format(tok)
219
+
220
+
221
+ def parse_col_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
222
+ """
223
+ :returns next idx, (agg_op id, col_id, isDistinct)
224
+ """
225
+ idx = start_idx
226
+ len_ = len(toks)
227
+ isBlock = False
228
+ isDistinct = False
229
+ if toks[idx] == "(":
230
+ isBlock = True
231
+ idx += 1
232
+
233
+ if toks[idx] in AGG_OPS:
234
+ agg_id = AGG_OPS.index(toks[idx])
235
+ idx += 1
236
+ assert idx < len_ and toks[idx] == "("
237
+ idx += 1
238
+ if toks[idx] == "distinct":
239
+ idx += 1
240
+ isDistinct = True
241
+ idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
242
+ assert idx < len_ and toks[idx] == ")"
243
+ idx += 1
244
+ return idx, (agg_id, col_id, isDistinct)
245
+
246
+ if toks[idx] == "distinct":
247
+ idx += 1
248
+ isDistinct = True
249
+ agg_id = AGG_OPS.index("none")
250
+ idx, col_id = parse_col(toks, idx, tables_with_alias, schema, default_tables)
251
+
252
+ if isBlock:
253
+ assert toks[idx] == ")"
254
+ idx += 1 # skip ')'
255
+
256
+ return idx, (agg_id, col_id, isDistinct)
257
+
258
+
259
+ def parse_val_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
260
+ idx = start_idx
261
+ len_ = len(toks)
262
+ isBlock = False
263
+ if toks[idx] == "(":
264
+ isBlock = True
265
+ idx += 1
266
+
267
+ col_unit1 = None
268
+ col_unit2 = None
269
+ unit_op = UNIT_OPS.index("none")
270
+
271
+ idx, col_unit1 = parse_col_unit(
272
+ toks, idx, tables_with_alias, schema, default_tables
273
+ )
274
+ if idx < len_ and toks[idx] in UNIT_OPS:
275
+ unit_op = UNIT_OPS.index(toks[idx])
276
+ idx += 1
277
+ idx, col_unit2 = parse_col_unit(
278
+ toks, idx, tables_with_alias, schema, default_tables
279
+ )
280
+
281
+ if isBlock:
282
+ assert toks[idx] == ")"
283
+ idx += 1 # skip ')'
284
+
285
+ return idx, (unit_op, col_unit1, col_unit2)
286
+
287
+
288
+ def parse_table_unit(toks, start_idx, tables_with_alias, schema):
289
+ """
290
+ :returns next idx, table id, table name
291
+ """
292
+ idx = start_idx
293
+ len_ = len(toks)
294
+ key = tables_with_alias[toks[idx]]
295
+
296
+ if idx + 1 < len_ and toks[idx + 1] == "as":
297
+ idx += 3
298
+ else:
299
+ idx += 1
300
+
301
+ return idx, schema.idMap[key], key
302
+
303
+
304
+ def parse_value(toks, start_idx, tables_with_alias, schema, default_tables=None):
305
+ idx = start_idx
306
+ len_ = len(toks)
307
+
308
+ isBlock = False
309
+ if toks[idx] == "(":
310
+ isBlock = True
311
+ idx += 1
312
+
313
+ if toks[idx] == "select":
314
+ idx, val = parse_sql(toks, idx, tables_with_alias, schema)
315
+ elif '"' in toks[idx]: # token is a string value
316
+ val = toks[idx]
317
+ idx += 1
318
+ else:
319
+ try:
320
+ val = float(toks[idx])
321
+ idx += 1
322
+ except ValueError:
323
+ end_idx = idx
324
+ while (
325
+ end_idx < len_
326
+ and toks[end_idx] != ","
327
+ and toks[end_idx] != ")"
328
+ and toks[end_idx] != "and"
329
+ and toks[end_idx] not in CLAUSE_KEYWORDS
330
+ and toks[end_idx] not in JOIN_KEYWORDS
331
+ ):
332
+ end_idx += 1
333
+
334
+ idx, val = parse_col_unit(
335
+ toks[start_idx:end_idx], 0, tables_with_alias, schema, default_tables
336
+ )
337
+ idx = end_idx
338
+
339
+ if isBlock:
340
+ assert toks[idx] == ")"
341
+ idx += 1
342
+
343
+ return idx, val
344
+
345
+
346
+ def parse_condition(toks, start_idx, tables_with_alias, schema, default_tables=None):
347
+ idx = start_idx
348
+ len_ = len(toks)
349
+ conds = []
350
+
351
+ while idx < len_:
352
+ idx, val_unit = parse_val_unit(
353
+ toks, idx, tables_with_alias, schema, default_tables
354
+ )
355
+ not_op = False
356
+ if toks[idx] == "not":
357
+ not_op = True
358
+ idx += 1
359
+
360
+ assert (
361
+ idx < len_ and toks[idx] in WHERE_OPS
362
+ ), "Error condition: idx: {}, tok: {}".format(idx, toks[idx])
363
+ op_id = WHERE_OPS.index(toks[idx])
364
+ idx += 1
365
+ val1 = val2 = None
366
+ if op_id == WHERE_OPS.index(
367
+ "between"
368
+ ): # between..and... special case: dual values
369
+ idx, val1 = parse_value(
370
+ toks, idx, tables_with_alias, schema, default_tables
371
+ )
372
+ assert toks[idx] == "and"
373
+ idx += 1
374
+ idx, val2 = parse_value(
375
+ toks, idx, tables_with_alias, schema, default_tables
376
+ )
377
+ else: # normal case: single value
378
+ idx, val1 = parse_value(
379
+ toks, idx, tables_with_alias, schema, default_tables
380
+ )
381
+ val2 = None
382
+
383
+ conds.append((not_op, op_id, val_unit, val1, val2))
384
+
385
+ if idx < len_ and (
386
+ toks[idx] in CLAUSE_KEYWORDS
387
+ or toks[idx] in (")", ";")
388
+ or toks[idx] in JOIN_KEYWORDS
389
+ ):
390
+ break
391
+
392
+ if idx < len_ and toks[idx] in COND_OPS:
393
+ conds.append(toks[idx])
394
+ idx += 1 # skip and/or
395
+
396
+ return idx, conds
397
+
398
+
399
+ def parse_select(toks, start_idx, tables_with_alias, schema, default_tables=None):
400
+ idx = start_idx
401
+ len_ = len(toks)
402
+
403
+ assert toks[idx] == "select", "'select' not found"
404
+ idx += 1
405
+ isDistinct = False
406
+ if idx < len_ and toks[idx] == "distinct":
407
+ idx += 1
408
+ isDistinct = True
409
+ val_units = []
410
+
411
+ while idx < len_ and toks[idx] not in CLAUSE_KEYWORDS:
412
+ agg_id = AGG_OPS.index("none")
413
+ if toks[idx] in AGG_OPS:
414
+ agg_id = AGG_OPS.index(toks[idx])
415
+ idx += 1
416
+ idx, val_unit = parse_val_unit(
417
+ toks, idx, tables_with_alias, schema, default_tables
418
+ )
419
+ val_units.append((agg_id, val_unit))
420
+ if idx < len_ and toks[idx] == ",":
421
+ idx += 1 # skip ','
422
+
423
+ return idx, (isDistinct, val_units)
424
+
425
+
426
+ def parse_from(toks, start_idx, tables_with_alias, schema):
427
+ """
428
+ Assumes that all table units in the from clause are combined with join
429
+ """
430
+ assert "from" in toks[start_idx:], "'from' not found"
431
+
432
+ len_ = len(toks)
433
+ idx = toks.index("from", start_idx) + 1
434
+ default_tables = []
435
+ table_units = []
436
+ conds = []
437
+
438
+ while idx < len_:
439
+ isBlock = False
440
+ if toks[idx] == "(":
441
+ isBlock = True
442
+ idx += 1
443
+
444
+ if toks[idx] == "select":
445
+ idx, sql = parse_sql(toks, idx, tables_with_alias, schema)
446
+ table_units.append((TABLE_TYPE["sql"], sql))
447
+ else:
448
+ if idx < len_ and toks[idx] == "join":
449
+ idx += 1 # skip join
450
+ idx, table_unit, table_name = parse_table_unit(
451
+ toks, idx, tables_with_alias, schema
452
+ )
453
+ table_units.append((TABLE_TYPE["table_unit"], table_unit))
454
+ default_tables.append(table_name)
455
+ if idx < len_ and toks[idx] == "on":
456
+ idx += 1 # skip on
457
+ idx, this_conds = parse_condition(
458
+ toks, idx, tables_with_alias, schema, default_tables
459
+ )
460
+ if len(conds) > 0:
461
+ conds.append("and")
462
+ conds.extend(this_conds)
463
+
464
+ if isBlock:
465
+ assert toks[idx] == ")"
466
+ idx += 1
467
+ if idx < len_ and (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
468
+ break
469
+
470
+ return idx, table_units, conds, default_tables
471
+
472
+
473
+ def parse_where(toks, start_idx, tables_with_alias, schema, default_tables):
474
+ idx = start_idx
475
+ len_ = len(toks)
476
+
477
+ if idx >= len_ or toks[idx] != "where":
478
+ return idx, []
479
+
480
+ idx += 1
481
+ idx, conds = parse_condition(toks, idx, tables_with_alias, schema, default_tables)
482
+ return idx, conds
483
+
484
+
485
+ def parse_group_by(toks, start_idx, tables_with_alias, schema, default_tables):
486
+ idx = start_idx
487
+ len_ = len(toks)
488
+ col_units = []
489
+
490
+ if idx >= len_ or toks[idx] != "group":
491
+ return idx, col_units
492
+
493
+ idx += 1
494
+ assert toks[idx] == "by"
495
+ idx += 1
496
+
497
+ while idx < len_ and not (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
498
+ idx, col_unit = parse_col_unit(
499
+ toks, idx, tables_with_alias, schema, default_tables
500
+ )
501
+ col_units.append(col_unit)
502
+ if idx < len_ and toks[idx] == ",":
503
+ idx += 1 # skip ','
504
+ else:
505
+ break
506
+
507
+ return idx, col_units
508
+
509
+
510
+ def parse_order_by(toks, start_idx, tables_with_alias, schema, default_tables):
511
+ idx = start_idx
512
+ len_ = len(toks)
513
+ val_units = []
514
+ order_type = "asc" # default type is 'asc'
515
+
516
+ if idx >= len_ or toks[idx] != "order":
517
+ return idx, val_units
518
+
519
+ idx += 1
520
+ assert toks[idx] == "by"
521
+ idx += 1
522
+
523
+ while idx < len_ and not (toks[idx] in CLAUSE_KEYWORDS or toks[idx] in (")", ";")):
524
+ idx, val_unit = parse_val_unit(
525
+ toks, idx, tables_with_alias, schema, default_tables
526
+ )
527
+ val_units.append(val_unit)
528
+ if idx < len_ and toks[idx] in ORDER_OPS:
529
+ order_type = toks[idx]
530
+ idx += 1
531
+ if idx < len_ and toks[idx] == ",":
532
+ idx += 1 # skip ','
533
+ else:
534
+ break
535
+
536
+ return idx, (order_type, val_units)
537
+
538
+
539
+ def parse_having(toks, start_idx, tables_with_alias, schema, default_tables):
540
+ idx = start_idx
541
+ len_ = len(toks)
542
+
543
+ if idx >= len_ or toks[idx] != "having":
544
+ return idx, []
545
+
546
+ idx += 1
547
+ idx, conds = parse_condition(toks, idx, tables_with_alias, schema, default_tables)
548
+ return idx, conds
549
+
550
+
551
+ def parse_limit(toks, start_idx):
552
+ idx = start_idx
553
+ len_ = len(toks)
554
+
555
+ if idx < len_ and toks[idx] == "limit":
556
+ idx += 2
557
+ # toks holds strings, so this type check is always true and the limit value is reported as the placeholder 1
558
+ if type(toks[idx - 1]) != int:
559
+ return idx, 1
560
+
561
+ return idx, int(toks[idx - 1])
562
+
563
+ return idx, None
564
+
565
+
566
+ def parse_sql(toks, start_idx, tables_with_alias, schema):
567
+ isBlock = False # indicate whether this is a block of sql/sub-sql
568
+ len_ = len(toks)
569
+ idx = start_idx
570
+
571
+ sql = {}
572
+ if toks[idx] == "(":
573
+ isBlock = True
574
+ idx += 1
575
+
576
+ # parse from clause in order to get default tables
577
+ from_end_idx, table_units, conds, default_tables = parse_from(
578
+ toks, start_idx, tables_with_alias, schema
579
+ )
580
+ sql["from"] = {"table_units": table_units, "conds": conds}
581
+ # select clause
582
+ _, select_col_units = parse_select(
583
+ toks, idx, tables_with_alias, schema, default_tables
584
+ )
585
+ idx = from_end_idx
586
+ sql["select"] = select_col_units
587
+ # where clause
588
+ idx, where_conds = parse_where(toks, idx, tables_with_alias, schema, default_tables)
589
+ sql["where"] = where_conds
590
+ # group by clause
591
+ idx, group_col_units = parse_group_by(
592
+ toks, idx, tables_with_alias, schema, default_tables
593
+ )
594
+ sql["groupBy"] = group_col_units
595
+ # having clause
596
+ idx, having_conds = parse_having(
597
+ toks, idx, tables_with_alias, schema, default_tables
598
+ )
599
+ sql["having"] = having_conds
600
+ # order by clause
601
+ idx, order_col_units = parse_order_by(
602
+ toks, idx, tables_with_alias, schema, default_tables
603
+ )
604
+ sql["orderBy"] = order_col_units
605
+ # limit clause
606
+ idx, limit_val = parse_limit(toks, idx)
607
+ sql["limit"] = limit_val
608
+
609
+ idx = skip_semicolon(toks, idx)
610
+ if isBlock:
611
+ assert toks[idx] == ")"
612
+ idx += 1 # skip ')'
613
+ idx = skip_semicolon(toks, idx)
614
+
615
+ # intersect/union/except clause
616
+ for op in SQL_OPS: # initialize IUE
617
+ sql[op] = None
618
+ if idx < len_ and toks[idx] in SQL_OPS:
619
+ sql_op = toks[idx]
620
+ idx += 1
621
+ idx, IUE_sql = parse_sql(toks, idx, tables_with_alias, schema)
622
+ sql[sql_op] = IUE_sql
623
+ return idx, sql
624
+
625
+
626
+ def load_data(fpath):
627
+ with open(fpath) as f:
628
+ data = json.load(f)
629
+ return data
630
+
631
+
632
+ def get_sql(schema, query):
633
+ toks = tokenize(query)
634
+ tables_with_alias = get_tables_with_alias(schema.schema, toks)
635
+ _, sql = parse_sql(toks, 0, tables_with_alias, schema)
636
+
637
+ return sql
638
+
639
+
640
+ def skip_semicolon(toks, start_idx):
641
+ idx = start_idx
642
+ while idx < len(toks) and toks[idx] == ";":
643
+ idx += 1
644
+ return idx
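Review note: two small steps of the parser above, merging split comparison operators (the tail of `tokenize`) and building the alias map (`scan_alias`), can be tried without nltk or a database. The sketch below is an illustration under assumptions: it starts from an already-split token list instead of calling `word_tokenize`, `merge_comparison_ops` is a hypothetical helper name, and it adds an `eq_idx > 0` guard that the code in this commit does not have.

```python
from typing import Dict, List


def merge_comparison_ops(toks: List[str]) -> List[str]:
    """Join '!', '>' or '<' with a following '=' into a single operator token."""
    merged = list(toks)
    eq_idxs = [i for i, tok in enumerate(merged) if tok == "="]
    # Walk right to left so that earlier indices stay valid after each merge.
    for eq_idx in reversed(eq_idxs):
        if eq_idx > 0 and merged[eq_idx - 1] in ("!", ">", "<"):
            merged[eq_idx - 1 : eq_idx + 1] = [merged[eq_idx - 1] + "="]
    return merged


def scan_alias(toks: List[str]) -> Dict[str, str]:
    """Map each alias (token after 'as') to its table (token before 'as')."""
    return {toks[i + 1]: toks[i - 1] for i, tok in enumerate(toks) if tok == "as"}


toks = "select t1.name from singer as t1 where t1.age > = 30".split()
toks = merge_comparison_ops(toks)
aliases = scan_alias(toks)
```

Here `toks` ends with the single token `>=` followed by `30`, and `aliases` maps `t1` to `singer`, which is the form `get_tables_with_alias` extends with the schema's own table names.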
duckdb-nsql/eval/metrics/test_suite_sql_eval/tables.json ADDED
The diff for this file is too large to render. See raw diff
 
duckdb-nsql/eval/metrics/test_suite_sql_eval/tmp/readme.txt ADDED
@@ -0,0 +1 @@
1
+ This folder contains tmp files that are used in executing SQLs on the database.