pavankumarbalijepalli committed
Commit 369dc3b
• 1 Parent(s): 9153c89

Update README.md

Files changed (1)
  1. README.md +39 -13
README.md CHANGED
@@ -15,9 +15,11 @@ tags:
 widget:
 - text: "### Task\nGenerate a SQL query to answer the following question:\n`How many heads of the departments are older than 56?`\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE head (age INTEGER)\n\n### Answer\nGiven the database schema, here is the SQL query that answers `How many heads of the departments are older than 56?`:\n```sql"
   example_title: "One Table"
+- text: "### Task\nGenerate a SQL query to answer the following question:\n`Show the name and number of employees for the departments managed by heads whose temporary acting value is 'Yes'?`\n\n### Database Schema\nThe query will run on a database with the following schema:\nCREATE TABLE management (department_id VARCHAR, temporary_acting VARCHAR); CREATE TABLE department (name VARCHAR, num_employees VARCHAR, department_id VARCHAR)\n\n### Answer\nGiven the database schema, here is the SQL query that answers `Show the name and number of employees for the departments managed by heads whose temporary acting value is 'Yes'?`:\n```sql"
+  example_title: "Two Tables"
 ---
 
-# Update: 14-03-2024 - The model card is still updating. Thanks for being patient! 💜💜
+# Thanks for being patient! 💜💜
 
 # Model Card for Model ID
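The widget entries added above encode the model's expected NL2SQL prompt format. Below is a minimal sketch (not the card's own code) of driving the model with that template; `MODEL_ID` is a placeholder for this repository's checkpoint id, and the generation settings are illustrative.

```python
# Minimal sketch: query the model with the widget's prompt format.
# MODEL_ID is a hypothetical placeholder; replace it with this repo's id.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "<this-repo-id>"  # placeholder, not a real checkpoint name

question = "How many heads of the departments are older than 56?"
schema = "CREATE TABLE head (age INTEGER)"

prompt = (
    "### Task\n"
    f"Generate a SQL query to answer the following question:\n`{question}`\n\n"
    "### Database Schema\n"
    f"The query will run on a database with the following schema:\n{schema}\n\n"
    "### Answer\n"
    f"Given the database schema, here is the SQL query that answers `{question}`:\n"
    "```sql"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
# Keep only the newly generated tokens, i.e. the SQL completion.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```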
 
@@ -175,26 +177,50 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 <!-- This should link to a Dataset Card if possible. -->
 
-[More Information Needed]
+Used b-mc2/sql-create-context, splitting the data into training and holdout sets; the holdout set is used for testing the model.
 
 #### Factors
 
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
-[More Information Needed]
+The complexity of each question is computed from the number of tables it references and the number of joins, GROUP BY clauses, and subqueries in its answer. The test set is built by stratifying the split on this complexity.
 
 #### Metrics
 
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
-[More Information Needed]
-
+* __Execution Success:__ Checks whether a generated query runs without raising errors. A sqlite3 connection is opened in memory, dummy tables are created from the schema context, and the predicted SQL is executed against them. This verifies that the query is syntactically valid and that the model is not hallucinating columns.
+* __Inference Time:__ Measures how quickly each model returns results. Combined with execution success, this gives the overall efficiency of the model.
 
 ### Results
 
-[More Information Needed]
+* __Execution Success:__ Fine-tuned Phi-2 has a 29% higher success rate than SQLCoder-7b-2.
+* __Inference Time:__ Fine-tuned Phi-2 has 41% faster inference than SQLCoder-7b-2.
 
 #### Summary
-
+* __Reduced Inference Time and Memory Footprint:__ The fine-tuned Phi-2 model demonstrated a reduction in inference time and memory usage compared to the DeFog SQLCoder, attributable to Phi-2's smaller size and the quantization techniques employed during fine-tuning. This implies that NL2SQL models can be deployed on lower-powered devices like laptops or even mobile phones, potentially democratizing access to this technology for a wider range of users.
+
+* __Competitive Performance on Easy and Medium Queries:__ The fine-tuned Phi-2 achieved accuracy comparable to the DeFog SQLCoder on easy and medium difficulty queries. This indicates that Phi-2, despite its smaller size, can effectively handle a significant portion of real-world NL2SQL tasks, especially simpler queries.
+
+* __Challenges with Complex Queries:__ While Phi-2 performed well on easier queries, it encountered challenges with complex ones, exhibiting a drop in execution success compared to the DeFog SQLCoder. This highlights the trade-off between model size and capability, suggesting that larger models may still be necessary for highly intricate tasks.
+
+* __Potential for Further Improvement:__ The fine-tuning process can be further optimized by exploring different hyperparameter configurations and alternative fine-tuning techniques such as adapter-based methods, which could improve performance on complex queries while maintaining efficiency.
 
 ## Environmental Impact
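To make the Factors and Metrics added in this hunk concrete, here is an illustrative sketch of the complexity score used for stratification and the sqlite3 in-memory Execution Success check. The function names and the counting heuristic are assumptions based on the card's description, not the exact evaluation code; Inference Time is simply wall-clock time measured around generation.

```python
# Illustrative sketch of the evaluation described above (assumed helper
# names and a simple counting heuristic, not the card's exact code).
import re
import sqlite3

def complexity(context: str, answer: str) -> int:
    """Score a sample by tables in the schema plus joins, GROUP BYs,
    and subqueries in the gold answer; used to stratify the test split."""
    sql = answer.upper()
    return (
        context.upper().count("CREATE TABLE")
        + sql.count(" JOIN ")
        + sql.count("GROUP BY")
        + len(re.findall(r"\(\s*SELECT", sql))
    )

def executes(context: str, predicted_sql: str) -> bool:
    """Execution Success: create dummy tables from the schema context in an
    in-memory database, then run the predicted query. Any error (bad syntax,
    hallucinated column or table) counts as a failure."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(context)   # the CREATE TABLE statements
        conn.execute(predicted_sql)   # an empty result set is fine
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

With scikit-learn, the stratified holdout would then be something like `train_test_split(data, stratify=[complexity(c, a) for c, a in pairs])`.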
 
@@ -202,11 +228,11 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
+- **Hardware Type:** 1x NVIDIA A100 PCIe 40GB
+- **Hours used:** 18 hours
+- **Cloud Provider:** Google Cloud
+- **Compute Region:** asia-east1
+- **Carbon Emitted:** 2.52 kg CO2 eq.
 
 
 ## Citation [optional]
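As a sanity check on the emissions figure added above, the ML Impact calculator's estimate is roughly hours × power draw × grid carbon intensity. The 250 W board power and 0.56 kg CO2 eq./kWh intensity below are assumed round numbers for an A100 PCIe in asia-east1, chosen to illustrate the arithmetic; they are not values taken from the card.

```python
# Back-of-envelope reconstruction of the reported estimate (assumed inputs):
# energy = hours x power draw; emissions = energy x grid carbon intensity.
hours = 18
power_kw = 0.250        # NVIDIA A100 PCIe 40GB board power, 250 W
grid_kg_per_kwh = 0.56  # assumed carbon intensity for asia-east1
energy_kwh = hours * power_kw                  # 4.5 kWh
print(round(energy_kwh * grid_kg_per_kwh, 2))  # ~2.52 kg CO2 eq.
```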
 