File size: 2,646 Bytes
283a3d5
 
 
64aa63c
 
 
 
 
 
 
 
 
 
283a3d5
 
5b0ffc8
283a3d5
5b0ffc8
283a3d5
64aa63c
283a3d5
64aa63c
 
 
5b0ffc8
64aa63c
0364e70
5b0ffc8
0364e70
64aa63c
 
 
283a3d5
64aa63c
 
 
 
 
283a3d5
64aa63c
 
 
 
 
 
 
 
283a3d5
64aa63c
 
283a3d5
64aa63c
 
 
 
283a3d5
64aa63c
 
 
 
283a3d5
64aa63c
 
283a3d5
64aa63c
283a3d5
64aa63c
 
 
283a3d5
64aa63c
283a3d5
64aa63c
283a3d5
64aa63c
3454c35
283a3d5
3454c35
283a3d5
64aa63c
283a3d5
64aa63c
283a3d5
64aa63c
ebb9f75
283a3d5
 
 
 
64aa63c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
---
library_name: peft
base_model: declare-lab/flan-alpaca-base
datasets:
- knowrohit07/know_sql
license: mit
language:
- en
pipeline_tag: text2text-generation
tags:
- sql
- query
- database
---

## Model Details

### Model Description

This model is based on the declare-lab/flan-alpaca-base model finetuned with knowrohit07/know_sql dataset.

- **Developed by:** Jonathan Jordan
- **Model type:** FLAN Alpaca
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** declare-lab/flan-alpaca-base

## Uses

The model generates a string of SQL query based on a question and MySQL table schema. 
You can modify the table schema to match MySQL table schema if you are using different type of SQL database (e.g. PostgreSQL, Oracle, etc).
The generated SQL query can be run perfectly on the python SQL connection (e.g. psycopg2, mysql_connector, etc).

#### Limitations
1. The question MUST be in english
2. Keep in mind about the difference in data type naming between MySQL and the other SQL databases
3. The output always starts with SELECT *, you can't choose which columns to retrieve.
4. Aggregation function is not supported

### Input Example
```python
"""Question: what is What was the result of the election in the Florida 18 district?\nTable: table_1341598_10 (result VARCHAR, district VARCHAR)\nSQL: """
```
### Output Example
```python
"""SELECT * FROM table_1341598_10 WHERE district = "Florida 18""""
```

### How to use
Load model

```python
from peft import get_peft_config, get_peft_model, TaskType
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "jonathanjordan21/flan-alpaca-base-finetuned-lora-knowSQL"
config = PeftConfig.from_pretrained(model_id)
model_ = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, return_dict=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

model = PeftModel.from_pretrained(model_, model_id)
```

Model inference

```python
question = "server of user id 11 with status active and server id 10"
table = "table_name_77 ( user id INTEGER, status VARCHAR, server id INTEGER )"

test = f"""Question: {question}\nTable: {table}\nSQL: """

p = tokenizer(test, return_tensors='pt')

device = "cuda" if torch.cuda.is_available() else "cpu"
out = model.to(device).generate(**p.to(device),max_new_tokens=50)

print("SQL Query :", tokenizer.batch_decode(out,skip_special_tokens=True)[0])

```

## Performance

### Speed Performance
The model inference takes about 2-3 seconds to run in Google Colab Free Tier CPU

### Framework versions


- PEFT 0.6.2