munish0838 committed on
Commit
3595aa8
1 Parent(s): d4d01e1

Upload README.md with huggingface_hub

  1. README.md +205 -0
README.md ADDED
@@ -0,0 +1,205 @@
---
license: apache-2.0
datasets:
- PipableAI/pip-txt-to-sql-spider-bird-dataset
language:
- en
metrics:
- accuracy
tags:
- sql
- code
- text2sql
- instruction_tuned
- basemodel
- jax
- pytorch
- text-generation-inference
library_name: transformers
pipeline_tag: text-generation
widget:
- text: >-
    <schema>CREATE TABLE system(JobID: String,GID: String, UID: String,
    Start:Time(yyyy/mm/dd), End: Time,ElapsedRaw: Time, CPUTimeRAW: Time,NCPUS:
    Number,NNodes: Number, NodeList: List, State:String, Timelimit:
    Time);</schema><question>Get UID and job id for Jobs that started on Jan 20
    , 2023 ended on feb 14 2023 and has job id 20</question><sql>
  example_title: example

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


# QuantFactory/pip-sql-1.3b-GGUF
This is a quantized version of [PipableAI/pip-sql-1.3b](https://huggingface.co/PipableAI/pip-sql-1.3b), created using llama.cpp.

# Original Model Card

# pipSQL-1.3b

[pipableAi](https://www.linkedin.com/company/pipable.ai/about/)

[colab_notebook](https://colab.research.google.com/drive/1insSxvc3jjAXe0zmdIjmbG3ttb5mpRgQ?usp=sharing)

## What have we built?
A 1.3B-parameter SQL model that outperforms most SQL expert models and ChatGPT on popular benchmarks.
This is a distilled model built on the DeepSeek base model.
Please refer to https://huggingface.co/PipableAI/pip-library-etl-1.3b for our state-of-the-art model.

## How did we build it?

We used softmax cross entropy and a modified form of policy gradient along with Q loss, optimized in an EM setup.
Loss behaviour in this setup:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/658d8095a2a6a6e0da8bb8a6/I80Ru1r4thoYrLagIWALa.png)

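Of the loss terms named above, only softmax cross entropy has a standard definition; the card does not specify the modified policy gradient or the Q loss, so they are not sketched here. The following is a minimal pure-Python illustration of token-level cross entropy, not the training code. The function names and the toy logits are assumptions made for this example.

```python
import math


def softmax_cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def sequence_loss(step_logits: list[list[float]], targets: list[int]) -> float:
    """Mean token-level cross entropy over one target sequence."""
    losses = [softmax_cross_entropy(l, t) for l, t in zip(step_logits, targets)]
    return sum(losses) / len(losses)


# Toy vocabulary of 3 tokens and 2 decoding steps (made-up numbers)
print(sequence_loss([[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]], [0, 1]))
```

For uniform logits over a vocabulary of size V the per-token loss is log(V), which is a quick sanity check for any implementation.
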
## Benchmarking
For benchmarking we use Semantic Evaluation for Text-to-SQL with Distilled Test Suites, an officially accepted evaluation framework for Spider, SParC, and CoSQL, proposed by a research team from Yale and Berkeley.
The benchmark contains 2200 test data points.
Here is the link to run the evaluation:

[Test Suite SQL Eval](https://github.com/taoyds/test-suite-sql-eval)

|model|easy|medium|hard|extra|
|-----|----|------|----|-----|
|sqlcoder-7b-2|72.0|58.0|40.6|37.3|
|pipSQL-1.3b|78.5|57.5|42.1|28.3|
|pipSQL-7b|63.0|40.0|30.2|25.0|
|sqlcoder-7b|60.6|48.2|28.3|20.4|
|gpt-3.5|58.8|44.7|31.0|28.4|

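The test-suite evaluation linked above scores a prediction by what it returns when executed rather than by string match. The sketch below shows that idea in a heavily simplified form and is not the evaluation script: it uses one in-memory SQLite database and ignores row order, whereas the real framework runs each query against many database instances. The helper `same_result`, the table `t`, and the query pairs are hypothetical.

```python
import sqlite3
from collections import Counter


def same_result(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if both queries return the same multiset of rows."""
    try:
        pred_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # an invalid prediction counts as a miss
    gold_rows = conn.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# (predicted, gold) pairs, invented for illustration
pairs = [
    ("SELECT name FROM t WHERE id > 1", "SELECT name FROM t WHERE id >= 2"),
    ("SELECT name FROM t", "SELECT name FROM t WHERE id = 1"),
    ("SELEC name FROM t", "SELECT name FROM t"),  # invalid SQL
]
accuracy = sum(same_result(conn, p, g) for p, g in pairs) / len(pairs)
print(f"execution accuracy: {accuracy:.2f}")
```

Comparing result sets lets two differently written but equivalent queries both count as correct, which exact string matching would miss.
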
We have also benchmarked it on the Defog eval.
It contains 200 test data points handpicked by the Defog team.
Here is the link to it:

[Defog SQL-Eval](https://github.com/defog-ai/sql-eval)

These are the results:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64d32c6b921678fdc9de3302/fFeLSEYBNpQk_JWjFsF5M.png)

## License
The model is open source under the Apache 2.0 License.

## Usage

### Installation

```bash
pip install transformers
```

### Prompt
```python
# `schema` holds the CREATE TABLE statements, `question` the natural-language request
prompt = f"""<schema>{schema}</schema>
<question>{question}</question>
<sql>"""
```

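The template above leaves `schema` and `question` undefined. Here is a minimal sketch of filling it, assuming the schema is passed as raw `CREATE TABLE` text. The helper name `build_prompt` and the sample inputs are hypothetical and not part of the model's tooling.

```python
def build_prompt(schema: str, question: str) -> str:
    """Fill the <schema>/<question>/<sql> template from the Prompt section."""
    # strip() only trims surrounding whitespace; the tag layout follows the card.
    return (
        f"<schema>{schema.strip()}</schema>\n"
        f"<question>{question.strip()}</question>\n"
        "<sql>"
    )


# Illustrative inputs (not taken from the benchmark data)
schema = "CREATE TABLE Orders (order_id number, customer_id number);"
question = "How many orders are there?"
prompt = build_prompt(schema, question)
print(prompt)
```

The prompt ends with an open `<sql>` tag so that the model continues with the query itself.
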
### PyTorch
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # use "cpu" if no GPU is available
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")

# `prompt` is the string built in the Prompt section
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
# Keep only the text between <sql> and </sql>
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```

### Flax
```python
from transformers import FlaxAutoModelForCausalLM, AutoTokenizer

# from_pt=True converts the PyTorch checkpoint to Flax on load
model = FlaxAutoModelForCausalLM.from_pretrained("PipableAI/pip-sql-1.3b", from_pt=True)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-sql-1.3b")

# `prompt` is the string built in the Prompt section
inputs = tokenizer(prompt, return_tensors="jax")
outputs = model.generate(**inputs, max_new_tokens=200)
# Flax generate() returns an output object; the token ids are in `.sequences`
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True).split('<sql>')[1].split('</sql>')[0])
```

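Both snippets above pull the query out of the decoded text with chained `split` calls, which raise an `IndexError` when the `<sql>` tag is missing. A slightly more defensive variant could look like the sketch below; the helper name `extract_sql` is an assumption made for this example and is not part of `transformers`.

```python
def extract_sql(decoded: str) -> str:
    """Return the text after the first <sql> tag, up to </sql> if present."""
    _, found, tail = decoded.partition("<sql>")
    if not found:
        raise ValueError("no <sql> tag in decoded text")
    # If generation stopped before </sql>, keep everything after <sql>.
    return tail.split("</sql>")[0].strip()


decoded = "<schema>...</schema>\n<question>...</question>\n<sql>SELECT 1</sql>"
print(extract_sql(decoded))
```

Raising a `ValueError` with a clear message makes a malformed generation easier to diagnose than a bare index error.
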
## Examples

### Schema
```sql
CREATE TABLE Products (
product_id number,
parent_product_id number,
product_name text,
product_price number,
product_color text,
product_size text,
product_description text);

CREATE TABLE Customers (
customer_id number,
gender_code text,
customer_first_name text,
customer_middle_initial text,
customer_last_name text,
email_address text,
login_name text,
login_password text,
phone_number text,
address_line_1 text,
town_city text,
county text,
country text);

CREATE TABLE Customer_Payment_Methods (
customer_id number,
payment_method_code text);

CREATE TABLE Invoices (
invoice_number number,
invoice_status_code text,
invoice_date time);

CREATE TABLE Orders (
order_id number,
customer_id number,
order_status_code text,
date_order_placed time);

CREATE TABLE Order_Items (
order_item_id number,
product_id number,
order_id number,
order_item_status_code text);

CREATE TABLE Shipments (
shipment_id number,
order_id number,
invoice_number number,
shipment_tracking_number text,
shipment_date time);

CREATE TABLE Shipment_Items (
shipment_id number,
order_item_id number);
```

### Questions
What are the email address, town and county of the customers who are of the least common gender?
```sql
SELECT email_address , town_city , county FROM customers GROUP BY gender_code ORDER BY count(*) ASC LIMIT 1
```

What are the product price and the product size of the products whose price is above average?
```sql
SELECT product_price , product_size FROM products WHERE product_price > (SELECT avg(product_price) FROM products)
```

Which customers did not make any orders? List the first name, middle initial and last name.
```sql
SELECT T1.customer_first_name , T1.customer_middle_initial , T1.customer_last_name FROM Customers AS T1 WHERE T1.customer_id NOT IN (SELECT T2.customer_id FROM Orders AS T2)
```

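To check a generated query like the ones above, it can be run against a small database. The sketch below assumes SQLite as the engine and uses only a subset of the `Products` columns; the three rows are invented for illustration and do not come from any benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the Products table from the Schema section
conn.execute(
    "CREATE TABLE Products (product_id number, product_price number, product_size text)"
)
# Hypothetical rows, only for illustration
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, 10, "S"), (2, 20, "M"), (3, 60, "L")],
)

# The above-average-price query from the Questions section
query = (
    "SELECT product_price , product_size FROM products "
    "WHERE product_price > (SELECT avg(product_price) FROM products)"
)
rows = conn.execute(query).fetchall()
print(rows)
```

With these rows the average price is 30, so only the product priced at 60 is returned.
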
### Team
Avi Kothari, Pratham Gupta, Ritvik Aryan Kalra, Rohan Bhatial, Soham Acharya