|
--- |
|
library_name: peft |
|
base_model: mistralai/Mixtral-8x7B-v0.1 |
|
license: apache-2.0 |
|
datasets: |
|
- knowrohit07/know_sql |
|
language: |
|
- en |
|
--- |
|
|
|
<!--![](project-9.png)--> |
|
<img src="project-9.png" width="50%" height="50%" > |
|
|
|
|
|
## SQL-Converter Mixtral 8x7B v0.1 |
|
|
|
**Convert Natural Language to SQL** |
|
|
|
### Overview |
|
Mixtral-8x7B-sql-ft-v1 is fine-tuned from Mixtral 8x7B to convert natural language to SQL queries. |
|
|
|
### Base Model |
|
mistralai/Mixtral-8x7B-v0.1 |
|
|
|
### Fine-Tuning |
|
- **Dataset**: 5,000 natural language-SQL pairs. |
|
|
|
### Usage |
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig |
|
from peft import PeftModel |
|
import torch |
|
|
|
base_model_id = 'mistralai/Mixtral-8x7B-v0.1' |
|
adapter_id = 'sharadsin/Mixtral-8x7B-sql-ft-v1' |
|
|
|
bnb_config = BitsAndBytesConfig( |
|
load_in_4bit = True, |
|
bnb_4bit_use_double_quant = True, |
|
bnb_4bit_compute_dtype = torch.bfloat16, |
|
bnb_4bit_quant_type = "nf4", |
|
) |
|
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
base_model_id, |
|
quantization_config = bnb_config, |
|
device_map = "auto", |
|
trust_remote_code = True, |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token = True, trust_remote_code = True) |
|
|
|
ft_model = PeftModel.from_pretrained(base_model, adapter_id) |
|
|
|
eval_prompt= """SYSTEM: Use the following contextual information to concisely answer the question. |
|
|
|
USER: CREATE TABLE EmployeeInfo (EmpID INTEGER, EmpFname VARCHAR, EmpLname VARCHAR, Department VARCHAR, Project VARCHAR,Address VARCHAR, DOB DATE, Gender CHAR) |
|
=== |
|
Write a query to fetch details of employees whose EmpLname ends with an alphabet 'A' and contains five alphabets? |
|
ASSISTANT:""" |
|
|
|
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda") |
|
|
|
ft_model.eval() |
|
with torch.inference_mode(): |
|
print(tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=70,top_k=4, penalty_alpha = 0.6, repetition_penalty=1.15)[0], skip_special_tokens= False)) |
|
|
|
``` |
|
|
|
### Limitations |
|
- Less accurate with very complex queries. |
|
- Generates extra gibberish content after providing the answers. |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.7.1 |