Meta-Llama 3.1 8B Text-to-SQL GPTQ Model

This repository provides an 8-billion-parameter Meta-Llama 3.1 model fine-tuned for text-to-SQL tasks and quantized with GPTQ for efficient inference. Below you'll find instructions to install the dependencies, load the model, and generate SQL queries, along with an illustrative fine-tuning sketch.

Model Details

  • Model Size: 8B
  • Quantization: GPTQ (4-bit)
  • Languages Supported: English, Italian
  • Task: Text-to-SQL generation
  • License: Apache 2.0

Installation Requirements

Before using the model, ensure that you have the following dependencies installed. We recommend using the same versions to avoid any compatibility issues.

# Install the required PyTorch version with CUDA support (ensure CUDA 12.1 is installed)
!pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

# Install AutoGPTQ for quantized model handling
!pip install auto-gptq --no-build-isolation

# Install Optimum for model optimization
!pip install optimum

After installing the dependencies, restart your runtime (for example, the Colab or Jupyter kernel) so that the newly installed versions are picked up.
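After the restart, a quick sanity check like the one below (illustrative, not required) confirms that the expected PyTorch build is active and CUDA is visible:

# Verify the PyTorch version and CUDA availability after restarting
import torch

print(torch.__version__)          # expected: 2.2.1+cu121
print(torch.cuda.is_available())  # True if the CUDA 12.1 runtime is usable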

Loading the Model

To load the quantized Meta-Llama 3.1 model and generate SQL queries from text, use the following Python code:

from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM
import torch

# Define the Alpaca-style prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Model directory and tokenizer
quantized_model_dir = "meta-llama-8b-quantized-4bit"  # Path where quantized model is saved
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

# Load the quantized model
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device_map="auto",  # Automatically place the model on the available device(s) (GPU or CPU)
    torch_dtype=torch.float16,  # FP16 activations for efficiency
    use_safetensors=True  # Load weights from the safetensors format
)

# Set up the text generation pipeline; the device is inherited from the model,
# so none is specified here. Renamed to avoid shadowing the pipeline factory function.
sql_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)

# Function to generate SQL query from input text using the Alpaca prompt
def generate_sql(input_text):
    # Format the prompt
    prompt = alpaca_prompt.format(
        "Provide the SQL query",
        input_text
    )

    # Generate the response using the pipeline
    generated_text = sql_pipeline(
        prompt,
        max_new_tokens=200,  # Cap only the generated tokens, not the prompt length
        eos_token_id=tokenizer.eos_token_id
    )[0]["generated_text"]

    # Strip the prompt prefix and surrounding whitespace, leaving only the SQL
    cleaned_output = generated_text[len(prompt):].strip()

    return cleaned_output


Example Usage

The generate_sql function defined above converts a natural-language request into a SQL query. Provide a request in Italian or English, and the model generates the corresponding SQL.

Example input:

# Italian: "Select all columns from table1 where the column anni equals 2020"
italian_input = "Seleziona tutte le colonne della tabella table1 dove la colonna anni è uguale a 2020"
sql_query = generate_sql(italian_input)
print(sql_query)

Example output:

SELECT * FROM table1 WHERE anni = 2020;
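
Since the model also supports English (see Model Details), the same helper accepts English requests. The call below is illustrative, assuming the generate_sql function defined above; the exact generated text may vary between runs:

english_input = "Select all columns from table1 where the column anni equals 2020"
sql_query = generate_sql(english_input)
print(sql_query)  # expect a query equivalent to the Italian example above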

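Fine-Tuning

The trl and sft tags below indicate the model was trained with supervised fine-tuning (SFT) via the TRL library. The snippet that follows is a minimal sketch of such a setup, not the exact recipe used for this model: the base checkpoint name, toy dataset, and hyperparameters are assumptions, it requires the trl and datasets packages (not listed in the installation section), and the SFTTrainer keyword arguments follow the trl 0.7-era API, which has shifted in later releases.

# Illustrative SFT sketch; base checkpoint, data, and hyperparameters are assumed
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "meta-llama/Meta-Llama-3.1-8B"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Reuse the Alpaca-style template from the loading section
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
"""

# Toy training set: full prompt plus target SQL in a single "text" column
train_data = Dataset.from_list([{
    "text": alpaca_prompt.format(
        "Provide the SQL query",
        "Select all columns from table1 where the column anni equals 2020",
    ) + "SELECT * FROM table1 WHERE anni = 2020;"
}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    dataset_text_field="text",  # column containing prompt + completion
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="sft-llama-text2sql",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
)
trainer.train()
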
Model Tags

  • text-generation-inference
  • transformers
  • llama
  • trl
  • sft

License

This model is released under the Apache License 2.0.
