SchemaPile Foreign Key Detection Model (Starcoder)

Model Description

This repository introduces starcoder-schemapile-fk, a language model based on BigCode/starcoder and fine-tuned to predict foreign key relationships in relational database schemas.

Training Data

Foreign key pairs extracted from SchemaPile-Perm, a large collection of relational database schemas.

Evaluation Data

We evaluate the foreign key detection accuracy of starcoder-schemapile-fk and t5-schemapile-fk on schemas from Spider, BIRD-SQL, and CTU PRLR.

[Figure: evaluation results]

Training Procedure

The model was trained on 4x A100 40GB GPUs using DeepSpeed ZeRO-3 offloading, with the following hyperparameters:

  • learning_rate: 2.0e-05
  • num_train_epochs: 3
  • gradient_accumulation_steps: 8
  • per_device_train_batch_size: 4
  • bf16: true
  • warmup_ratio: 0.03
  • weight_decay: 0.0
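As a rough sanity check on the list above, one optimizer step covers 4 GPUs × 4 sequences per device × 8 accumulation steps = 128 sequences, assuming plain data parallelism across all four GPUs. The sketch below computes this figure and shows approximately how warmup_ratio becomes a number of warmup steps for a given dataset size. The SchemaPile-Perm training set size is not stated here, so any example size is hypothetical. The helpers `effective_batch_size` and `warmup_steps` are illustrative. The `ds_config` dictionary is a hypothetical ZeRO-3 CPU-offload configuration built from standard DeepSpeed config keys. It is an assumption and not the configuration used in the Training Code.

```python
import json
import math

# Hyperparameters copied from the list above. The key names match
# arguments of Hugging Face transformers' TrainingArguments.
hyperparams = {
    "learning_rate": 2.0e-05,
    "num_train_epochs": 3,
    "gradient_accumulation_steps": 8,
    "per_device_train_batch_size": 4,
    "bf16": True,
    "warmup_ratio": 0.03,
    "weight_decay": 0.0,
}

NUM_GPUS = 4  # 4x A100 40GB, as stated in this card

# HYPOTHETICAL DeepSpeed ZeRO-3 config with CPU offloading of optimizer state
# and parameters. "auto" values are filled in by the HF Trainer integration.
# This is an assumption, not the authors' actual config file.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}


def effective_batch_size(hp: dict, num_gpus: int) -> int:
    """Sequences per optimizer step across all data-parallel GPUs."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_gpus


def warmup_steps(hp: dict, num_examples: int, num_gpus: int) -> int:
    """Approximate warmup steps implied by warmup_ratio (illustrative only)."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(hp, num_gpus))
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    return math.ceil(total_steps * hp["warmup_ratio"])


print(effective_batch_size(hyperparams, NUM_GPUS))  # 128
print(json.dumps(ds_config, indent=2))  # e.g. to write a ds_config.json file
```

For a hypothetical dataset of 10,000 examples, this gives 79 steps per epoch and 237 steps in total. With a warmup_ratio of 0.03, that works out to 8 warmup steps. The HF Trainer's own step counting may differ slightly.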

See Training Code.

How to Use

We recommend using the prompt template below together with constrained output generation via jsonformer:

Example Prompt:

You are given the following SQL database tables: 
staff(staff_id, staff_address_id, nickname, first_name, middle_name, last_name, date_of_birth, date_joined_staff, date_left_staff)
addresses(address_id, line_1_number_building, city, zip_postcode, state_province_county, country)
Output a json string with the following schema {table, column, referencedTable, referencedColumn} that contains the foreign key relationship between the two tables.
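The template above can be filled in programmatically. The helpers `build_fk_prompt` and `format_table` below are a hypothetical sketch and are not part of any released code. The sketch reproduces the example prompt's layout, with one `table(col1, col2, ...)` line per table. It also keeps the trailing space after "tables:" as it appears in the example. Whether the model is sensitive to such whitespace is an open assumption. Because the template asks about "the two tables", the sketch also assumes exactly two tables per prompt.

```python
# Hypothetical helper (not from the authors' code) that fills the prompt
# template shown above for a pair of tables.


def format_table(name: str, columns: list[str]) -> str:
    """Render one table as 'name(col1, col2, ...)'."""
    return f"{name}({', '.join(columns)})"


def build_fk_prompt(tables: dict[str, list[str]]) -> str:
    """Build the foreign key prompt; assumes exactly two tables, as in the template."""
    if len(tables) != 2:
        raise ValueError("the template asks about the relationship between two tables")
    lines = ["You are given the following SQL database tables: "]  # trailing space as in the example
    lines += [format_table(name, cols) for name, cols in tables.items()]
    lines.append(
        "Output a json string with the following schema "
        "{table, column, referencedTable, referencedColumn} "
        "that contains the foreign key relationship between the two tables."
    )
    return "\n".join(lines)


# Reproduce the example prompt from this card.
tables = {
    "staff": ["staff_id", "staff_address_id", "nickname", "first_name", "middle_name",
              "last_name", "date_of_birth", "date_joined_staff", "date_left_staff"],
    "addresses": ["address_id", "line_1_number_building", "city", "zip_postcode",
                  "state_province_county", "country"],
}
prompt = build_fk_prompt(tables)
print(prompt)
```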

Example Output:

{'table': 'staff',
 'column': 'staff_address_id',
 'referencedTable': 'addresses',
 'referencedColumn': 'address_id'}
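jsonformer takes a JSON Schema and lets the model generate only the field values, so the keys of the output object are fixed in advance. A schema for the four fields in the example output might look like `FK_SCHEMA` below. This is an assumption, because the card does not give the exact schema the authors pass to jsonformer. The `is_valid_fk` function is a hypothetical extra check and is not part of jsonformer or the authors' code. It rejects predictions that name a table or column absent from the prompt.

```python
# ASSUMED JSON Schema for the four output fields; jsonformer fills in only the
# values, so the keys of the generated object come from this schema.
FK_SCHEMA = {
    "type": "object",
    "properties": {
        "table": {"type": "string"},
        "column": {"type": "string"},
        "referencedTable": {"type": "string"},
        "referencedColumn": {"type": "string"},
    },
}


def is_valid_fk(pred: dict, tables: dict[str, list[str]]) -> bool:
    """Hypothetical post-check: keys match the schema and both columns exist in their tables."""
    if set(pred) != set(FK_SCHEMA["properties"]):
        return False
    return (
        pred["column"] in tables.get(pred["table"], [])
        and pred["referencedColumn"] in tables.get(pred["referencedTable"], [])
    )


# Tables and output taken from the example in this card.
tables = {
    "staff": ["staff_id", "staff_address_id", "nickname", "first_name", "middle_name",
              "last_name", "date_of_birth", "date_joined_staff", "date_left_staff"],
    "addresses": ["address_id", "line_1_number_building", "city", "zip_postcode",
                  "state_province_county", "country"],
}
example_output = {
    "table": "staff",
    "column": "staff_address_id",
    "referencedTable": "addresses",
    "referencedColumn": "address_id",
}
print(is_valid_fk(example_output, tables))  # True
```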

To run the model locally, we recommend our end-to-end Example Notebook (requires a single A100 40GB GPU).
