kyleluoma/SNAILS-word-naturalness-classifier

This model is an artifact of the SNAILS project. We fine-tuned google/canine-s to classify the naturalness of words into three levels. Full (unabbreviated) words have "Regular" naturalness (labeled N1). Somewhat abbreviated words have "Low" naturalness (labeled N2). Heavily abbreviated or indecipherable words have "Least" naturalness (labeled N3).
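The three labels above can be turned into readable categories with a small lookup. The following is a minimal sketch, not code from this repository: the names `NATURALNESS`, `ASSUMED_INDEX_TO_LABEL`, and `decode_scores` are hypothetical, and the class-index order (0 = N1, 1 = N2, 2 = N3) is an assumption that should be checked against the model's configuration before use.

```python
# Label names and descriptions as given in this model card.
NATURALNESS = {"N1": "Regular", "N2": "Low", "N3": "Least"}

# ASSUMPTION: the classifier's output indices follow this order.
# Verify against the model's own label mapping before relying on it.
ASSUMED_INDEX_TO_LABEL = ("N1", "N2", "N3")


def decode_scores(scores):
    """Map a sequence of three class scores to (label, naturalness)."""
    if len(scores) != len(ASSUMED_INDEX_TO_LABEL):
        raise ValueError("expected exactly one score per naturalness class")
    # Pick the index of the highest score (first index wins on ties).
    best = max(range(len(scores)), key=lambda i: scores[i])
    label = ASSUMED_INDEX_TO_LABEL[best]
    return label, NATURALNESS[label]
```

For example, under the assumed index order, `decode_scores([0.1, 2.3, -0.5])` selects the second class and reports it as N2 with "Low" naturalness.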

Inference with this model requires a token-tagging preprocessing step, which is provided in tokenprocessing.py. The easiest way to use the model is to download snails_naturalness_classifier.py and tokenprocessing.py from this repository and run snails_naturalness_classifier.py.
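One way to fetch the two helper files is sketched below using only the Python standard library. It assumes that both files sit at the root of this repository on the `main` branch and that the Hugging Face `resolve` URL pattern applies; the function names `resolve_url` and `download_helpers` are hypothetical and not part of the repository.

```python
import urllib.request
from pathlib import Path

REPO_ID = "kyleluoma/SNAILS-word-naturalness-classifier"
# File names as given in this model card.
HELPER_FILES = ("snails_naturalness_classifier.py", "tokenprocessing.py")


def resolve_url(repo_id, filename, revision="main"):
    """Build a Hugging Face download URL for one file in a model repo.

    ASSUMPTION: the file lives at the repository root on `revision`.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_helpers(dest_dir="."):
    """Download both helper scripts into dest_dir and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in HELPER_FILES:
        target = dest / name
        urllib.request.urlretrieve(resolve_url(REPO_ID, name), str(target))
        saved.append(target)
    return saved
```

After calling `download_helpers()`, running `python snails_naturalness_classifier.py` from the same directory should start the classifier as described above, provided its dependencies are installed.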

For more information about the SNAILS project, and to access the training data, see the GitHub repository: https://www.github.com/KyleLuoma/SNAILS

Read the paper: https://dl.acm.org/doi/10.1145/3709727

Citing this model:

@article{10.1145/3709727,
author = {Luoma, Kyle and Kumar, Arun},
title = {SNAILS: Schema Naming Assessments for Improved LLM-Based SQL Inference},
year = {2025},
issue_date = {February 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {3},
number = {1},
url = {https://doi.org/10.1145/3709727},
doi = {10.1145/3709727},
journal = {Proc. ACM Manag. Data},
month = feb,
articleno = {77},
numpages = {26},
}
