Software Citation Intent Classifier

The Software Citation Intent Classifier (soft-cite-intent-cls) predicts the intent behind a citation of, or reference to, a piece of software in an academic article.

Possible values include:

used
mentioned
created
other

For example, the sentence "The XYZ code and software, with an example input dataset and detailed instructions are available from GitHub (https://github.com/user/repo)" should be predicted as "created" because the authors are directly referencing code they created themselves.

The specific name of the software, the username, and the repository name have been removed for privacy.

In comparison, the sentence: "For the statistical analyses of the data in this study, the Statistical Package for the Social Sciences (SPSS) version 22 (IBM Corp, Armonk, New York) was used" should be predicted as "used" as the authors are directly informing the reader that this is not their own software but rather software they used for analysis.

This model was originally created during the CZI Software Impact Hackathon by Ana-Maria Istrate, Joshua Fisher, Xinyu Yang, Kara Moraw, Kai Li, Donghui Li, and Martin Klein.

Their original work can be found in the SoftwareCitationIntent Repository.

Eva Maxfield Brown recreated this version of the model and uploaded it to the Hugging Face Hub for her own work. Her scripts for recreating the model can be found in the grobid-soft-proc repository.
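As a rough illustration of how the two example sentences earlier in this section might be classified, here is a minimal sketch using the transformers text-classification pipeline. Several things here are assumptions rather than details from this card: the Hub identifier passed to `load_classifier` is a hypothetical placeholder, the helper names `predict_intents`, `load_classifier`, and `demo` are made up for illustration, and the sketch assumes the model's config maps class ids to the four label strings listed above.

```python
from typing import Callable, Iterable, List

# The four intent labels listed in this model card.
INTENT_LABELS = ("used", "mentioned", "created", "other")

# The two example sentences quoted in this model card.
EXAMPLES = [
    "The XYZ code and software, with an example input dataset and detailed "
    "instructions are available from GitHub (https://github.com/user/repo)",
    "For the statistical analyses of the data in this study, the Statistical "
    "Package for the Social Sciences (SPSS) version 22 (IBM Corp, Armonk, "
    "New York) was used",
]


def predict_intents(sentences: Iterable[str], classifier: Callable) -> List[str]:
    """Return one intent label per sentence.

    `classifier` is any callable that takes a list of strings and returns a
    list of dicts with a "label" key, which is the output shape of a
    transformers text-classification pipeline.
    """
    predictions = classifier(list(sentences))
    labels = [prediction["label"] for prediction in predictions]
    for label in labels:
        # ASSUMPTION: the model emits the label strings from this card.
        if label not in INTENT_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    return labels


def load_classifier(model_id: str):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def demo(model_id: str) -> None:
    """Print the predicted intent for each example sentence."""
    classifier = load_classifier(model_id)
    for sentence, intent in zip(EXAMPLES, predict_intents(EXAMPLES, classifier)):
        print(f"{intent}: {sentence[:60]}...")


# Hypothetical call; replace the placeholder with the model's real Hub id:
# demo("<namespace>/soft-cite-intent-cls")
```

Keeping `predict_intents` independent of the pipeline makes it easy to swap in a different classifier or a stub when testing downstream code.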

Model Details

Model Description

Developed by: Ana-Maria Istrate, Joshua Fisher, Xinyu Yang, Kara Moraw, Kai Li, Donghui Li, and Martin Klein
Made available by: Eva Maxfield Brown
Language(s) (NLP): en (English)
License: MIT
Finetuned from model: microsoft/deberta-v3-base

Model Sources

Repository: Original Repository, Distribution Repository
Paper: Scientific Software Citation Intent Classification using Large Language Models

Training Details

Training Data

HuggingFace Dataset: soft-cite-intent
CSV from Repo: soft-cite-intent

Training Procedure

Training Script: train-and-upload-best

Results

Accuracy: 0.916
Precision: 0.916
Recall: 0.916
F1: 0.916
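This card does not state how precision, recall, and F1 were averaged across the four classes. One possible reading, offered only as an assumption, is micro-averaging: for single-label multiclass problems, micro-averaged precision, recall, and F1 are all equal to accuracy, which would be consistent with the four identical values above. The sketch below illustrates that property on made-up labels; `micro_metrics` is a hypothetical helper, not the evaluation code used for this model.

```python
from collections import Counter
from typing import Dict, Sequence


def micro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute accuracy and micro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    true_pos: Counter = Counter()
    false_pos: Counter = Counter()
    false_neg: Counter = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_pos[truth] += 1
        else:
            # A wrong prediction is a false positive for the predicted class
            # and a false negative for the true class.
            false_pos[pred] += 1
            false_neg[truth] += 1

    # Micro-averaging pools the counts over all classes before dividing.
    tp = sum(true_pos.values())
    fp = sum(false_pos.values())
    fn = sum(false_neg.values())

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = tp / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only; these are not the model's test data.
example_true = ["used", "used", "created", "mentioned", "other"]
example_pred = ["used", "created", "created", "mentioned", "used"]
example_scores = micro_metrics(example_true, example_pred)
```

Because every misclassified example adds exactly one false positive and one false negative, the pooled denominators are both equal to the number of examples, so all four values coincide.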

Citation

See the original authors' paper, "Scientific Software Citation Intent Classification using Large Language Models", presented at NSLP 2024.