Pclanglais/Crime-Studies-Deberta

Crime-Studies-Deberta is a classifier model that predicts whether or not a scientific article belongs to crime studies.

Crime-Studies-Deberta is fine-tuned on French bibliographic sources from Isidore/Criminocorpus but has also been found to work on other languages.

Evaluation

On the validation set, Crime-Studies-Deberta reached an accuracy of 98.5%.

Use

A sample script for inference is provided in the model set (inference_deberta_batch.py), along with a demo dataset of 300k titles from Isidore.

Every query to Crime-Studies-Deberta is a combination of article title and journal title, separated by a newline.

Including the journal title gave a significant boost to the training results, as it helps disambiguate some generic article titles (like "introduction").
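As a minimal sketch of the query format described above, the helper below joins an article title and a journal title with a newline. The function name `build_query`, the ordering (article title first), the whitespace stripping, and the example titles are assumptions made for illustration; the provided inference_deberta_batch.py script remains the reference for the exact preprocessing.

```python
def build_query(article_title: str, journal_title: str) -> str:
    """Combine an article title and a journal title into one query string.

    Assumption: the article title comes first, followed by a newline and
    the journal title. Surrounding whitespace is stripped for tidiness.
    """
    return f"{article_title.strip()}\n{journal_title.strip()}"


# Hypothetical example entries (not taken from the demo dataset).
entries = [
    ("Introduction", "Revue d'histoire de la justice"),
    ("Introduction", "Revue de botanique"),
]

# Same generic article title, but the journal title makes the queries differ.
queries = [build_query(title, journal) for title, journal in entries]
```

Both entries share the same generic article title, so only the journal line distinguishes the two queries.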

For every submitted entry, Crime-Studies-Deberta returns the probability that the entry belongs to crime studies.

This is convenient: depending on the requirements of the study (and the time available for further verification), the probability threshold can be adjusted to accept more or fewer false negatives.
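The effect of the probability cutoff can be illustrated with a short sketch. The function name `select_candidates`, the threshold values, and the probabilities below are hypothetical; they are not outputs of the model and only show how a cutoff changes what gets kept.

```python
from typing import List, Sequence


def select_candidates(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of entries whose probability is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [i for i, p in enumerate(probabilities) if p >= threshold]


# Hypothetical probabilities for four entries (illustrative values only).
probabilities = [0.97, 0.62, 0.31, 0.08]

# Strict cutoff: fewer entries to verify by hand, but more false negatives.
strict = select_candidates(probabilities, threshold=0.9)

# Lenient cutoff: fewer false negatives, but more entries to verify by hand.
lenient = select_candidates(probabilities, threshold=0.3)
```

With these illustrative values, the strict cutoff keeps only the first entry, while the lenient one keeps the first three.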