Table of Contents

- Model description
- Training procedure
- Results
- Limitations and biases
- BibTeX entry and citation info

Model description

mbert-base-cased-NER-NL-legislation-refs is a fine-tuned BERT model trained to recognize the entity type 'legislation reference' (REF) in Dutch case law.

Specifically, this model is a bert-base-multilingual-cased model that was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset.
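As a minimal loading sketch, the fine-tuned model and its tokenizer can be fetched from the Hugging Face Hub using this repository's ID:

```python
# Minimal sketch: load this model and its tokenizer from the Hub.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "romjansen/mbert-base-cased-NER-NL-legislation-refs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
```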

Training procedure

Dataset

This model was fine-tuned on the mbert-base-cased-NER-NL-legislation-refs-data dataset. This dataset consists of 512-token-long examples, each containing one or more legislation references. These examples were created from a weakly labelled corpus of Dutch case law scraped from Linked Data Overheid. The corpus was pre-tokenized and labelled with spaCy (using biluo_tags_from_offsets), and then further tokenized with the bert-base-multilingual-cased tokenizer loaded through Hugging Face's AutoTokenizer.from_pretrained().
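A rough sketch of that preprocessing pipeline is shown below. The example sentence, character offsets, and the use of a blank Dutch spaCy pipeline are illustrative assumptions, not taken from the actual corpus; note also that spaCy v3 renamed biluo_tags_from_offsets to offsets_to_biluo_tags.

```python
# Illustrative preprocessing sketch; the sentence and offsets are made up.
import spacy
from spacy.training import offsets_to_biluo_tags  # biluo_tags_from_offsets in spaCy v2
from transformers import AutoTokenizer

nlp = spacy.blank("nl")  # assumption: a blank Dutch pipeline for pre-tokenization
text = "Gelet op artikel 8:75 van de Algemene wet bestuursrecht."
spans = [(9, 55, "REF")]  # (start_char, end_char, label) of the legislation reference

# Pre-tokenize and convert character offsets to BILOU tags.
doc = nlp(text)
biluo_tags = offsets_to_biluo_tags(doc, spans)

# Further tokenize into mBERT subwords, keeping word alignment for the labels.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoding = tokenizer(
    [token.text for token in doc],
    is_split_into_words=True,
    truncation=True,
    max_length=512,
)
word_ids = encoding.word_ids()  # maps each subword back to its BILOU-tagged word
```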

Results

| Model | Precision | Recall | F1-score |
|-------|-----------|--------|----------|
| mBERT | 0.891     | 0.919  | 0.905    |
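For reference, entity-level precision, recall, and F1 for NER are typically computed with a library such as seqeval; this is an assumption, as the card does not state which tool was used. A toy example of the strict entity-level matching involved:

```python
# Toy illustration of entity-level NER metrics with seqeval (assumed tool).
# BILOU tags would first be mapped to a scheme seqeval understands
# (e.g. L- -> E-, U- -> S-); plain BIO tags are used here for simplicity.
from seqeval.metrics import precision_score, recall_score, f1_score

y_true = [["O", "B-REF", "I-REF", "I-REF", "O"]]
y_pred = [["O", "B-REF", "I-REF", "O", "O"]]  # prediction misses the last token

print(precision_score(y_true, y_pred))  # 0.0: the predicted span is not an exact match
print(recall_score(y_true, y_pred))     # 0.0
print(f1_score(y_true, y_pred))         # 0.0
```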

Using Hugging Face's hosted inference API widget, this model can be quickly tested on the provided examples. Note that the widget incorrectly presents the last token of a legislation reference as a separate entity due to the workings of its 'simple' aggregation_strategy: while this model was fine-tuned on training data labelled in accordance with the BILOU scheme, the hosted inference API groups entities by merging consecutive B- and I- tags of the same type, thereby leaving the L- tags ungrouped. A client-side workaround is sketched below.
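The sketch below assumes the model's label set uses B-/I-/L-/U- prefixes for the REF type; the merge_bilou helper is hypothetical and not part of this repository. It requests the raw per-token predictions and groups them client-side:

```python
# Sketch of BILOU-aware grouping on top of raw per-token predictions.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="romjansen/mbert-base-cased-NER-NL-legislation-refs",
    aggregation_strategy="none",  # keep raw per-token tags
)

def merge_bilou(tokens):
    """Merge B-/I-/L- tagged tokens into single entities; U- stands alone."""
    entities, current = [], None
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if prefix in ("B", "U"):
            current = {"label": label, "start": tok["start"], "end": tok["end"]}
        elif prefix in ("I", "L") and current is not None:
            current["end"] = tok["end"]  # extend the open entity
        if prefix in ("U", "L") and current is not None:
            entities.append(current)  # U- and L- tags close an entity
            current = None
    return entities

text = "Gelet op artikel 8:75 van de Algemene wet bestuursrecht."
print(merge_bilou(ner(text)))
```

Depending on the exact label names in the model's config, the prefix handling may need adjusting.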

Limitations and biases

More information needed

BibTeX entry and citation info

More information needed
