bertimbau-large-ner-total

This model card aims to simplify the use of the Portuguese BERT model, a.k.a. BERTimbau, for the Named Entity Recognition (NER) task.

For this model card, we used the BERT-CRF model (total scenario, 10 classes) available in the ner_evaluation folder of the original BERTimbau repo.

Available classes are:

  • PESSOA
  • ORGANIZACAO
  • LOCAL
  • TEMPO
  • VALOR
  • ABSTRACCAO
  • ACONTECIMENTO
  • COISA
  • OBRA
  • OUTRO
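
Token-classification models usually predict per-token tags rather than bare class names. The sketch below expands the ten classes listed above into a tag set, assuming an IOB2 scheme (an "O" tag plus "B-"/"I-" variants per class). That scheme is an assumption, not something this card states; the names CLASSES and iob2_labels are hypothetical helpers. The authoritative mapping is the one stored in the loaded model's config (model.config.id2label).

```python
# The ten entity classes of the "total" scenario, as listed in this card.
CLASSES = [
    "PESSOA", "ORGANIZACAO", "LOCAL", "TEMPO", "VALOR",
    "ABSTRACCAO", "ACONTECIMENTO", "COISA", "OBRA", "OUTRO",
]


def iob2_labels(classes):
    """Build an IOB2 tag list: 'O' plus B-/I- tags for each class.

    ASSUMPTION: the model uses IOB2 tagging. Compare the result with
    model.config.id2label after loading the model to confirm.
    """
    labels = ["O"]
    for name in classes:
        labels.append(f"B-{name}")  # first token of an entity
        labels.append(f"I-{name}")  # continuation token of an entity
    return labels


labels = iob2_labels(CLASSES)
print(len(labels), labels[:5])
```

Under this assumption the tag set has 21 entries (1 + 2 x 10). If the config's id2label differs, trust the config.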

Usage

# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("marquesafonso/bertimbau-large-ner-total")
model = AutoModelForTokenClassification.from_pretrained("marquesafonso/bertimbau-large-ner-total")

Example

from transformers import pipeline

pipe = pipeline("ner", model="marquesafonso/bertimbau-large-ner-total", aggregation_strategy='simple')

sentence = "James Marsh, realizador de filmes como A Teoria de Tudo ou Homem no Arame, assumiu a missão de criar uma obra biográfica sobre Samue Beckett, figura ímpar da literatura e da dramaturgia do século XX. O guião foi escrito pelo escocês Neil Forsyth, vencedor de dois Baftas."

result = pipe([sentence])

print(f"{sentence}\n{result}")

# James Marsh, realizador de filmes como A Teoria de Tudo ou Homem no Arame, assumiu a missão de criar uma obra biográfica sobre Samue Beckett, figura ímpar da literatura e da dramaturgia do século XX. O guião foi escrito pelo escocês Neil Forsyth, vencedor de dois Baftas.
# [[
#     {'entity_group': 'PESSOA', 'score': 0.99737316, 'word': 'James Marsh', 'start': 0, 'end': 11},
#     {'entity_group': 'OBRA', 'score': 0.9823761, 'word': 'A Teoria de Tudo', 'start': 39, 'end': 55},
#     {'entity_group': 'OBRA', 'score': 0.96812135, 'word': 'Homem no Arame', 'start': 59, 'end': 73},
#     {'entity_group': 'PESSOA', 'score': 0.9954967, 'word': 'Samue Beckett', 'start': 127, 'end': 140},
#     {'entity_group': 'TEMPO', 'score': 0.97845674, 'word': 'século XX', 'start': 189, 'end': 198},
#     {'entity_group': 'PESSOA', 'score': 0.9962597, 'word': 'Neil Forsyth', 'start': 233, 'end': 245},
#     {'entity_group': 'OUTRO', 'score': 0.7552187, 'word': 'Baftas', 'start': 264, 'end': 270}
# ]]
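
The aggregated output is a list of dictionaries per input sentence, so it is easy to post-process. As a minimal sketch, the snippet below groups the entities from the example output above by class and drops low-confidence ones. The helper group_entities and the 0.9 threshold are illustrative choices, not part of the model or the transformers API; the result literal is copied from the output above so the snippet runs without downloading the model.

```python
from collections import defaultdict

# Copied from the example output above: one inner list per input sentence.
result = [[
    {'entity_group': 'PESSOA', 'score': 0.99737316, 'word': 'James Marsh', 'start': 0, 'end': 11},
    {'entity_group': 'OBRA', 'score': 0.9823761, 'word': 'A Teoria de Tudo', 'start': 39, 'end': 55},
    {'entity_group': 'OBRA', 'score': 0.96812135, 'word': 'Homem no Arame', 'start': 59, 'end': 73},
    {'entity_group': 'PESSOA', 'score': 0.9954967, 'word': 'Samue Beckett', 'start': 127, 'end': 140},
    {'entity_group': 'TEMPO', 'score': 0.97845674, 'word': 'século XX', 'start': 189, 'end': 198},
    {'entity_group': 'PESSOA', 'score': 0.9962597, 'word': 'Neil Forsyth', 'start': 233, 'end': 245},
    {'entity_group': 'OUTRO', 'score': 0.7552187, 'word': 'Baftas', 'start': 264, 'end': 270},
]]


def group_entities(entities, min_score=0.0):
    """Map each entity class to the list of matched words.

    Entities scoring below min_score are skipped (hypothetical helper).
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# result[0] holds the entities of the first (and only) input sentence.
by_class = group_entities(result[0], min_score=0.9)
print(by_class)
```

With a 0.9 threshold, the 'Baftas' entity (score of about 0.76) is filtered out while the remaining six are kept. A suitable threshold depends on the application.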

Acknowledgements

This work is an adaptation of the Portuguese BERT model, a.k.a. BERTimbau. You may check and/or cite the original authors' work:

@InProceedings{souza2020bertimbau,
    author="Souza, F{\'a}bio and Nogueira, Rodrigo and Lotufo, Roberto",
    editor="Cerri, Ricardo and Prati, Ronaldo C.",
    title="BERTimbau: Pretrained BERT Models for Brazilian Portuguese",
    booktitle="Intelligent Systems",
    year="2020",
    publisher="Springer International Publishing",
    address="Cham",
    pages="403--417",
    isbn="978-3-030-61377-8"
}


@article{souza2019portuguese,
    title={Portuguese Named Entity Recognition using BERT-CRF},
    author={Souza, F{\'a}bio and Nogueira, Rodrigo and Lotufo, Roberto},
    journal={arXiv preprint arXiv:1909.10649},
    url={http://arxiv.org/abs/1909.10649},
    year={2019}
}

Note that the authors (Fabio Capuano de Souza, Rodrigo Nogueira, Roberto de Alencar Lotufo) released their work under the MIT License.

Model size: 109M params (Safetensors, tensor type F32)