Note: This repository is publicly accessible, but you must log in and accept its conditions, which include sharing your contact information, before you can access its files and content.

Model

Fine-tuned mt5-base model for resolving elliptical coordinated compound noun phrases (ECCNPs) in German text. ECCNPs are a special type of coordination ellipsis in which part of a compound noun is omitted because of a coordination (e.g., with "and", "or", or "/").

For instance, Chemo- und Strahlentherapie (chemo- and radiotherapy) is the elliptical form of Chemotherapie und Strahlentherapie (chemotherapy and radiotherapy).
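To make the surface pattern concrete, here is a minimal sketch of a heuristic that flags hyphenated ECCNP candidates such as the example above. It is an illustration only, not part of the model or the authors' method: the pattern `TRAILING_HYPHEN` and the function `has_hyphenated_ellipsis` are hypothetical names, and the heuristic deliberately covers only the case of a word ending in a hyphen directly before a coordinator. Ellipses without a hyphen are not detected.

```python
import re

# Hypothetical heuristic: a word ending in a hyphen, directly followed by
# a coordinator ("und", "oder", "/" or ","). In Python 3, \w also matches umlauts.
TRAILING_HYPHEN = re.compile(r"\b\w+-(?=\s*(?:und\b|oder\b|/|,))", re.IGNORECASE)


def has_hyphenated_ellipsis(text: str) -> bool:
    """Return True if the text contains a trailing-hyphen ellipsis candidate."""
    return TRAILING_HYPHEN.search(text) is not None


print(has_hyphenated_ellipsis("Chemo- und Strahlentherapie"))         # True
print(has_hyphenated_ellipsis("Chemotherapie und Strahlentherapie"))  # False
```

A check like this could serve as a cheap pre-filter, but it is no substitute for the model, which also has to work out which part of the compound was omitted.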

Dataset

The model has been fine-tuned on a subset of sentences from GGPONC 2.0 that contain manually annotated ECCNPs and their resolutions. The annotated dataset is available on Zenodo: https://zenodo.org/records/12529883

Usage

The model can be loaded as a Text2TextGenerationPipeline:

from transformers import pipeline
pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base")
pipe("Chemo- und Strahlentherapie")
# [{'generated_text': 'Chemotherapie und Strahlentherapie'}]
pipe("Vitamin C, E und A")
# [{'generated_text': 'Vitamin C, Vitamin E und Vitamin A'}]

Setting max_length is recommended to cap the length of the generated output, which is counted in tokens. For most German sentences, a value of 256 should be sufficient:

pipe = pipeline(model="phlobo/german-ellipses-resolver-mt5-base", max_length=256)
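When several sentences need to be resolved, a small wrapper can keep the calling code tidy. The following is a minimal sketch under stated assumptions: `load_pipeline` and `resolve_all` are hypothetical helper names, not part of the model repository, and `resolve_all` assumes the pipeline accepts a list of strings and returns one result per input, each carrying a `generated_text` field as in the outputs shown above. Downloading the model may require being logged in to the Hugging Face Hub.

```python
from typing import Callable, List, Sequence

MODEL_ID = "phlobo/german-ellipses-resolver-mt5-base"


def load_pipeline(max_length: int = 256):
    """Build the pipeline as in the usage example (downloads the model)."""
    # Imported lazily so that resolve_all can be used and tested without transformers.
    from transformers import pipeline
    return pipeline(model=MODEL_ID, max_length=max_length)


def resolve_all(pipe: Callable, sentences: Sequence[str]) -> List[str]:
    """Run the pipeline on several sentences and return only the generated strings."""
    results = pipe(list(sentences))
    texts = []
    for result in results:
        # Assumption: each item is a dict, or a one-element list wrapping that dict.
        if isinstance(result, list):
            result = result[0]
        texts.append(result["generated_text"])
    return texts
```

With a loaded pipeline, a call such as `resolve_all(load_pipeline(), ["Chemo- und Strahlentherapie", "Vitamin C, E und A"])` would return plain strings rather than the list-of-dict structure.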

Paper

Our approach and its evaluation have been published at the ACL BioNLP'23 workshop.

Please cite the following paper if you find our model useful:

@inproceedings{kammer-etal-2023-resolving,
    title = "Resolving Elliptical Compounds in {G}erman Medical Text",
    author = "Kammer, Niklas  and
      Borchert, Florian  and
      Winkler, Silvia  and
      de Melo, Gerard  and
      Schapranow, Matthieu-P.",
    editor = "Demner-fushman, Dina  and
      Ananiadou, Sophia  and
      Cohen, Kevin",
    booktitle = "The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.bionlp-1.26",
    doi = "10.18653/v1/2023.bionlp-1.26",
    pages = "292--305"
}