@Ihor on Hugging Face: "We are pleased to announce the new line of universal token classification…"

Post

1898

We are pleased to announce the new line of universal token classification models 🔥

knowledgator/universal-token-classification-65a3a5d3f266d20b2e05c34d

It can perform various information extraction tasks by analysing input prompts and recognizing parts of texts that satisfy prompts. In comparison with the first version, the second one is more general and can be recognised as entities, whole sentences, and even paragraphs.

The model can be used for the following tasks:
* Named entity recognition (NER);
* Open information extraction;
* Question answering;
* Relation extraction;
* Coreference resolution;
* Text cleaning;
* Summarization;

How to use:

from utca.core import (
    AddData,
    RenameAttribute,
    Flush
)
from utca.implementation.predictors import (
    TokenSearcherPredictor, TokenSearcherPredictorConfig
)
from utca.implementation.tasks import (
    TokenSearcherNER,
    TokenSearcherNERPostprocessor,
)
predictor = TokenSearcherPredictor(
    TokenSearcherPredictorConfig(
        device="cuda:0",
        model="knowledgator/UTC-DeBERTa-base-v2"
    )
)
ner_task = TokenSearcherNER(
    predictor=predictor,
    postprocess=[TokenSearcherNERPostprocessor(
        threshold=0.5
    )]
)

ner_task = TokenSearcherNER()

pipeline = (        
    AddData({"labels": ["scientist", "university", "city"]})         
    | ner_task
    | Flush(keys=["labels"])
    | RenameAttribute("output", "entities")
)
res = pipeline.run({
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". """
})

Join the conversation