Ihor Stepanov

AI & ML interests

Text classification, computational biology, relation extraction, path reasoning

Ihor's activity

posted an update 10 days ago
We’re thrilled to share our latest technical paper on the multi-task GLiNER model. Our research dives into the following exciting and forward-thinking topics:

πŸ” Zero-shot NER & Information Extraction: We demonstrate that with diverse and ample data, paired with the right architecture, encoders can achieve impressive results across various extraction tasks;

🛠️ Synthetic Data Generation: Leveraging open labelling by LLMs like Llama, we generated high-quality training data. Our student model even outperformed the teacher model, highlighting the potential of this approach (a sketch of such a labelling loop follows the code example below).

🤖 Self-Learning: Our model showed consistent improvements in performance without labelled data, achieving up to a 12% increase in F1 score for initially challenging topics. This ability to learn and improve autonomously is a very promising direction for future research!

GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks (2406.12925)
knowledgator/gliner-multitask-large-v0.5
knowledgator/GLiNER_HandyLab


#!pip install gliner -U

from gliner import GLiNER

# Load the multi-task GLiNER model
model = GLiNER.from_pretrained("knowledgator/gliner-multitask-large-v0.5")

text = """
Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800. 
"""

# Labels can be arbitrary strings; entities are extracted zero-shot
labels = ["founder", "computer", "software", "position", "date"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
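
For context, here is a minimal sketch of how such an open-labelling loop with an LLM teacher could look. The prompt wording, the teacher model name, and the JSON output format are illustrative assumptions rather than the exact recipe from the paper.

import json
from transformers import pipeline

# Assumed teacher model; any sufficiently strong instruct LLM could be used
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

def label_with_llm(text, labels):
    # Ask the teacher LLM to annotate the text with the requested labels
    prompt = (
        "Extract entities of the following types from the text: "
        + ", ".join(labels)
        + '.\nReturn a JSON list of {"text": ..., "label": ...} objects.\n\n'
        + "Text: " + text + "\nJSON:"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False, return_full_text=False)
    try:
        # Each successfully parsed annotation becomes a synthetic training example
        return json.loads(out[0]["generated_text"].strip())
    except json.JSONDecodeError:
        return []  # discard generations that are not valid JSON

print(label_with_llm(
    "Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975.",
    ["founder", "company", "date"],
))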

posted an update 27 days ago
We are super happy to contribute to the GLiNER ecosystem by optimizing training code and releasing a multi-task, prompt-tunable model.

The model can be used for the following tasks:
* Named entity recognition (NER);
* Open information extraction;
* Question answering;
* Relation extraction;
* Summarization;

Model: knowledgator/gliner-multitask-large-v0.5
Demo: knowledgator/GLiNER_HandyLab
Repo: 👨‍💻 https://github.com/urchade/GLiNER

**How to use**
First of all, install the gliner package:

pip install gliner

Then try the following code (a question-answering variant of the same prompt pattern is sketched after this example):
from gliner import GLiNER

# Load the prompt-tunable multi-task model announced above
model = GLiNER.from_pretrained("knowledgator/gliner-multitask-large-v0.5")

prompt = """Find all positive aspects about the product:\n"""
text = """
I recently purchased the Sony WH-1000XM4 Wireless Noise-Canceling Headphones from Amazon and I must say, I'm thoroughly impressed. The package arrived in New York within 2 days, thanks to Amazon Prime's expedited shipping.

The headphones themselves are remarkable. The noise-canceling feature works like a charm in the bustling city environment, and the 30-hour battery life means I don't have to charge them every day. Connecting them to my Samsung Galaxy S21 was a breeze, and the sound quality is second to none.
I also appreciated the customer service from Amazon when I had a question about the warranty. They responded within an hour and provided all the information I needed.
However, the headphones did not come with a hard case, which was listed in the product description. I contacted Amazon, and they offered a 10% discount on my next purchase as an apology.
Overall, I'd give these headphones a 4.5/5 rating and highly recommend them to anyone looking for top-notch quality in both product and service.
"""
# The prompt is simply prepended to the input text
input_ = prompt + text

labels = ["match"]

matches = model.predict_entities(input_, labels)

for match in matches:
    print(match["text"], "=>", match["score"])
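
The same prompt-plus-labels pattern can be reused for the other tasks listed above, for example question answering. The prompt wording and the "answer" label in this sketch follow that pattern but are assumptions; check the model card for the recommended prompts.

from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-multitask-large-v0.5")

question = "Who was the CEO of Microsoft?"

qa_text = """
Microsoft was founded by Bill Gates and Paul Allen. Steve Ballmer served as CEO
from 2000 until 2014, when Satya Nadella took over the role.
"""

# The question acts as the prompt and the answer span is extracted as an entity
answers = model.predict_entities(question + "\n" + qa_text, ["answer"])

for answer in answers:
    print(answer["text"], "=>", answer["score"])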

posted an update about 1 month ago
We are pleased to announce the new line of universal token classification models 🔥

knowledgator/universal-token-classification-65a3a5d3f266d20b2e05c34d

It can perform various information extraction tasks by analysing an input prompt and recognising the parts of the text that satisfy it. In comparison with the first version, the second one is more general and can recognise not only entities but also whole sentences and even paragraphs.

The model can be used for the following tasks:
* Named entity recognition (NER);
* Open information extraction;
* Question answering;
* Relation extraction;
* Coreference resolution;
* Text cleaning;
* Summarization;

How to use:

from utca.core import (
    AddData,
    RenameAttribute,
    Flush
)
from utca.implementation.predictors import (
    TokenSearcherPredictor, TokenSearcherPredictorConfig
)
from utca.implementation.tasks import (
    TokenSearcherNER,
    TokenSearcherNERPostprocessor,
)
# Predictor that runs the UTC model
predictor = TokenSearcherPredictor(
    TokenSearcherPredictorConfig(
        device="cuda:0",
        model="knowledgator/UTC-DeBERTa-base-v2"
    )
)
# NER task with a score threshold for extracted spans
ner_task = TokenSearcherNER(
    predictor=predictor,
    postprocess=[TokenSearcherNERPostprocessor(
        threshold=0.5
    )]
)

# Alternatively, the task can be created with default settings:
# ner_task = TokenSearcherNER()

# Build the pipeline: attach labels, run NER, drop the labels, rename the output
pipeline = (
    AddData({"labels": ["scientist", "university", "city"]})
    | ner_task
    | Flush(keys=["labels"])
    | RenameAttribute("output", "entities")
)
res = pipeline.run({
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". """
})
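
Continuing the example, the result can be inspected through the key produced by RenameAttribute; the exact fields of each extracted span depend on the utca postprocessor.

# "entities" holds the renamed pipeline output (see RenameAttribute above)
print(res["entities"])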