Replicating DeepSeek R1 for Information Extraction
args.push_to_hub=True and args.hub_model_id to upload your model checkpoints to Hugging Face while training. It also uploads your emissions (if codecarbon is installed) and your TensorBoard logs (if tensorboard is installed).
Thanks for sharing this development! Could you also write a blog post or paper to help us understand it better? Thanks.
https://blog.knowledgator.com/meet-the-new-zero-shot-ner-architecture-30ffc2cb1ee0
Yeah, we are working on it
#!pip install gliner -U
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner-multitask-large-v0.5")
text = """
Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975 to develop and sell BASIC interpreters for the Altair 8800.
"""
labels = ["founder", "computer", "software", "position", "date"]
entities = model.predict_entities(text, labels)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
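The loop above prints one entity per line. When several labels are requested at once, it can be handier to collect the spans per label. A minimal sketch, assuming each prediction is a dict with "text" and "label" keys as used above; the helper name group_by_label and the sample predictions are made up for illustration and were not produced by running the model:

```python
from collections import defaultdict


def group_by_label(entities):
    """Collect entity texts under their predicted label, keeping first-seen order."""
    grouped = defaultdict(list)
    for entity in entities:
        text = entity["text"]
        # Skip exact repeats of the same span text under the same label.
        if text not in grouped[entity["label"]]:
            grouped[entity["label"]].append(text)
    return dict(grouped)


# Hand-written sample in the same shape as the predictions printed above.
sample_entities = [
    {"text": "Bill Gates", "label": "founder"},
    {"text": "Paul Allen", "label": "founder"},
    {"text": "April 4, 1975", "label": "date"},
    {"text": "Bill Gates", "label": "founder"},
]

grouped = group_by_label(sample_entities)
print(grouped)
```

With real output, group_by_label(entities) can be called directly on the list returned by model.predict_entities(text, labels).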
#!pip install gliner
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner_small-v2.1")
prompt = """Find all positive aspects about the product:\n"""
text = """
I recently purchased the Sony WH-1000XM4 Wireless Noise-Canceling Headphones from Amazon and I must say, I'm thoroughly impressed. The package arrived in New York within 2 days, thanks to Amazon Prime's expedited shipping.
The headphones themselves are remarkable. The noise-canceling feature works like a charm in the bustling city environment, and the 30-hour battery life means I don't have to charge them every day. Connecting them to my Samsung Galaxy S21 was a breeze, and the sound quality is second to none.
I also appreciated the customer service from Amazon when I had a question about the warranty. They responded within an hour and provided all the information I needed.
However, the headphones did not come with a hard case, which was listed in the product description. I contacted Amazon, and they offered a 10% discount on my next purchase as an apology.
Overall, I'd give these headphones a 4.5/5 rating and highly recommend them to anyone looking for top-notch quality in both product and service.
"""
input_ = prompt+text
labels = ["match"]
matches = model.predict_entities(input_, labels)
for match in matches:
    print(match["text"], "=>", match["score"])
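Since every match carries a score, the prompt-based output can be ranked or thresholded before it is used. A sketch under the assumption that each match is a dict with "text" and "score" keys, as in the loop above; the helper name top_matches, the 0.5 cutoff, and the sample scores are illustrative and are not real model output:

```python
def top_matches(matches, min_score=0.5):
    """Keep matches scoring at least min_score, highest score first."""
    kept = [m for m in matches if m["score"] >= min_score]
    return sorted(kept, key=lambda m: m["score"], reverse=True)


# Made-up scores, only to show the shape of the data.
sample_matches = [
    {"text": "30-hour battery life", "score": 0.71},
    {"text": "hard case", "score": 0.12},
    {"text": "sound quality", "score": 0.88},
]

ranked = top_matches(sample_matches)
for match in ranked:
    print(match["text"], "=>", match["score"])
```

The cutoff is worth tuning per prompt, since a single generic "match" label gives the model less to go on than a descriptive entity label.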
from utca.core import (
    AddData,
    RenameAttribute,
    Flush,
)
from utca.implementation.predictors import (
    TokenSearcherPredictor,
    TokenSearcherPredictorConfig,
)
from utca.implementation.tasks import (
    TokenSearcherNER,
    TokenSearcherNERPostprocessor,
)

# Predictor that wraps the model used for the NER task.
predictor = TokenSearcherPredictor(
    TokenSearcherPredictorConfig(
        device="cuda:0",  # assumes a CUDA GPU is available
        model="knowledgator/UTC-DeBERTa-base-v2",
    )
)

# NER task with an explicit predictor and score threshold.
ner_task = TokenSearcherNER(
    predictor=predictor,
    postprocess=[TokenSearcherNERPostprocessor(
        threshold=0.5
    )],
)

# Alternatively, use the default configuration. Left commented out so it
# does not overwrite the task configured above.
# ner_task = TokenSearcherNER()

# Add the labels, run NER, drop the labels again, and rename the result key.
pipeline = (
    AddData({"labels": ["scientist", "university", "city"]})
    | ner_task
    | Flush(keys=["labels"])
    | RenameAttribute("output", "entities")
)

res = pipeline.run({
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". """
})
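To make the data flow of the pipeline above easier to follow, here is a toy re-creation of the `|` chaining in plain Python. It is not utca's implementation: Step, add_data, flush, rename, and fake_ner are hypothetical stand-ins, the entity it returns is hand-written, and the assumed behaviour of Flush (drop keys) and RenameAttribute (rename a key) is inferred only from their names and arguments.

```python
class Step:
    """A dict-to-dict function that can be chained with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Run self first, then feed its result into the next step.
        return Step(lambda data: other.fn(self.fn(data)))

    def run(self, data):
        return self.fn(dict(data))  # copy so the caller's dict is untouched


def add_data(extra):
    # Merge extra keys into the running dict.
    return Step(lambda data: {**data, **extra})


def flush(keys):
    # Drop the listed keys.
    return Step(lambda data: {k: v for k, v in data.items() if k not in keys})


def rename(old, new):
    # Rename one key, leaving the others as they are.
    return Step(lambda data: {(new if k == old else k): v for k, v in data.items()})


# Stand-in for the NER task: writes a fixed, made-up result under "output".
fake_ner = Step(
    lambda data: {**data, "output": [{"span": "Paul Hammond", "label": "scientist"}]}
)

toy_pipeline = (
    add_data({"labels": ["scientist", "university", "city"]})
    | fake_ner
    | flush(keys=["labels"])
    | rename("output", "entities")
)

result = toy_pipeline.run({"text": "Dr. Paul Hammond works at Johns Hopkins University."})
print(result)
```

Under these assumptions the final dict keeps "text", no longer contains "labels", and exposes the task result under "entities" instead of "output".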