New foundation model on document understanding and generation in transformers 🤩 UDOP by MSFT is a bleeding-edge model capable of many tasks, including question answering, document editing and more! 🤯

Demo: merve/UDOP

It is a model that combines vision, text and layout. This model is very interesting because the input representation truly captures the nature of the document modality: the text, where the text is, and the layout of the document all matter!

If you know T5, it resembles that: it's pre-trained on both self-supervised and supervised objectives over text, image and layout. To switch between tasks, one simply changes the task-specific prompt at the beginning, e.g. for QA, one prepends "Question answering."

As for the architecture, it's like T5, except it has a single encoder that takes in text, image and layout, and two decoders (a text-layout decoder and a vision decoder). The vision decoder is a masked autoencoder, hence the document editing capabilities.

For me, the most interesting capabilities are document reconstruction, document editing and layout re-arrangement. That decoder isn't released, though, because it could be used maliciously to forge documents.

Overall, the model performs very well on document understanding (the DUE benchmark), as well as on information extraction (FUNSD, CORD) and classification (RVL-CDIP) across the vision, text and layout modalities.

You can learn more about the model from the resources below (h/t to @nielsr), thanks a lot for reading 🤗

Docs: https://huggingface.co/docs/transformers/main/en/model_doc/udop
Checkpoints: microsoft/udop-65e625124aee97415b88b513
Demo notebooks: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/UDOP
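To give a concrete feel for the prompt-based task switching, here is a minimal sketch of document QA with the transformers API. The checkpoint name (microsoft/udop-large), the FUNSD example dataset, and the exact processor call are assumptions on my part, so double-check them against the docs linked above:

```python
from datasets import load_dataset
from transformers import AutoProcessor, UdopForConditionalGeneration

# Assumed checkpoint; see the collection linked above for the released ones.
# apply_ocr=False because the example dataset already ships OCR words/boxes.
processor = AutoProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

# An example scanned form with OCR words and bounding boxes (the layout part).
dataset = load_dataset("nielsr/funsd-layoutlmv3", split="train")
example = dataset[0]
image, words, boxes = example["image"], example["tokens"], example["bboxes"]

# Task switching: simply prepend the task-specific prompt to the input text.
prompt = "Question answering. What is the date on the form?"
encoding = processor(image, prompt, words, boxes=boxes, return_tensors="pt")

# The text-layout decoder generates the answer autoregressively.
predicted_ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

Swapping "Question answering." for another task prompt reuses the same encoder and decoders, which is the appeal of the unified design.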
I remember very well that about two years ago, 0-shot named entity recognition (i.e. where you can choose any labels on the fly) was completely infeasible. Fast forward a year, and Universal-NER/UniNER-7B-all surprised me by showing that 0-shot NER is possible! However, I had a bunch of concerns that prevented me from ever adopting it myself. For example, the model was 7B parameters, only worked with 1 custom label at a time, and it had a cc-by-nc-4.0 license.
Since then, a little-known research paper introduced GLiNER, a modified & finetuned variant of the microsoft/deberta-v3-base line of models. Notably, GLiNER outperforms UniNER-7B despite being almost 2 orders of magnitude smaller! It also allows multiple labels at once, supports nested NER, and the models are Apache 2.0 licensed.
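To show what that multi-label interface looks like, here is a minimal sketch using the gliner Python package; the checkpoint name and threshold are illustrative assumptions, so check the GLiNER repo for the current ones:

```python
# pip install gliner
from gliner import GLiNER

# Assumed checkpoint; GLiNER ships several DeBERTa-v3-based variants.
model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Steve Jobs co-founded Apple in Cupertino in 1976."

# Any set of labels, chosen freely at inference time, no retraining needed.
labels = ["person", "organization", "location", "date"]

# Returns spans as dicts with "text", "label", "score", "start", "end".
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

Passing several labels in one call is exactly what UniNER-7B couldn't do, and the model behind it is a sub-500M-parameter encoder rather than a 7B decoder.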
Very recently, the models were uploaded to Hugging Face, and I was inspired to create a demo for the English model. The demo runs on CPU, yet still computes labels quickly and accurately. I'm very impressed by these models.