Rajiv Shah

rajistics

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
minishlab/semantic-deduplication
updated a Space about 2 months ago
rajistics/llamavision

Articles

Organizations

rajistics's activity

Reacted to merve's post with ❤️ 9 months ago
New foundation model for document understanding and generation in transformers 🤩
UDOP by MSFT is a bleeding-edge model capable of many tasks, including question answering, document editing and more! 🤯
Demo 👉 merve/UDOP
It is a model that combines vision, text and layout.
This model is very interesting because the input representation truly captures the nature of the document modality: the text, where the text is, and the layout of the document all matter!
If you know T5, it resembles that: it's pre-trained on both self-supervised and supervised objectives over text, image and layout.
To switch between tasks, one simply changes the task-specific prompt at the beginning, e.g. for QA, one prepends "Question answering."
As for the architecture, it's like T5, except it has a single encoder that takes in text, image and layout, and two decoders (a text-layout decoder and a vision decoder) combined into one.
The vision decoder is a masked autoencoder (hence the document-editing capabilities).
For me, the most interesting capabilities are document reconstruction, document editing and layout re-arrangement. This decoder isn't released, though, because it could be used maliciously to fake documents.
Overall, the model performs very well on the document understanding benchmark (DUE), as well as on information extraction (FUNSD, CORD) and classification (RVL-CDIP) across the vision, text and layout modalities.
You can learn more about the model from the resources below (h/t to @nielsr). Thanks a lot for reading 🤗
Docs: https://huggingface.co/docs/transformers/main/en/model_doc/udop 📚
Checkpoints: microsoft/udop-65e625124aee97415b88b513
Demo notebooks: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/UDOP 📕
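
To make the prompt-based task switching concrete, here is a minimal sketch of document question answering with UDOP via transformers; the microsoft/udop-large checkpoint, the placeholder image path, and the example words, boxes and question are illustrative assumptions, not taken from the post.

```python
# pip install transformers pillow
from PIL import Image
from transformers import AutoProcessor, UdopForConditionalGeneration

# Load the processor and model; apply_ocr=False because we pass OCR words/boxes ourselves.
processor = AutoProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

# Illustrative inputs: a scanned page plus its OCR'd words and bounding boxes
# (boxes are assumed to be normalized to the 0-1000 range, LayoutLM-style).
image = Image.open("invoice.png").convert("RGB")  # placeholder path
words = ["Invoice", "Date:", "2023-01-15", "Total:", "$120.00"]
boxes = [[70, 40, 180, 60], [70, 90, 130, 110], [140, 90, 260, 110],
         [70, 140, 130, 160], [140, 140, 230, 160]]

# Task switching happens through the prompt prefix, e.g. "Question answering."
prompt = "Question answering. What is the invoice date?"

# Encode image, prompt, words and layout together, then generate the answer text.
encoding = processor(image, prompt, text_pair=words, boxes=boxes, return_tensors="pt")
predicted_ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```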
replied to tomaarsen's post 9 months ago

This is so cool, thanks for raising attention to these models!

Reacted to tomaarsen's post with ❤️ 9 months ago
I remember very well that about two years ago, 0-shot named entity recognition (i.e. where you can choose any labels on the fly) was completely infeasible. Fast forward a year, and Universal-NER/UniNER-7B-all surprised me by showing that 0-shot NER is possible! However, I had a bunch of concerns that prevented me from ever adopting it myself. For example, the model was 7B parameters, only worked with 1 custom label at a time, and it had a cc-by-nc-4.0 license.

Since then, a little-known research paper introduced GLiNER, a modified & finetuned variant of the microsoft/deberta-v3-base line of models. Notably, GLiNER outperforms UniNER-7B despite being almost two orders of magnitude smaller! It also allows for multiple labels at once, supports nested NER, and the models are Apache 2.0.

Very recently, the models were uploaded to Hugging Face, and I was inspired to create a demo for the English model. The demo runs on CPU and still computes labels very efficiently. I'm very impressed by the models.

There are two models right now:
* base (english): urchade/gliner_base
* multi (multilingual): urchade/gliner_multi

And my demo to experiment with the base model can be found here: https://huggingface.co/spaces/tomaarsen/gliner_base
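
For anyone who wants to try the base model locally, a minimal sketch using the gliner Python package is below; the sentence, label set and threshold are made up for illustration.

```python
# pip install gliner
from gliner import GLiNER

# Load the English base model from the Hub.
model = GLiNER.from_pretrained("urchade/gliner_base")

# Labels are chosen on the fly -- no fine-tuning needed.
text = "Ada Lovelace worked with Charles Babbage in London on the Analytical Engine."
labels = ["person", "location", "invention"]

# predict_entities returns dicts with the span text, label, offsets and score.
entities = model.predict_entities(text, labels, threshold=0.5)
for ent in entities:
    print(f"{ent['text']} => {ent['label']} ({ent['score']:.2f})")
```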
New activity in databricks/databricks-dolly-15k 9 months ago

Instructing Tuning Tag

2
#15 opened 9 months ago by rajistics
New activity in CultriX/MistralTrix-v1 11 months ago

Congrats!

10
#3 opened 11 months ago by mlabonne
liked a Space about 1 year ago