DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16 • 29
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 83
multilingual vision models Collection Some papers I read for understanding vision models and also adding multilingual capabilities to them • 14 items • Updated 19 days ago • 2
Maya: An Instruction Finetuned Multilingual Multimodal Model Paper • 2412.07112 • Published 20 days ago • 25
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 25 days ago • 57
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 26 days ago • 119
Article Enjoy the Power of Phi-3 with ONNX Runtime on your device By Emma-N • May 22 • 25
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction Paper • 2410.21169 • Published Oct 28 • 30
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published Oct 21 • 43
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning Paper • 2410.06456 • Published Oct 9 • 35
Article Deploying Your FastAPI Applications on Huggingface Via Docker By HemanthSai7 • Dec 11, 2023 • 18
Article Hosting your Models and Datasets on Hugging Face Spaces using Streamlit Oct 5, 2021 • 3
ColPali Paper Resources Collection Main resources for the paper: "ColPali: Efficient Document Retrieval with Vision Language Models" • 4 items • Updated 22 days ago • 6
Article Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • Sep 6 • 16
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 75