Frank Sommers PRO

fsommers

AI & ML interests

None yet

Recent Activity

liked a model 3 days ago

Qwen/QwQ-32B-Preview

liked a model 3 days ago

Qwen/QVQ-72B-Preview

liked a Space 3 days ago

Qwen/QVQ-72B-preview

Articles

Document Similarity Search with ColPali

Sep 21 • 48

Organizations

fsommers's activity

upvoted a paper 4 days ago

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Paper • 2410.12628 • Published Oct 16 • 29

upvoted a paper 6 days ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 83

upvoted a collection 10 days ago

multilingual vision models

Collection

Some papers I read for understanding vision models and also adding multilingual capabilities to them • 14 items • Updated 19 days ago • 2

upvoted a paper 10 days ago

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published 20 days ago • 25

upvoted a paper 23 days ago

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Paper • 2412.04424 • Published 25 days ago • 57

upvoted a paper 24 days ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 26 days ago • 119

upvoted a collection about 1 month ago

PathummaLLM-1.0.0

Collection

Multimodal LLM for Thai. • 3 items • Updated Oct 24 • 7

upvoted an article about 1 month ago

Enjoy the Power of Phi-3 with ONNX Runtime on your device

Article • May 22 • 25

upvoted an article 2 months ago

Visually Multilingual: Introducing mcdse-2b

Article • Oct 27 • 37

upvoted 3 papers 2 months ago

A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25 • 40

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published Oct 28 • 30

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21 • 43

upvoted a paper 3 months ago

From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

Paper • 2410.06456 • Published Oct 9 • 35

upvoted 3 articles 3 months ago

Deploying Your FastAPI Applications on Huggingface Via Docker

Article • Dec 11, 2023 • 18

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Article • Oct 5, 2021 • 3

Llama can now see and run on your device - welcome Llama 3.2

Article • Sep 25 • 180

upvoted a collection 3 months ago

ColPali Paper Resources

Collection

Main resources for the paper: "ColPali: Efficient Document Retrieval with Vision Language Models" • 4 items • Updated 22 days ago • 6

upvoted 2 articles 3 months ago

Document Similarity Search with ColPali

Article • Sep 21 • 48

Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face

Article • Sep 6 • 16

upvoted a paper 3 months ago

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18 • 75