Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

reacted to davanstrien's post with 👍 9 days ago

Updated the ColPali Query Generator Space https://huggingface.co/spaces/davanstrien/ColPali-Query-Generator to use https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct. Given an input image, it generates several queries along with explanations to justify them. This approach can generate synthetic data for fine-tuning ColPali models.

liked a model 12 days ago

Qwen/Qwen2.5-VL-72B-Instruct

liked a model 12 days ago

Qwen/Qwen2.5-VL-7B-Instruct

Organizations

None yet

ucsahin's activity

upvoted a collection 19 days ago

DeepSeek R1 (All Versions)

DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 1 day ago • 162

upvoted 2 collections about 1 month ago

DeepSeek-V3

3 items • Updated Jan 6 • 178

DeepSeek-VL2

5 items • Updated 4 days ago • 55

upvoted 2 collections 2 months ago

DataGemma Release

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Dec 13, 2024 • 84

Turkish Instruction Datasets

Collection of instruction datasets for Turkish. • 38 items • Updated Jan 1 • 3

upvoted 2 collections 3 months ago

SigLIP

Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Dec 13, 2024 • 50

Nov 15 Releases 🍂

15 items • Updated Nov 15, 2024 • 6

upvoted a collection 5 months ago

Turkish Vision-Language Datasets

Collection of Turkish vision-language datasets. • 22 items • Updated about 1 month ago • 5

upvoted 5 papers 6 months ago

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 60

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 61

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 47

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 113

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 76

upvoted a collection 6 months ago

Vision Language Leaderboards

This collection has all the vision language leaderboards. • 7 items • Updated Aug 24, 2024 • 18

upvoted an article 6 months ago

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Article • Published Jul 31, 2024 • 58

upvoted an article 7 months ago

The Rise of Agentic Data Generation

Article • Published Jul 15, 2024 • 81

upvoted 2 papers 7 months ago

EVLM: An Efficient Vision-Language Model for Visual Understanding

Paper • 2407.14177 • Published Jul 19, 2024 • 43

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Paper • 2407.07053 • Published Jul 9, 2024 • 43

upvoted a collection 7 months ago

🪐 SmolLM

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Dec 22, 2024 • 214

upvoted an article 7 months ago

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Article • Published Jul 18, 2024 • 56