Umitcan Sahin PRO

ucsahin

AI & ML interests

Visual Language Models, Large Language Models, Vision Transformers

Recent Activity

reacted to davanstrien's post with 👍 3 days ago

Updated the ColPali Query Generator Space https://huggingface.co/spaces/davanstrien/ColPali-Query-Generator to use https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct. Given an input image, it generates several queries along with explanations to justify them. This approach can generate synthetic data for fine-tuning ColPali models.

liked a model 6 days ago

Qwen/Qwen2.5-VL-72B-Instruct

liked a model 6 days ago

Qwen/Qwen2.5-VL-7B-Instruct

Organizations

None yet

ucsahin's activity

liked 3 models 6 days ago

reacted to merve's post with 🔥 8 days ago

Post

4616

Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- the tiniest VLMs, which come in 256M and 500M, along with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, whose decoder is based on the MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a challenging new multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- Tencent released Hunyuan3D-2, a new model for 3D asset generation from images

7 replies

liked a model 12 days ago

jinaai/ReaderLM-v2

Text Generation • Updated 12 days ago • 22k • 461

upvoted a collection 13 days ago

DeepSeek R1 (All Versions)

Collection

DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated about 1 hour ago • 134

liked a model 13 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 2 days ago • 845k • 6.06k

liked a model 14 days ago

MiniMaxAI/MiniMax-Text-01

Text Generation • Updated 17 days ago • 6.24k • 494

reacted to kadirnar's post with 🚀🔥 14 days ago

Post

2675

I created my own AI image and video from scratch using the fal.ai platform 💫

Workflow: Flux LoRA Training + Upscale + Kling AI (1.6)

5 replies

liked a model 15 days ago

openbmb/MiniCPM-o-2_6

Any-to-Any • Updated 7 days ago • 242k • 900

liked a model 17 days ago

Metin/Gemma-2-2B-TR-Knowledge-Graph

Text Generation • Updated 17 days ago • 386 • 12

reacted to fdaudens's post with 🚀🚀 19 days ago

Post

2309

🔥 The AI Agent hype is real! This blog post deep dives into everything you need to know before deploying them: from key definitions to practical recommendations. A must-read for anyone building the future of autonomous systems.

📊 Key insight: A clear table breaking down the 5 levels of AI agents - from simple processors to fully autonomous systems. Essential framework for understanding where your agent stands on the autonomy spectrum

⚖️ Deep analysis of 15 core values reveals critical trade-offs: accuracy, privacy, safety, equity & more. The same features that make agents powerful can make them risky. Understanding these trade-offs is crucial for responsible deployment

🎯 6 key recommendations for the road ahead:
- Create rigorous evaluation protocols
- Study societal effects
- Understand ripple effects
- Improve transparency
- Open source can make a positive difference
- Monitor base model evolution

Read the blog post: https://huggingface.co/blog/ethics-soc-7 Brilliant work by @meg @evijit @sasha @giadap

liked a model 24 days ago

ByteDance/Sa2VA-4B

Image-Text-to-Text • Updated 19 days ago • 4.13k • 58

liked 3 models 25 days ago

microsoft/phi-4

Text Generation • Updated 25 days ago • 345k • 1.64k

ByteDance/Sa2VA-8B

Image-Text-to-Text • Updated 19 days ago • 4.34k • 44

99eren99/ModernBERT-base-Turkish-uncased-mlm

Fill-Mask • Updated 25 days ago • 97 • 4

reacted to m-ric's post with 🤗 26 days ago

Post

5086

Since I published it on GitHub a few days ago,
Hugging Face's new agentic library 𝘀𝗺𝗼𝗹𝗮𝗴𝗲𝗻𝘁𝘀 has gathered nearly 4k stars 🤯

➡️ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!

The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.

We will make it work better, and fully open. ✨

Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/

3 replies