Merve Noyan

merve

https://github.com/merveenoyan/smol-vision

AI & ML interests

VLMs, vision & co

Recent Activity

posted an update 3 days ago

Supercharge your LLM apps with smolagents 🔥

However cool your LLM is, it can only go so far without being agentic. Enter smolagents: a new agent library by Hugging Face that makes the LLM write code, do analysis, and automate boring stuff! Here's our blog to get you started: https://huggingface.co/blog/smolagents

upvoted a collection 3 days ago

posted an update 10 days ago

QwQ can see 🔥 The Qwen team released QVQ, a large vision LM with reasoning 😱 It outperforms proprietary VLMs on several benchmarks and comes with open weights and a demo! Check them out ⬇️

Demo: https://huggingface.co/spaces/Qwen/QVQ-72B-preview
Model: https://huggingface.co/Qwen/QVQ-72B-Preview
Read more: https://qwenlm.github.io/blog/qvq-72b-preview/

Congratulations @JustinLin610 and team!

Articles

Introducing smolagents: simple agents that write actions in code.

Welcome PaliGemma 2 – New vision language models by Google

SmolVLM - small yet mighty Vision Language Model

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Showcase Your Projects in Spaces using Gradio

merve's activity

upvoted a collection 3 days ago

QVQ

QVQ: Qwen models for visual reasoning • 7 items • Updated 2 days ago • 33

upvoted a paper 16 days ago

Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published 24 days ago • 25

upvoted a paper 17 days ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 21 days ago • 135

upvoted a paper 29 days ago

PaliGemma 2: A Family of Versatile VLMs for Transfer

Paper • 2412.03555 • Published 30 days ago • 119

upvoted 4 papers 3 months ago

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 25

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 48

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Paper • 2405.02296 • Published Mar 7, 2024 • 4

NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

Paper • 2404.01300 • Published Apr 1, 2024 • 4

upvoted an article 3 months ago

Document Similarity Search with ColPali

Article • Sep 21, 2024 • 48

upvoted 3 papers 4 months ago

DriveLM: Driving with Graph Visual Question Answering

Paper • 2312.14150 • Published Dec 21, 2023 • 4

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 58

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 51

upvoted a collection 5 months ago

InternVideo2

InternVideo2 • 15 items • Updated 6 days ago • 17

upvoted 5 papers 5 months ago

KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23, 2024 • 42

Meltemi: The first open Large Language Model for Greek

Paper • 2407.20743 • Published Jul 30, 2024 • 67

Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey

Paper • 2407.21794 • Published Jul 31, 2024 • 5

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 109

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 75

upvoted a collection 5 months ago

SpaceVLMs

Features VLMs fine-tuned for enhanced spatial reasoning using a synthetic data pipeline similar to Spatial VLM. • 3 items • Updated Jul 26, 2024 • 1

upvoted a paper 5 months ago

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

Paper • 2407.12594 • Published Jul 17, 2024 • 19