- PaliGemma 2: A Family of Versatile VLMs for Transfer — Paper, arXiv:2412.03555, published 19 days ago, 118 upvotes
- ChatRex: Taming Multimodal LLM for Joint Perception and Understanding — Paper, arXiv:2411.18363, published 26 days ago, 9 upvotes
- PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance — Paper, arXiv:2411.02327, published Nov 4, 11 upvotes
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning — Paper, arXiv:2411.02337, published Nov 4, 35 upvotes
- Inference Optimal VLMs Need Only One Visual Token but Larger Models — Paper, arXiv:2411.03312, published Nov 5, 6 upvotes
- Building and better understanding vision-language models: insights and future directions — Paper, arXiv:2408.12637, published Aug 22, 124 upvotes
- Wavelets Are All You Need for Autoregressive Image Generation — Paper, arXiv:2406.19997, published Jun 28, 29 upvotes
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models — Paper, arXiv:2408.08872, published Aug 16, 98 upvotes
- Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents — Paper, arXiv:2408.07060, published Aug 13, 40 upvotes
- Medical SAM 2: Segment medical images as video via Segment Anything Model 2 — Paper, arXiv:2408.00874, published Aug 1, 45 upvotes
- Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference — Paper, arXiv:2403.14520, published Mar 21, 33 upvotes
- 🍃 MINT-1T — Collection: Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens", 13 items, updated Jul 24, 56 upvotes
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model — Paper, arXiv:2407.16198, published Jul 23, 13 upvotes
- POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation — Paper, arXiv:2407.14931, published Jul 20, 20 upvotes
- TabReD: A Benchmark of Tabular Machine Learning in-the-Wild — Paper, arXiv:2406.19380, published Jun 27, 47 upvotes
- Make It Count: Text-to-Image Generation with an Accurate Number of Objects — Paper, arXiv:2406.10210, published Jun 14, 76 upvotes
- Aligning Diffusion Models with Noise-Conditioned Perception — Paper, arXiv:2406.17636, published Jun 25, 26 upvotes