cv - a zzfive Collection

updated 11 days ago

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14 • 7
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Paper • 2403.09394 • Published Mar 14 • 25
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16 • 26
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

Paper • 2406.20076 • Published Jun 28 • 8
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Paper • 2404.00412 • Published Mar 30 • 2
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

Paper • 2407.18054 • Published Jul 25 • 10
Matting by Generation

Paper • 2407.21017 • Published Jul 30 • 22
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 109
NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices

Paper • 2408.10161 • Published Aug 19 • 13
Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22 • 89
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3 • 35
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 83
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published Sep 13 • 11
OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 28
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

Paper • 2409.17058 • Published Sep 25 • 11
Self-Supervised Any-Point Tracking by Contrastive Random Walks

Paper • 2409.16288 • Published Sep 24 • 5
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26 • 32
MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published Sep 27 • 26
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Paper • 2410.02073 • Published Oct 2 • 40
Towards Natural Image Matting in the Wild via Real-Scenario Prior

Paper • 2410.06593 • Published Oct 9 • 2
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21 • 65
SMITE: Segment Me In TimE

Paper • 2410.18538 • Published Oct 24 • 15
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Paper • 2410.20474 • Published Oct 27 • 14
DELTA: Dense Efficient Long-range 3D Tracking for any video

Paper • 2410.24211 • Published Oct 31 • 8
Face Anonymization Made Simple

Paper • 2411.00762 • Published Nov 1 • 7
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements

Paper • 2411.12044 • Published Nov 18 • 13
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Paper • 2411.10161 • Published Nov 15 • 8
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

Paper • 2411.11922 • Published Nov 18 • 18
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

Paper • 2411.14347 • Published Nov 21 • 13
Knowledge Transfer Across Modalities with Natural Language Supervision

Paper • 2411.15611 • Published 29 days ago • 15
Edge Weight Prediction For Category-Agnostic Pose Estimation

Paper • 2411.16665 • Published 27 days ago • 4
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality

Paper • 2411.15241 • Published Nov 22 • 5
Scaling Image Tokenizers with Grouped Spherical Quantization

Paper • 2412.02632 • Published 19 days ago • 10
EMOv2: Pushing 5M Vision Model Frontier

Paper • 2412.06674 • Published 14 days ago • 13