Xing Yun

xing0047

AI & ML interests

Computer Vision

Recent Activity

liked a model about 5 hours ago

microsoft/Phi-4-multimodal-instruct

upvoted a paper 1 day ago

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

published a dataset 4 days ago

xing0047/davis

xing0047's activity

upvoted a paper 1 day ago

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published 3 days ago • 47

upvoted 5 papers 6 days ago

Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published 9 days ago • 47

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published 8 days ago • 25

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 8 days ago • 146

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 7 days ago • 116

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published 7 days ago • 91

upvoted a paper 10 days ago

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published 21 days ago • 33

upvoted 2 papers about 1 month ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 273

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published Jan 9 • 39

upvoted 11 papers about 2 months ago

EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 51

An Empirical Study of Autoregressive Pre-training from Videos

Paper • 2501.05453 • Published Jan 9 • 37

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis

Paper • 2501.04561 • Published Jan 8 • 16

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Paper • 2501.04001 • Published Jan 7 • 43

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 69

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published Jan 6 • 40

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41