Rui Qian

shvdi

AI & ML interests

None yet

Organizations

None yet

shvdi's activity

upvoted a paper 23 days ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 23 days ago • 92

upvoted 2 papers 2 months ago

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published Oct 21, 2024 • 66

authored 5 papers 6 months ago

Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

Paper • 2308.09951 • Published Aug 19, 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

Paper • 2308.04549 • Published Aug 8, 2023

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation

Paper • 2402.17645 • Published Feb 27, 2024 • 1

Streaming Long Video Understanding with Large Language Models

Paper • 2405.16009 • Published May 25, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 93

upvoted a paper 6 months ago

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 93

liked a model 6 months ago

internlm/internlm-xcomposer2d5-7b

Visual Question Answering • Updated Jul 22, 2024 • 109k • 185

upvoted a paper 7 months ago

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17, 2024 • 61

upvoted a paper 11 months ago

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model

Paper • 2401.16420 • Published Jan 29, 2024 • 55