AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM • Paper • arXiv:2503.04504 • Published Mar 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information • Paper • arXiv:2503.05085 • Published Mar 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities • Paper • arXiv:2503.03983 • Published Mar 2025
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • arXiv:2503.04130 • Published Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs • Paper • arXiv:2503.01743 • Published Mar 2025
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding • Paper • arXiv:2502.19400 • Published Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model • Paper • arXiv:2502.10248 • Published Feb 2025
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • Paper • arXiv:2409.12191 • Published Sep 18, 2024
Moshi v0.1 Release • Collection (13 items) • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via https://github.com/kyutai-labs/moshi • Updated Sep 18, 2024
Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner • Article by abhishek • May 9, 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling • Paper • arXiv:2408.04810 • Published Aug 9, 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM • Paper • arXiv:2408.05211 • Published Aug 9, 2024