NVILA: Efficient Frontier Visual Language Models • arXiv:2412.04468 • Published Dec 2024 • 57 upvotes
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset • arXiv:2410.22325 • Published Oct 29, 2024 • 10 upvotes
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree • arXiv:2410.16268 • Published Oct 21, 2024 • 66 upvotes
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation • arXiv:2410.01680 • Published Oct 2, 2024 • 32 upvotes
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? • arXiv:2408.13257 • Published Aug 23, 2024 • 25 upvotes
Building and better understanding vision-language models: insights and future directions • arXiv:2408.12637 • Published Aug 22, 2024 • 124 upvotes
LongVILA: Scaling Long-Context Visual Language Models for Long Videos • arXiv:2408.10188 • Published Aug 19, 2024 • 51 upvotes
VideoGameBunny: Towards vision assistants for video games • arXiv:2407.15295 • Published Jul 21, 2024 • 22 upvotes
Shape of Motion: 4D Reconstruction from a Single Video • arXiv:2407.13764 • Published Jul 18, 2024 • 19 upvotes
OpenVLA: An Open-Source Vision-Language-Action Model • arXiv:2406.09246 • Published Jun 13, 2024 • 36 upvotes