VideoForMed - a che111 Collection

che111's Collections

Work for 3D Medical Vision

Med Multimodal Learning

Localize Visual Understanding

Generative Model

Synthetic Data Learning

Explainable, Fairness Work

General Multimodal Learning

VideoForMed

updated Sep 5, 2024

Distilling Vision-Language Models on Millions of Videos

Paper • 2401.06129 • Published Jan 11, 2024 • 17

Koala: Key frame-conditioned long video-LLM

Paper • 2404.04346 • Published Apr 5, 2024 • 6

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Paper • 2404.05726 • Published Apr 8, 2024 • 21

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Paper • 2406.07471 • Published Jun 11, 2024 • 1

VISA: Reasoning Video Object Segmentation via Large Language Models

Paper • 2407.11325 • Published Jul 16, 2024 • 1

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Paper • 2407.15841 • Published Jul 22, 2024 • 40

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Paper • 2409.01071 • Published Sep 2, 2024 • 27

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

Paper • 2409.01199 • Published Sep 2, 2024 • 14