Distilling Vision-Language Models on Millions of Videos Paper • 2401.06129 • Published Jan 11, 2024 • 15