Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos Paper • 2303.08345 • Published Mar 15, 2023
StyleBooth: Image Style Editing with Multimodal Instruction Paper • 2404.12154 • Published Apr 18
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval Paper • 2211.12764 • Published Nov 23, 2022
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer Paper • 2410.00086 • Published Sep 30