InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning • arXiv 2305.06500 • Published May 11, 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger • arXiv 2310.09199 • Published Oct 13, 2023
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models • arXiv 2306.05424 • Published Jun 8, 2023
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model • arXiv 2409.01704 • Published Sep 3