ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions Paper • 2303.06594 • Published Mar 12, 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models Paper • 2304.10592 • Published Apr 20, 2023
StoryGPT-V: Large Language Models as Consistent Story Visualizers Paper • 2312.02252 • Published Dec 4, 2023 • 1
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations Paper • 2308.16349 • Published Aug 30, 2023
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4, 2024 • 26
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos Paper • 2407.12679 • Published Jul 17, 2024 • 8
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 26
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling Paper • 2408.03695 • Published Aug 7, 2024 • 13
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning Paper • 2310.09478 • Published Oct 14, 2023 • 19