NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models Paper • 2407.12366 • Published Jul 17
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding Paper • 2406.19263 • Published Jun 27