- We found that VLMs can self-improve their reasoning performance through a reflection mechanism and, importantly, that this approach scales with test-time compute.
- Evaluations on a comprehensive and diverse set of vision-language reasoning tasks are included!
😇 TL;DR: OS-Atlas offers:

1. State-of-the-Art GUI Grounding: Helps GUI agents accurately locate GUI elements.
2. Strong OOD Performance and Cross-platform Compatibility: Excels in out-of-domain agentic tasks across macOS, Windows, Linux, Android, and Web.
3. Complete Infrastructure for GUI Data Synthesis: You can easily build your own OS agent upon it!