Awesome Computer Use Agents - a ranpox Collection

updated 9 days ago

https://github.com/ranpox/awesome-computer-use

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published 10 days ago
Tree Search for Language Model Agents

Paper • 2407.01476 • Published Jul 1
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Paper • 2401.10935 • Published Jan 17
OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Paper • 2410.05243 • Published Oct 7
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Paper • 2410.18967 • Published Oct 24
Adversarial Attacks on Multimodal Agents

Paper • 2406.12814 • Published Jun 18
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

Paper • 2409.11295 • Published Sep 17
Attacking Vision-Language Computer Agents via Pop-ups

Paper • 2411.02391 • Published Nov 4
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Paper • 2407.10956 • Published Jul 15
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Paper • 2407.01511 • Published Jul 1
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents

Paper • 2405.14573 • Published May 23
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

Paper • 2409.15637 • Published Sep 24
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

Paper • 2406.10819 • Published Jun 16
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

Paper • 2402.07456 • Published Feb 12
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator

Paper • 2410.02907 • Published Oct 3
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

Paper • 2403.03186 • Published Mar 5
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published Oct 24
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning

Paper • 2410.18963 • Published Oct 24
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

Paper • 2408.07199 • Published Aug 13
Agent S: An Open Agentic Framework that Uses Computers Like a Human

Paper • 2410.08164 • Published Oct 10