MobA: A Two-Level Agent System for Efficient Mobile Task Automation Paper • 2410.13757 • Published Oct 17 • 31
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback Paper • 2403.18349 • Published Mar 27
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI Paper • 2205.11029 • Published May 23, 2022
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding Paper • 2402.18262 • Published Feb 28
MULTI: Multimodal Understanding Leaderboard with Text and Images Paper • 2402.03173 • Published Feb 5 • 3
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15 • 6
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 46
Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era Paper • 2305.08144 • Published May 14, 2023
CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset Paper • 2305.15891 • Published May 25, 2023