UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity Paper • 2409.04081 • Published Sep 6 • 3
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale Paper • 2409.08264 • Published Sep 12 • 43
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 83
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 45
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 16 items • Updated 1 day ago • 35