maxiw
posted an update 25 days ago
You can now try out computer use models from the Hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in the browser, then give the page a moment to load
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each click can use a different Hub model via model_name
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    # Switch to the image results
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")
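
The script above picks a different model for each click. One way to keep that pairing easy to read is to list the steps as data and loop over them. This is only a rough sketch: `run_click_steps` and `RecordingAgent` are hypothetical names made up for illustration, not part of askui. The only call taken from the post is `agent.click(instruction, model_name=...)`. The stand-in agent just records calls, so the block runs without askui installed.

```python
import time


def run_click_steps(agent, steps, pause=0.5):
    """Click each (instruction, model_name) pair in order, pausing after each click."""
    for instruction, model_name in steps:
        # Same call shape as in the post: agent.click(<instruction>, model_name=<hub id>)
        agent.click(instruction, model_name=model_name)
        time.sleep(pause)  # fixed wait for the UI to settle, as in the original script


class RecordingAgent:
    """Hypothetical stand-in that records clicks instead of driving the screen."""

    def __init__(self):
        self.clicks = []

    def click(self, instruction, model_name):
        self.clicks.append((instruction, model_name))


steps = [
    ("text 'Images'", "AskUI/PTA-1"),
    ("second cat image", "OS-Copilot/OS-Atlas-Base-7B"),
]

fake = RecordingAgent()
run_click_steps(fake, steps, pause=0)
print(fake.clicks)
```

With askui installed, the `agent` from `with VisionAgent() as agent:` would take the place of `fake`.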


For now, these models are integrated through the Gradio Spaces API. I'm also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
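
Since `model_name` is a plain string, a typo in it is easy to miss. A small guard like the one below could catch that before any click is attempted. It is a sketch only: `SUPPORTED_MODELS` and `check_model_name` are hypothetical helpers, not askui API, and the set is just a snapshot of the list in this post, so it would need updating as more models are integrated.

```python
# Snapshot of the models listed in the post (hypothetical helper, not part of askui)
SUPPORTED_MODELS = frozenset({
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
})


def check_model_name(model_name):
    """Return model_name unchanged if it is in the snapshot, else raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        options = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"Unsupported model {model_name!r}; expected one of: {options}")
    return model_name


print(check_model_name("AskUI/PTA-1"))
```

The check is case-sensitive on purpose, because the names here are matched exactly as written in the list.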

Thank you

Hi @maxiw, would you consider integrating our ShowUI?
It's a 2B model built on Qwen2-VL-2B, with strong UI grounding and navigation :)

Hi @KevinQHLin, I integrated ShowUI in the latest release. Really cool model!