AskUI

Enterprise

company

Verified

https://www.askui.com/

ask_ui

askui

AI & ML interests

UI Automation, Agents, Vision, NLP, LLMs, prompt-to-automation, pta, pta-text, prompt to automation

AskUI's activity

maxiw

posted an update about 1 month ago

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

# Each click can be routed to a different hub model via model_name.
with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)  # give the page a moment to load
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)  # wait for the search results
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)  # wait for the image results
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")

These models are currently integrated through the Gradio Spaces API. I'm also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B

3 replies

maxiw

posted an update about 2 months ago

🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 model for localizing GUI text and elements. It runs with ~150 ms inference time on an RTX 4080. This means you can now start building fast on-device computer use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1

1 reply

maxiw

posted an update about 2 months ago

I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
❤️: 7048 times
🔥: 5921 times
👍: 4856 times
🚀: 2549 times
🤗: 2065 times
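
A ranking like "Top 5 Authors by Total Impressions" can be computed with a simple group-and-sum. The sketch below is only an illustration: the field names `author` and `impressions` and the sample records are assumptions made up for the example, not the actual schema or contents of maxiw/hf-posts.

```python
from collections import defaultdict


def top_authors_by_impressions(posts, n=5):
    """Return the top-n (author, total_impressions, post_count) tuples.

    `posts` is an iterable of dicts with hypothetical keys
    "author" and "impressions" (assumed names, not the real schema).
    """
    totals = defaultdict(lambda: [0, 0])  # author -> [impressions, post count]
    for post in posts:
        entry = totals[post["author"]]
        entry[0] += post["impressions"]
        entry[1] += 1
    # Sort authors by total impressions, highest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(author, imp, count) for author, (imp, count) in ranked[:n]]


# Toy records, invented purely for illustration.
sample_posts = [
    {"author": "alice", "impressions": 100},
    {"author": "bob", "impressions": 150},
    {"author": "alice", "impressions": 200},
    {"author": "carol", "impressions": 50},
]

for author, imp, count in top_authors_by_impressions(sample_posts, n=2):
    print(f"@{author} : {imp:,} impressions ({count} posts)")
```

The same pattern would work for the reaction counts by swapping the summed field for a per-user or per-emoji counter.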

10 replies

maxiw

posted an update about 2 months ago

Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents — check it out here: maxiw/OS-ATLAS

This demo predicts bounding boxes based on screenshot + instructions as input.

maxiw

posted an update 4 months ago

The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a reference frame of 1k x 1k pixels and then scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
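
The scaling step mentioned above is just a per-axis multiplication. Here is a minimal sketch, assuming the model returns boxes as `(x1, y1, x2, y2)` in a 1000 x 1000 reference frame; the helper name `scale_box` and the example box are made up for illustration and are not taken from the Space's code.

```python
def scale_box(box, image_width, image_height, frame=1000):
    """Map an (x1, y1, x2, y2) box from a frame x frame grid to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * image_width / frame),
        round(y1 * image_height / frame),
        round(x2 * image_width / frame),
        round(y2 * image_height / frame),
    )


# Hypothetical model output for a 1920 x 1080 screenshot.
box_px = scale_box((250, 500, 750, 1000), image_width=1920, image_height=1080)
print(box_px)  # (480, 540, 1440, 1080)
```

Because x and y are scaled independently, this works for any aspect ratio as long as the model saw the whole image mapped onto the square reference frame.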

4 replies

maxiw

posted an update 5 months ago

I just added the newly released xGen-MM v1.5 foundational Large Multimodal Models (LMMs), developed by Salesforce AI Research, to my xGen-MM HF Space: maxiw/XGen-MM

2 replies

Team members 5
