Dominik Klotz PRO

programmnix-askui

AI & ML interests

Working on our vision of a better digital world. Prompt-to-Automation. VGQ, Object Detection, Text Detection, Icon Classification, ... Vision ❤️

Organizations

AskUI

programmnix-askui's activity

New activity in AskUI/PTA-1 18 days ago

Welcome to try DeepSeek-VL2~

#2 opened about 1 month ago by CharlesCXK
New activity in deepseek-ai/deepseek-vl2 24 days ago
New activity in AskUI/PTA-1 about 1 month ago

Several icons

#2 opened about 1 month ago by darkzbaron

Update app.py

#1 opened 3 months ago by Tonic
reacted to merve's post with 🔥 about 2 months ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on consumer GPUs, and immensely memory efficient 🤠
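To put "immensely memory efficient" in perspective, here is a rough back-of-envelope sketch of how much memory the weights of a 2B-parameter model need at common precisions. It makes two assumptions. First, it treats the model as exactly 2 × 10⁹ parameters. Second, it counts weights only, with no activations, KV cache, or framework overhead. The real SmolVLM footprint will therefore differ.

```python
# Weight-only memory estimate. PARAMS uses the "2B" figure from the post,
# assumed here to be exactly 2e9; activations and KV cache are ignored.
PARAMS = 2_000_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.2f} GiB")
```

At bf16 the weights alone come to roughly 3.7 GiB (fp32 ≈ 7.45, int8 ≈ 1.86, int4 ≈ 0.93). This helps explain why a model of this size is practical on consumer hardware.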

We are releasing three checkpoints under Apache 2.0 (SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base): HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO.
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
reacted to maxiw's post with 🚀 3 months ago
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

# Open a browser, search Google for "cats", switch to the Images tab and
# click an image, using a different grounding model for each click.
with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)  # short fixed pause so the page can load
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently, these models are integrated via the Gradio Spaces API. We also plan to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
reacted to maxiw's post with 🚀 3 months ago
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 model for localizing GUI text and elements. It runs with ~150 ms inference time on an RTX 4080, which means you can now start building fast on-device computer use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
reacted to maxiw's post with 👍 3 months ago
Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents. Check it out here: maxiw/OS-ATLAS

This demo predicts bounding boxes from a screenshot and an instruction given as input.
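Grounding models like these usually return a bounding box, but an agent needs a single point to click. The sketch below turns such a text answer into a pixel coordinate. It assumes the box is written as `(x1,y1),(x2,y2)` on a 0 to 1000 normalized grid, which is the convention Qwen2-VL-style models commonly use. Treat both the format and the scale as assumptions and check the model card of the model you actually run. The sample answer string is made up for illustration.

```python
import re
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]

# Matches the first "(x1,y1),(x2,y2)" pair in the model output, tolerating spaces.
BOX_PATTERN = re.compile(
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)"
)


def parse_box(text: str) -> Optional[Box]:
    """Return (x1, y1, x2, y2) from the model's text answer, or None if absent."""
    match = BOX_PATTERN.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (int(value) for value in match.groups())
    return x1, y1, x2, y2


def box_to_click_point(box: Box, width: int, height: int, scale: int = 1000) -> Tuple[int, int]:
    """Map a box on a 0..scale grid to screenshot pixels and return its center."""
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2 * width / scale
    center_y = (y1 + y2) / 2 * height / scale
    return round(center_x), round(center_y)


if __name__ == "__main__":
    answer = "<|box_start|>(100,200),(300,400)<|box_end|>"  # made-up example output
    box = parse_box(answer)
    if box is not None:
        print(box_to_click_point(box, width=1920, height=1080))  # (384, 324)
```

Clicking the box center is the simplest choice. For small targets like icons, it is usually accurate enough.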
reacted to maxiw's post with ❤️ 3 months ago
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
โค๏ธ: 7048 times
๐Ÿ”ฅ: 5921 times
๐Ÿ‘: 4856 times
๐Ÿš€: 2549 times
๐Ÿค—: 2065 times
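The post does not say how these rankings were computed. Below is a minimal sketch of one way to produce them with `collections.Counter`. It assumes each post record has hypothetical `author`, `impressions` and `reactions` fields, where `reactions` is a list of (user, emoji) pairs. The real maxiw/hf-posts schema may differ, and the sample records are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record shape; the real maxiw/hf-posts columns may be named differently.
SAMPLE_POSTS: List[Dict] = [
    {"author": "alice", "impressions": 1200,
     "reactions": [("bob", "🔥"), ("carol", "❤️")]},
    {"author": "alice", "impressions": 300,
     "reactions": [("bob", "❤️")]},
    {"author": "bob", "impressions": 900,
     "reactions": [("carol", "❤️"), ("dave", "🚀")]},
]


def top_authors_by_impressions(posts: List[Dict], n: int = 5) -> List[Tuple[str, int, int]]:
    """Rank authors by summed impressions; also report how many posts each wrote."""
    impressions: Counter = Counter()
    post_counts: Counter = Counter()
    for post in posts:
        impressions[post["author"]] += post["impressions"]
        post_counts[post["author"]] += 1
    return [(author, total, post_counts[author])
            for author, total in impressions.most_common(n)]


def top_reactors(posts: List[Dict], n: int = 5) -> List[Tuple[str, int]]:
    """Rank users by how many reactions they gave across all posts."""
    return Counter(user for post in posts for user, _ in post["reactions"]).most_common(n)


def top_reactions(posts: List[Dict], n: int = 5) -> List[Tuple[str, int]]:
    """Rank reaction emoji by how often they were used."""
    return Counter(emoji for post in posts for _, emoji in post["reactions"]).most_common(n)


if __name__ == "__main__":
    for author, total, count in top_authors_by_impressions(SAMPLE_POSTS):
        print(f"@{author} : {total:,} impressions ({count} posts)")
```

`Counter.most_common` does the sorting. Ties keep the order in which keys were first seen, so ties in real data may need an explicit secondary sort key.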
New activity in maxiw/Qwen2-VL-Detection 4 months ago

Auto Detect

#4 opened 4 months ago by iiBLACKii