Dominik Klotz PRO

programmnix-askui

AI & ML interests

Working on our vision of a better digital world. Prompt-to-Automation. VGQ, Object Detection, Text Detection, Icon Classification, ... Vision ❤️

Recent Activity

new activity 10 days ago

maxiw/Qwen2-VL-Detection:Have you finetuned the model?

new activity 12 days ago

AskUI/PTA-1:Dataset and hyperparameters for training

liked a Space about 2 months ago

HuggingFaceTB/SmolVLM-256M-Demo

programmnix-askui's activity

New activity in maxiw/Qwen2-VL-Detection 10 days ago

Have you finetuned the model?

#5 opened 10 days ago by

wytalfred

New activity in AskUI/PTA-1 12 days ago

Dataset and hyperparameters for training

#3 opened 12 days ago by

Maverick17

liked 3 Spaces about 2 months ago

SmolVLM

📊

Generate descriptions from images and text prompts

SmolVLM 256M Instruct WebGPU

🐨

Generate descriptions for images using WebGPU technology

SmolVLM 500M Instruct WebGPU

💻

New activity in AskUI/PTA-1 about 2 months ago

Welcome to try DeepSeek-VL2~

#2 opened 2 months ago by

CharlesCXK

updated a Space about 2 months ago

DeepSeek Vl UI

🦀

Generate bounding boxes and text for image objects

published a Space about 2 months ago

DeepSeek Vl UI

🦀

Generate bounding boxes and text for image objects

liked a Space about 2 months ago

OS ATLAS

📉

A Foundation Action Model For Generalist GUI Agents

New activity in deepseek-ai/deepseek-vl2 about 2 months ago

Request a demo huggingface space

#7 opened about 2 months ago by

programmnix-askui

New activity in AskUI/PTA-1 2 months ago

Several icons

#2 opened 2 months ago by

darkzbaron

Update app.py

#1 opened 4 months ago by

Tonic

reacted to merve's post with 🔥 3 months ago

Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗

liked a model 3 months ago

showlab/ShowUI-2B

Updated 3 days ago • 41.4k • 240

upvoted 3 collections 3 months ago

reacted to maxiw's post with 🚀 4 months ago

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open the browser, then pause briefly so the page can load.
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each click names the hub model that should locate the described element.
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    # Switch to the image results, using a different model for this step.
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")

Currently these models are integrated with Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
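
Since only the four models listed above are supported, it may help to check a model name before passing it as `model_name`. The following is a minimal sketch, not part of the askui package: `SUPPORTED_MODELS` and `check_model_name` are hypothetical names introduced here for illustration, and the list is copied from the post.

```python
# Model identifiers copied from the "Currently supported" list in the post.
SUPPORTED_MODELS = (
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
)


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is in the supported list, else raise.

    Hypothetical helper for illustration only; askui does not provide it.
    """
    if model_name not in SUPPORTED_MODELS:
        supported = ", ".join(SUPPORTED_MODELS)
        raise ValueError(
            f"Unsupported model {model_name!r}; expected one of: {supported}"
        )
    return model_name
```

Under these assumptions, a call such as `agent.click("text 'Images'", model_name=check_model_name("AskUI/PTA-1"))` would fail early with a readable error on a misspelled model name, before any request is made.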
