Maxi (maxiw) PRO

AI & ML interests: Computer Agents | VLMs

Organizations: AskUI

maxiw's activity

reacted to MonsterMMORPG's post with 👀 12 days ago
Simple prompt, 2x latent-upscaled FLUX fine-tuning / DreamBooth images. Trainable on GPUs with as little as 6 GB VRAM; each image is 2048x2048 pixels.

AI Photos of Yourself - Workflow Guide
Step 1: Initial Setup
Follow any standard FLUX Fine-Tuning / DreamBooth tutorial of your choice

You can also follow mine step by step: https://youtu.be/FvpWy1x5etM

Step 2: Data Collection
Gather high-quality photos of yourself

I used a Poco X6 Pro (mid-tier phone) with good results

Ensure good variety in poses and lighting

Step 3: Training
Use "ohwx man" as the only caption for all images

Keep it simple - no complex descriptions needed

Step 4: Testing & Optimization
Use SwarmUI grid to find the optimal checkpoint

Test different variations to find what works best

Step 5: Generation Settings
Upscale Parameters:

Scale: 2x

Refiner Control: 0.6

Model: RealESRGAN_x4plus.pth

Prompt Used:

photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio<segment:yolo-face_yolov9c.pt-1,0.7,0.5>photograph of ohwx man
Note: The model naturally generated smiling expressions since the training dataset included many smiling photos.

Note: yolo-face_yolov9c.pt is used to mask the face and automatically inpaint it, which improves face quality in distant shots.
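
As a rough illustration of this generation step in code (not the SwarmUI pipeline itself), here is a minimal diffusers sketch; the LoRA path is a placeholder and the 2x upscaling pass is left out:

import torch
from diffusers import FluxPipeline

# Load base FLUX and hypothetical fine-tuned "ohwx man" weights
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/ohwx-man-lora")  # placeholder path
pipe.enable_model_cpu_offload()  # helps on low-VRAM GPUs

image = pipe(
    "photograph of ohwx man wearing an amazing ultra expensive suit on a luxury studio",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("ohwx-man.png")  # a 2x ESRGAN pass would then take this to 2048x2048
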
replied to their post 23 days ago
reacted to merve's post with 👀 24 days ago
Small yet mighty! 💫

We are releasing SmolVLM: a new 2B vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
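
For reference, a minimal inference sketch along the lines of the blog's usage example (the image path is a placeholder):

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-Instruct")
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct", torch_dtype=torch.bfloat16
)

image = Image.open("photo.jpg")  # placeholder image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
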
reacted to luigi12345's post with 👀 25 days ago
MinimalScrap
Only free dependencies. Save it; it's quite useful.


!pip install googlesearch-python requests
from googlesearch import search
import requests

query = "Glaucoma"
# Search nih.gov for PDFs matching the query and download each hit
for url in search(f"{query} site:nih.gov filetype:pdf", 20):
    if url.endswith(".pdf"):
        filename = url.split("/")[-1]
        with open(filename, "wb") as f:
            f.write(requests.get(url).content)
        print("✅ " + filename)
print("Done!")

posted an update 25 days ago
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in a new browser window
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each step can route grounding to a different model from the Hub
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are integrated via the Gradio Spaces API. I'm also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
reacted to prithivMLmods's post with 🔥 26 days ago
HF Posts Receipts 🏆🚀

[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT

🥠The one thing that needs to be remembered is the 'username'.

🥠And yeah, thank you, @maxiw, for creating the awesome dataset and sharing it here! 🙌

🥠[ Dataset ] : maxiw/hf-posts

@prithivMLmods
posted an update about 1 month ago
🤖 Controlling Computers with Small Models 🤖

We just released PTA-1, a fine-tuned Florence-2 for localization of GUI text and elements. It runs with ~150 ms inference time on an RTX 4080. This means you can now start building fast on-device computer use agents!

Model: AskUI/PTA-1
Demo: AskUI/PTA-1
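
A rough sketch of what inference could look like, assuming PTA-1 follows Florence-2's trust_remote_code interface; the task prompt and post-processing call are assumptions, so check the model card for the exact format:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained("AskUI/PTA-1", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("AskUI/PTA-1", trust_remote_code=True)

image = Image.open("screenshot.png")  # placeholder screenshot
prompt = "<OPEN_VOCABULARY_DETECTION>search field"  # assumed task format
inputs = processor(text=prompt, images=image, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=256)
text = processor.batch_decode(ids, skip_special_tokens=False)[0]
# Florence-2 processors expose a post-processing helper for box outputs
print(processor.post_process_generation(
    text, task="<OPEN_VOCABULARY_DETECTION>", image_size=image.size
))
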
reacted to rwightman's post with 🚀 about 1 month ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below and are not that different, though the Adopt 180-epoch run lands closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and the Adopt vs AdamW optimizer.
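
A minimal sketch of pulling one of these in12k pretrains as a feature backbone with timm (the detection head itself is out of scope here):

import timm
import torch

model = timm.create_model(
    "mobilenetv4_conv_medium.e250_r384_in12k",
    pretrained=True,
    features_only=True,  # return intermediate feature maps, e.g. for detection
)
feats = model(torch.randn(1, 3, 384, 384))
for f in feats:
    print(f.shape)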

reacted to m-ric's post with 👀 about 1 month ago
Great feature alert: 𝗬𝗼𝘂 𝗰𝗮𝗻 𝗻𝗼𝘄 𝘂𝘀𝗲 𝗮𝗻𝘆 𝗦𝗽𝗮𝗰𝗲 𝗮𝘀 𝗮 𝘁𝗼𝗼𝗹 𝗳𝗼𝗿 𝘆𝗼𝘂𝗿 𝘁𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝗲𝗿𝘀.𝗮𝗴𝗲𝗻𝘁! 🛠️🔥🔥

This lets you take the coolest spaces, like FLUX.1-dev, and use them in agentic workflows with a few lines of code! 🧑‍💻

On the video below, I set up my fake vacation pictures where I'm awesome at surfing (I'm really not) 🏄

Head to the doc to learn this magic 👉 https://huggingface.co/docs/transformers/main/en/agents_advanced#import-a-space-as-a-tool-
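
A minimal sketch following those docs; the name and description strings are illustrative, and the exact agents API can differ across transformers versions:

from transformers import Tool

image_generator = Tool.from_space(
    "black-forest-labs/FLUX.1-dev",
    name="image_generator",
    description="Generates an image from a text prompt",
)
image = image_generator("me, surfing a huge wave")
# The tool can then be handed to an agent via its tools=[...] argument.
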
reacted to merve's post with 👀 about 1 month ago
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x fewer image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model https://huggingface.co/NexaAIDev/omnivision-968M
reacted to m-ric's post with ❤️ about 1 month ago
𝗧𝗵𝗲 𝗻𝗲𝘅𝘁 𝗯𝗶𝗴 𝘀𝗼𝗰𝗶𝗮𝗹 𝗻𝗲𝘁𝘄𝗼𝗿𝗸 𝗶𝘀 𝗻𝗼𝘁 🦋, 𝗶𝘁'𝘀 𝗛𝘂𝗯 𝗣𝗼𝘀𝘁𝘀! [INSERT STONKS MEME WITH LASER EYES]

See below: I got 105k impressions since regularly posting Hub Posts, coming close to my 275k on Twitter!

⚙️ Computed with the great dataset maxiw/hf-posts
⚙️ Thanks to Qwen2.5-Coder-32B for showing me how to access dict attributes in a SQL request!

cc @merve who's far ahead of me
·
reacted to merve's post with 😎 about 1 month ago
Microsoft released LLM2CLIP: a CLIP model with longer context window for complex text inputs 🤯
All models with Apache 2.0 license here microsoft/llm2clip-672323a266173cfa40b32d4c

TL;DR: they replaced CLIP's text encoder with various LLMs fine-tuned on captioning, yielding better top-k retrieval accuracy.
This will enable better image-text retrieval, better zero-shot image classification, better vision language models 🔥
Read the paper to learn more: LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation (2411.04997)
replied to their post about 1 month ago

I didn't know about the hub-stats dataset. Really cool, thx for sharing! 🤗

posted an update about 1 month ago
I was curious to see what people post here on HF so I created a dataset with all HF Posts: maxiw/hf-posts

Some interesting stats:

Top 5 Authors by Total Impressions:
-----------------------------------
@merve : 171,783 impressions (68 posts)
@fdaudens : 135,253 impressions (81 posts)
@singhsidhukuldeep : 122,591 impressions (81 posts)
@akhaliq : 119,526 impressions (78 posts)
@MonsterMMORPG : 112,500 impressions (45 posts)

Top 5 Users by Number of Reactions Given:
----------------------------------------
@osanseviero : 1278 reactions
@clem : 910 reactions
@John6666 : 899 reactions
@victor : 674 reactions
@samusenps : 655 reactions

Top 5 Most Used Reactions:
-------------------------
❤️: 7048 times
🔥: 5921 times
👍: 4856 times
🚀: 2549 times
🤗: 2065 times
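
For the curious, a minimal sketch of how stats like these could be computed; the column names ("author", "totalUniqueImpressions") are assumptions about the dataset schema:

from collections import Counter
from datasets import load_dataset

posts = load_dataset("maxiw/hf-posts", split="train")
impressions = Counter()
for post in posts:
    # assumed schema: author is a dict with a "name" field
    impressions[post["author"]["name"]] += post.get("totalUniqueImpressions") or 0

for name, total in impressions.most_common(5):
    print(f"@{name}: {total:,} impressions")
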
posted an update about 1 month ago
Exciting to see open-source models thriving in the computer agent space! 🔥
I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents — check it out here: maxiw/OS-ATLAS

The demo takes a screenshot plus an instruction as input and predicts bounding boxes.
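
One way to call the demo programmatically is through gradio_client; the endpoint name and argument order below are assumptions, so inspect them first:

from gradio_client import Client, handle_file

client = Client("maxiw/OS-ATLAS")
client.view_api()  # prints the Space's real endpoint names and signatures

result = client.predict(
    handle_file("screenshot.png"),  # screenshot input (assumed)
    "click the search bar",         # instruction (assumed)
    api_name="/predict",            # hypothetical endpoint name
)
print(result)
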
reacted to cbensimon's post with ❤️ 3 months ago
Hello everybody,

We've rolled out a major update to ZeroGPU! All the Spaces are now running on it.

Major improvements:

1. GPU cold starts about twice as fast!
2. RAM usage reduced by two-thirds, allowing more effective resource usage, meaning more GPUs for the community!
3. ZeroGPU initializations (cold starts) can now be tracked and displayed (use progress=gr.Progress(track_tqdm=True); see the sketch after this list)
4. Improved compatibility and PyTorch integration, increasing ZeroGPU compatible spaces without requiring any modifications!
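
A minimal sketch of point 3, assuming a ZeroGPU Space using the spaces package; the function body is a placeholder for real GPU work:

import gradio as gr
import spaces

@spaces.GPU  # runs on a ZeroGPU slot; the cold start happens on first call
def generate(prompt, progress=gr.Progress(track_tqdm=True)):
    # track_tqdm=True surfaces tqdm bars (including the ZeroGPU init) in the UI
    return f"echo: {prompt}"  # placeholder

gr.Interface(fn=generate, inputs="text", outputs="text").launch()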

Feel free to reply to this post if you have any questions

🤗 Best regards,
Charles
replied to m-ric's post 3 months ago
replied to their post 3 months ago
view reply

@fridayfairy this is not fine-tuned. It's the base model just prompted to return bounding boxes in a specific format. The Qwen2-VL models must have been pre-trained on detection data.

reacted to rwightman's post with 👍 4 months ago
The timm leaderboard timm/leaderboard has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs along with some NCHW / NHWC layout and torch.compile (dynamo) variations.

Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit ( timm/test_vit.r160_in1k)
* test_efficientnet ( timm/test_efficientnet.r160_in1k)
* test_byobnet ( timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet like blocks)

They are < 0.5M params, insanely fast, and originally intended for unit testing with real weights. They have awful ImageNet top-1; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware and you can fine-tune them well on small data. Could be the model you're looking for?
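
A minimal sketch of fine-tuning one of these tiny models; the batch and label tensors are random stand-ins for real data:

import timm
import torch
import torch.nn.functional as F

model = timm.create_model("test_vit.r160_in1k", pretrained=True, num_classes=10)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 3, 160, 160)   # stand-in for a 160x160 image batch
y = torch.randint(0, 10, (8,))    # stand-in labels
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
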
replied to their post 4 months ago