---
datasets:
- adhamelarabawy/fashion_human_classification
language:
- en
pipeline_tag: image-classification
---

# Human Presence Classification
#### CLIP-Based Linear Probe

Logistic regression classification model to detect the presence of humans in fashion-domain images.

@author: Adham Elarabawy (www.adhamelarabawy.com)

## Overview

I needed a human presence classification model to help with structuring a very large scraped dataset of fashion imagery. CLIP-based similarity scoring was not sufficient: a similarity threshold strict enough to reach the desired precision would have resulted in a substantial drop rate. Instead, I trained a logistic regression model on top of CLIP image features as a linear probe for classification, using DeepFashion paired images. It achieved 100% accuracy on the held-out test set (20% of the data, ~2k images). The model is definitely overfit to fashion imagery, but that's fine since that's the downstream use case.

Inference is extremely low latency, especially if you've already encoded your images using the ViT-B/32 CLIP variant. On an A10, it takes about 23 milliseconds to encode the image and about 0.28 milliseconds to classify the features.

## Dataset

I used a subset of DeepFashion v1 to curate a dataset of paired images: a garment on its own, and the same garment worn by a person. I then used this pairing to create the final dataset with binary labels of human presence. Some notes:

- The images seem to be predominantly of women.
- The human models seem to have good coverage of most ethnicities/body types. Early analysis has not shown any ethnicity/body type bias.
- Most/all of the images have a white background. From my testing, the model generalizes quite well to other domains (with natural/diverse backgrounds/poses).
- My hypothesis is that the paired nature of the data allowed the model to pick up on the correct features, which has made it very robust.

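The linear probe described in the Overview can be sketched roughly as follows. This is not the original training script: the features here are random stand-ins for 512-dimensional ViT-B/32 CLIP image embeddings, and the class separation, sample counts, and `LogisticRegression` settings are all assumptions chosen for illustration. Only the 80/20 split and the boolean label convention (True = has human) come from this card.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-ins for CLIP ViT-B/32 image features (512-dim).
# In practice these would come from clip_model.encode_image(...).
rng = np.random.default_rng(0)
n_per_class, dim = 200, 512
no_human = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, dim))
has_human = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, dim))

features = np.vstack([no_human, has_human])
labels = np.array([False] * n_per_class + [True] * n_per_class)

# Hold out 20% of the data as a test set, keeping the classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# The "linear probe": a logistic regression fit on frozen image features.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

test_accuracy = probe.score(X_test, y_test)
print(f"Test accuracy on synthetic features: {test_accuracy:.3f}")
```

The CLIP encoder stays frozen throughout; only the logistic regression weights are learned, which is what keeps the classification step so cheap.
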
|Presence Case|Absence Case|
|---|---|
|||

## Usage:

```python
import clip
import torch
import pickle
import sklearn  # scikit-learn must be installed to unpickle the classifier
import time
from PIL import Image
from huggingface_hub import hf_hub_download

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device)

# Download and load the pickled logistic regression classifier
repo_id = "adhamelarabawy/fashion_human_classifier"
model_path = hf_hub_download(repo_id=repo_id, filename="model.pkl")

with open(model_path, 'rb') as file:
    human_classifier = pickle.load(file)

# Load the image to classify (placeholder path; replace with your own file)
img = Image.open("path/to/image.jpg")

# time the encoding
start = time.perf_counter()
with torch.no_grad():
    features = clip_model.encode_image(clip_preprocess(img).unsqueeze(0).to(device)).cpu().numpy()
encode_time = time.perf_counter() - start

# time the prediction
start = time.perf_counter()
pred = human_classifier.predict(features)  # True = has human, False = no human
pred_time = time.perf_counter() - start

print(f"Encode time: {encode_time*1000:.3f} milliseconds")
print(f"Prediction time: {pred_time*1000:.3f} milliseconds")
print(f"Prediction (has_human): {pred}")
```
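
The snippet above times a single call, which can be noisy (the first call in particular often includes one-off setup cost). A minimal sketch of a steadier measurement is shown below, assuming a few warm-up calls followed by the median over repeated runs. The helper `median_latency_ms` and the `dummy_workload` function are hypothetical names introduced here for illustration; they are not part of this model or of any library.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, repeats=20):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical stand-in workload so the sketch runs on its own.
# Replace it with a real call, e.g. lambda: human_classifier.predict(features)
def dummy_workload():
    return sum(i * i for i in range(10_000))


latency = median_latency_ms(dummy_workload)
print(f"Median latency: {latency:.3f} milliseconds")
```

When timing the CLIP encoding step on a GPU, make sure the timed function includes the copy back to the CPU (as the `.cpu()` call in the usage snippet does), since that forces the GPU work to finish before the timer stops.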