laion-finetuned_room luxury annotater

This model is a fine-tuned version of laion/CLIP-ViT-B-32-laion2B-s34B-b79K on a private dataset provided by Wahi Inc. It is designed to classify room images into categories based on their luxury level and room type.

Model Description

This model builds on a fine-tuned version of CLIP optimized for real estate image annotation. It performs zero-shot classification of room images into categories such as standard or contemporary kitchens, bathrooms, and other common rooms in real estate properties. The model uses a multi-stage approach in which diffusion models generate supplementary training data and hierarchical CLIP networks perform luxury annotation. This fine-tuning process enables high accuracy in distinguishing luxury levels from real estate images.

The model was developed for the paper "Diffusion-based Data Augmentation and Hierarchical CLIP for Real Estate Image Annotation" submitted to the Pattern Analysis and Applications Special Issue on Multimedia Sensing and Computing.

Intended Uses & Limitations

This model is intended to be used for:

Annotating real estate images by classifying room types and luxury levels (e.g., standard or contemporary kitchens, bathrooms, etc.).
Helping users filter properties in real estate platforms based on the luxury level of rooms.

Limitations:

The model is optimized for real estate images and may not generalize well to other domains.
Zero-shot classification is limited to the predefined categories and candidate labels used during fine-tuning.

Training and Evaluation Data

The training data was collected and labeled by Wahi Inc. and includes a diverse set of real estate images from kitchens, bathrooms, dining rooms, living rooms, and foyers. The images were annotated as either standard or contemporary, based on the room's aesthetics, design, and quality.

Training Procedure

Training Hyperparameters

The following hyperparameters were used during training:

Learning Rate: 1e-06
Train Batch Size: 384
Eval Batch Size: 24
Seed: 42
Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
LR Scheduler Type: Linear
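
As a rough illustration of what a linear scheduler does to the learning rate listed above, the sketch below computes the rate at a given step in plain Python. It follows the common warmup-then-linear-decay rule. The total number of training steps and the warmup length are not stated in this card, so `total_steps` and `warmup_steps` are assumed placeholder values, and `linear_lr` is a hypothetical helper name, not part of the training code.

```python
def linear_lr(step, total_steps, base_lr=1e-6, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay.

    base_lr matches the learning rate listed in this card (1e-06).
    total_steps and warmup_steps are assumptions; the card does not state them.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Example with an assumed run length of 1000 steps and no warmup
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

With no warmup, the rate starts at the listed value and falls in a straight line to zero at the final step.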

Framework Versions

Transformers: 4.37.2
PyTorch: 2.0.1+cu117
Datasets: 2.14.4
Tokenizers: 0.15.0

Output Example

Example output (figure not included in this text): an image of a kitchen classified with its top 3 predicted room types and their confidence scores.
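
A top-3 view like this can be derived from the list of `{"label", "score"}` dictionaries that the zero-shot image classification pipeline returns. The following is a minimal sketch: `top_k` is a hypothetical helper, and the scores in `example_predictions` are made-up placeholder numbers used only to show the data shape, not real model output.

```python
def top_k(predictions, k=3):
    """Return the k highest-scoring (label, score) pairs."""
    # Sort defensively, even though the pipeline output is normally already ranked
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["label"], round(p["score"], 3)) for p in ranked[:k]]


# Placeholder values for illustration only (NOT real model output)
example_predictions = [
    {"label": "a photo of standard kitchen", "score": 0.25},
    {"label": "a photo of contemporary kitchen", "score": 0.5},
    {"label": "a photo of contemporary dining room", "score": 0.125},
    {"label": "a photo of standard bathroom", "score": 0.0625},
]

for label, score in top_k(example_predictions):
    print(f"{score:.3f}  {label}")
```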

How to Use the Model

You can use this model for zero-shot image classification with the HuggingFace pipeline API. Here is a basic example:

from PIL import Image
from transformers import pipeline

# Initialize the pipeline
classifier = pipeline("zero-shot-image-classification", model="strollingorange/roomLuxuryAnnotater")

# Define the candidate labels
candidate_labels = [
    "a photo of standard bathroom",
    "a photo of contemporary bathroom",
    "a photo of standard kitchen",
    "a photo of contemporary kitchen",
    "a photo of standard foyer",
    "a photo of standard living room",
    "a photo of standard dining room",
    "a photo of contemporary foyer",
    "a photo of contemporary living room",
    "a photo of contemporary dining room"
]

# Load your image (replace 'path_to_your_image.jpg' with the actual path to your image)
image = Image.open('path_to_your_image.jpg')

# Run zero-shot classification
result = classifier(image, candidate_labels=candidate_labels)

# Output the result
print(result)
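
Because every candidate label combines a luxury level and a room type, the returned list can be split back into those two attributes. The sketch below is one possible post-processing step, not part of the model itself: `parse_label` and `summarize` are hypothetical helpers, the label format "a photo of <level> <room>" is taken from the candidate labels above, and summing scores per level and per room is an assumed design choice. The scores in `sample` are placeholders, not real model output.

```python
from collections import defaultdict

PREFIX = "a photo of "
LEVELS = ("standard", "contemporary")


def parse_label(label):
    """Split 'a photo of <level> <room>' into (level, room)."""
    text = label[len(PREFIX):] if label.startswith(PREFIX) else label
    level, _, room = text.partition(" ")
    if level not in LEVELS or not room:
        raise ValueError(f"unexpected label format: {label!r}")
    return level, room


def summarize(predictions):
    """Sum scores per luxury level and per room type, then pick the best of each."""
    level_scores = defaultdict(float)
    room_scores = defaultdict(float)
    for p in predictions:
        level, room = parse_label(p["label"])
        level_scores[level] += p["score"]
        room_scores[room] += p["score"]
    best_level = max(level_scores, key=level_scores.get)
    best_room = max(room_scores, key=room_scores.get)
    return best_level, best_room


# Placeholder scores for illustration only (NOT real model output)
sample = [
    {"label": "a photo of contemporary kitchen", "score": 0.6},
    {"label": "a photo of standard kitchen", "score": 0.3},
    {"label": "a photo of standard bathroom", "score": 0.1},
]
print(summarize(sample))
```

In practice, `result` from the example above could be passed to `summarize` in place of `sample`. Summing over labels lets a platform filter on luxury level alone even when the room type is uncertain.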

Acknowledgments

We would like to thank Wahi Inc. (https://wahi.com/ca/en/) for its continued support in the development of this model. Their collaboration was essential in fine-tuning the model for real estate image annotation.
