Classifier architecture

The classifier uses DenseNet161 as the encoder, followed by several linear layers that form the classification head.

Model accuracy:

The model achieves 91.3% accuracy on the validation set.
F1-score per class: {'digital': 0.9873773235685747, 'hard': 0.9338602782753218, 'soft': 0.8444277483052108}
Mean F1-score: 0.9218884500497024
Accuracy: 0.913
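
The mean F1-score above is the unweighted (macro) average of the three per-class scores, which can be checked with a few lines of Python. The variable names are illustrative.

```python
from statistics import mean

# Per-class F1-scores as reported above.
f1_per_class = {
    "digital": 0.9873773235685747,
    "hard": 0.9338602782753218,
    "soft": 0.8444277483052108,
}

# Macro average: every class counts equally, regardless of its size.
macro_f1 = mean(f1_per_class.values())
print(f"Mean F1-score: {macro_f1:.4f}")
```

Because the average is unweighted, the weaker soft class pulls the mean down as much as the two stronger classes pull it up.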

Training dataset metadata:

Dataset classes: ['soft', 'digital', 'hard']
Number of classes: 3
Total number of images: 18415
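
A minimal sketch of turning the classifier's three raw outputs into a class name is shown below. It assumes that the output index order matches the class list above, which the source does not confirm, and the function name predict_class is hypothetical.

```python
import math

# Assumed to match the order of the model's output units.
CLASSES = ["soft", "digital", "hard"]


def predict_class(logits):
    """Return (class name, probability) for one image's raw scores."""
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]
```

For example, predict_class([0.1, 2.0, -1.0]) selects the second entry, "digital".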

Number of images per class:

soft : 5482
digital : 1206
hard : 11727
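
The counts above are imbalanced: hard has almost ten times as many images as digital. The sketch below computes each class's share and inverse-frequency weights, one common way to compensate for imbalance in a loss function. Whether any weighting was used to train this model is not stated in the source.

```python
# Image counts per class as listed above.
counts = {"soft": 5482, "digital": 1206, "hard": 11727}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: total / (number of classes * class count).
# This is an illustrative choice, not the documented training setup.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: share={shares[name]:.3f} weight={weights[name]:.3f}")
```

With these weights, the rare digital class gets the largest weight and the dominant hard class the smallest.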

Classes description:

The hard class denotes a group of scenes to which a coarser background removal method should be applied. It is intended for objects whose edges have no fine detail. The hard class contains the following categories of objects: object, laptop, charger, pc mouse, pc, rocks, table, bed, box, sneakers, ship, wire, guitar, fork, spoon, plate, keyboard, car, bus, screwdriver, ball, door, flower, clocks, fruit, food, robot.
The soft class denotes a group of scenes to which a soft background removal method should be applied. It is intended for people, hair, clothes, and similar types of objects. The soft class contains the following categories of objects: animal, people, human, man, woman, t-shirt, hairs, hair, dog, cat, monkey, cow, medusa, clothes.
The digital class denotes a group of images with digital graphics, such as screenshots and logos. The digital class contains the following category of scenes: screenshot.
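
The category lists above can be expressed as a simple lookup table, for example to label a dataset or to check which scene class a category belongs to. This is a sketch only: the names CATEGORIES and scene_class_for are hypothetical, and the category strings are copied from the descriptions above.

```python
# Object categories per scene class, as listed in the descriptions above.
CATEGORIES = {
    "hard": [
        "object", "laptop", "charger", "pc mouse", "pc", "rocks", "table",
        "bed", "box", "sneakers", "ship", "wire", "guitar", "fork", "spoon",
        "plate", "keyboard", "car", "bus", "screwdriver", "ball", "door",
        "flower", "clocks", "fruit", "food", "robot",
    ],
    "soft": [
        "animal", "people", "human", "man", "woman", "t-shirt", "hairs",
        "hair", "dog", "cat", "monkey", "cow", "medusa", "clothes",
    ],
    "digital": ["screenshot"],
}

# Inverted index: category name -> scene class.
CLASS_BY_CATEGORY = {
    category: scene_class
    for scene_class, names in CATEGORIES.items()
    for category in names
}


def scene_class_for(category: str) -> str:
    """Return the scene class for a category; raises KeyError if unknown."""
    return CLASS_BY_CATEGORY[category.strip().lower()]
```

For instance, scene_class_for("laptop") gives "hard" and scene_class_for("cat") gives "soft".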