Classifier architecture
The classifier uses DenseNet161 as the encoder, followed by several linear layers that form the classification head.
Model accuracy:
The model achieves 91.3% accuracy on the validation set.
F1-score per class: {'digital': 0.9873773235685747, 'hard': 0.9338602782753218, 'soft': 0.8444277483052108}
Mean F1-score: 0.9218884500497024
Accuracy: 0.913
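The reported mean F1-score is consistent with an unweighted (macro) average of the three per-class scores. The short check below reproduces it from the figures above; the variable names are illustrative only.

```python
from statistics import mean

# Per-class F1-scores copied from the report above.
f1_per_class = {
    "digital": 0.9873773235685747,
    "hard": 0.9338602782753218,
    "soft": 0.8444277483052108,
}

# Macro average: every class counts equally, regardless of its size.
macro_f1 = mean(list(f1_per_class.values()))
weakest_class = min(f1_per_class, key=f1_per_class.get)

print(f"macro F1 = {macro_f1:.4f}, weakest class = {weakest_class}")
```

A macro average weights the small digital class as heavily as the much larger hard class, so it is more sensitive to the weaker soft score than overall accuracy is.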
Training dataset metadata:
- Dataset classes: ['soft', 'digital', 'hard']
- Number of classes: 3
- Total number of images: 18415
Number of images per class:
- soft : 5482
- digital : 1206
- hard : 11727
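The counts above show a noticeable class imbalance. The sketch below computes each class's share of the dataset and a set of inverse-frequency weights using the common `n_samples / (n_classes * count)` heuristic. Whether any class weighting was used in training is not stated, so the weights are shown purely as an illustration.

```python
# Image counts copied from the dataset metadata above.
images_per_class = {"soft": 5482, "digital": 1206, "hard": 11727}

total_images = sum(images_per_class.values())
num_classes = len(images_per_class)

# Fraction of the dataset that belongs to each class.
class_share = {name: n / total_images for name, n in images_per_class.items()}

# Inverse-frequency ("balanced") weights: rarer classes get larger weights.
# Assumption: illustrative only; not necessarily used for this model.
class_weight = {
    name: total_images / (num_classes * n) for name, n in images_per_class.items()
}

for name in images_per_class:
    print(f"{name:8s} share={class_share[name]:.1%} weight={class_weight[name]:.2f}")
```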
Class descriptions:
The hard class covers scenes that call for a coarser background removal method, intended for objects with clean edges and no fine details. The hard class contains the following categories of objects: object, laptop, charger, pc mouse, pc, rocks, table, bed, box, sneakers, ship, wire, guitar, fork, spoon, plate, keyboard, car, bus, screwdriver, ball, door, flower, clocks, fruit, food, robot.
The soft class covers scenes that call for a soft background removal method, intended for people, hair, clothes, and similar types of objects. The soft class contains the following categories of objects: animal, people, human, man, woman, t-shirt, hairs, hair, dog, cat, monkey, cow, medusa, clothes.
The digital class covers images with digital graphics, such as screenshots and logos. The digital class contains the following category of scenes: screenshot.
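The category lists above can be expressed as a simple lookup table, which is handy when relabelling data or checking which class a given object category falls under. This is a minimal sketch: the names `CLASS_CATEGORIES`, `CATEGORY_TO_CLASS`, and `class_for_category` are hypothetical, and the category strings are copied as written above.

```python
# Category lists copied from the class descriptions above.
CLASS_CATEGORIES = {
    "hard": [
        "object", "laptop", "charger", "pc mouse", "pc", "rocks", "table",
        "bed", "box", "sneakers", "ship", "wire", "guitar", "fork", "spoon",
        "plate", "keyboard", "car", "bus", "screwdriver", "ball", "door",
        "flower", "clocks", "fruit", "food", "robot",
    ],
    "soft": [
        "animal", "people", "human", "man", "woman", "t-shirt", "hairs",
        "hair", "dog", "cat", "monkey", "cow", "medusa", "clothes",
    ],
    "digital": ["screenshot"],
}

# Inverted index: object category -> scene class.
CATEGORY_TO_CLASS = {
    category: scene_class
    for scene_class, categories in CLASS_CATEGORIES.items()
    for category in categories
}


def class_for_category(category):
    """Return the scene class for an object category, or None if unlisted."""
    return CATEGORY_TO_CLASS.get(category.strip().lower())


print(class_for_category("dog"))     # soft
print(class_for_category("Guitar"))  # hard
```

Categories that do not appear in any list return `None` rather than a guessed class, so the caller decides how to handle them.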