Model Overview
A Keras model implementing the RetinaNet meta-architecture.
Implements the RetinaNet architecture for object detection. The constructor requires num_classes, bounding_box_format, and a backbone. Optionally, a custom label encoder and prediction decoder may be provided.
Arguments
- num_classes: The number of classes in your dataset, excluding the background class. Classes should be represented by integers in the range [0, num_classes).
- bounding_box_format: The format of the bounding boxes in the input dataset. Refer to the keras.io docs for more details on supported bounding box formats.
- backbone: A keras.Model. If the default feature_pyramid is used, the backbone must implement the pyramid_level_inputs property with keys "P3", "P4", and "P5" and layer names as values. A sensible backbone for many cases is keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet").
- anchor_generator: (Optional) A keras_cv.layers.AnchorGenerator. If provided, the anchor generator is passed to both the label_encoder and the prediction_decoder. Only to be used when label_encoder and prediction_decoder are both None. Defaults to an anchor generator with the parameterization strides=[2**i for i in range(3, 8)], scales=[2**x for x in [0, 1 / 3, 2 / 3]], sizes=[32.0, 64.0, 128.0, 256.0, 512.0], and aspect_ratios=[0.5, 1.0, 2.0].
- label_encoder: (Optional) A keras.Layer that accepts an image Tensor, a bounding box Tensor, and a bounding box class Tensor in its call() method and returns RetinaNet training targets. By default, a standard KerasCV RetinaNetLabelEncoder is created and used. The results of this object's call() method are passed as the y_true argument to the box_loss and classification_loss of the loss object.
- prediction_decoder: (Optional) A keras.layers.Layer that transforms RetinaNet predictions into usable bounding box Tensors. If not provided, the default is a keras_cv.layers.MultiClassNonMaxSuppression layer, which prunes boxes using non-max suppression.
- feature_pyramid: (Optional) A keras.layers.Layer that produces a list of 4D feature maps (batch dimension included) when called on the pyramid-level outputs of the backbone. If not provided, the reference implementation from the paper is used.
- classification_head: (Optional) A keras.Layer that performs classification of the bounding boxes. If not provided, a simple ConvNet with 3 layers is used.
- box_head: (Optional) A keras.Layer that performs regression of the bounding boxes. If not provided, a simple ConvNet with 3 layers is used.
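To make the default anchor_generator parameterization more concrete, the sketch below enumerates the anchor shapes it implies at each pyramid level. It uses only the strides, scales, sizes, and aspect_ratios values quoted in the argument list. The helper anchor_shapes is hypothetical, and it assumes that an aspect ratio means width / height and that each anchor keeps the area (size * scale) ** 2; the library's own generator may use a different convention.

```python
import math

# Default parameterization quoted in the argument list.
strides = [2**i for i in range(3, 8)]  # one stride per pyramid level
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
sizes = [32.0, 64.0, 128.0, 256.0, 512.0]
aspect_ratios = [0.5, 1.0, 2.0]


def anchor_shapes(size, scales, aspect_ratios):
    """Return (width, height) pairs for one pyramid level.

    Hypothetical helper. Assumes ratio = width / height and that every
    anchor keeps the area (size * scale) ** 2.
    """
    shapes = []
    for scale in scales:
        area = (size * scale) ** 2
        for ratio in aspect_ratios:
            height = math.sqrt(area / ratio)
            width = ratio * height
            shapes.append((width, height))
    return shapes


# Every feature-map location gets one anchor per (scale, ratio) pair.
anchors_per_location = len(scales) * len(aspect_ratios)

for level, (stride, size) in enumerate(zip(strides, sizes), start=3):
    shapes = anchor_shapes(size, scales, aspect_ratios)
    width, height = shapes[1]  # scale 2**0, aspect ratio 1.0
    print(
        f"P{level}: stride={stride}, anchors per location={len(shapes)}, "
        f"base square anchor={width:.0f}x{height:.0f}"
    )
```

Under these assumptions each location carries 9 anchors (3 scales times 3 aspect ratios), and the base square anchor grows from 32x32 at the finest level to 512x512 at the coarsest.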
Example Usage
Pretrained RetinaNet model
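import keras_hub
import numpy as np

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)
# Random batch of two 224x224 RGB images with values in [0, 1).
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)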
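The prediction_decoder argument describes pruning overlapping boxes with non-max suppression. As an illustration of that idea only, here is a minimal single-class greedy NMS sketch in plain Python. The functions iou and non_max_suppression, the (x_min, y_min, x_max, y_max) box layout, and the 0.5 threshold are all assumptions made for this example; this is not the library's decoder, which also handles multiple classes and batching.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # prints [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap either and is kept.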
Fine-tune the pre-trained model
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "retinanet_resnet50_fpn_coco"
)
# CLASSES is assumed to be a list of your dataset's class names, defined elsewhere.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
Custom training the model
import keras_hub

# Rescale pixel values from [0, 255] to [0, 1].
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1 / 255
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet"
)
# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False,
)
# CLASSES is assumed to be a list of your dataset's class names, defined elsewhere.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
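To see what min_level=3 and max_level=5 imply for the feature pyramid, the sketch below estimates the spatial size of each pyramid level for a 224x224 input. The helper pyramid_feature_sizes is hypothetical. It assumes that level l has stride 2**l and that sizes round up as with "same" padding; the count of 9 anchors per location is likewise an assumption (3 scales times 3 aspect ratios).

```python
import math


def pyramid_feature_sizes(image_size, min_level, max_level):
    """Map each pyramid level name to its approximate (height, width).

    Hypothetical helper. Assumes level l downsamples the input by 2**l.
    """
    height, width = image_size
    sizes = {}
    for level in range(min_level, max_level + 1):
        stride = 2**level
        sizes[f"P{level}"] = (
            math.ceil(height / stride),
            math.ceil(width / stride),
        )
    return sizes


feature_sizes = pyramid_feature_sizes((224, 224), min_level=3, max_level=5)
print(feature_sizes)
# {'P3': (28, 28), 'P4': (14, 14), 'P5': (7, 7)}

# Assumption: 9 anchors per feature-map location.
anchors_per_location = 9
total_anchors = sum(
    h * w * anchors_per_location for h, w in feature_sizes.values()
)
print(total_anchors)  # 9261
```

Under these assumptions the detector scores 9261 candidate boxes per 224x224 image before any pruning, which is why a decoding step such as non-max suppression is needed.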
Example Usage with Hugging Face URI
Pretrained RetinaNet model
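import keras_hub
import numpy as np

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
# Random batch of two 224x224 RGB images with values in [0, 1).
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)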
Fine-tune the pre-trained model
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
# CLASSES is assumed to be a list of your dataset's class names, defined elsewhere.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
Custom training the model
import keras_hub

# Rescale pixel values from [0, 255] to [0, 1].
image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1 / 255
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet"
)
# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False,
)
# CLASSES is assumed to be a list of your dataset's class names, defined elsewhere.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)