
Model Overview

A Keras model implementing the RetinaNet meta-architecture.

Implements the RetinaNet architecture for object detection. The constructor requires num_classes, bounding_box_format, and a backbone. Optionally, a custom label encoder and a custom prediction decoder may be provided.

Arguments

  • num_classes: The number of classes in your dataset, excluding the background class. Classes should be represented by integers in the range [0, num_classes).
  • bounding_box_format: The format of the bounding boxes in the input dataset. Refer to the keras.io docs for more details on supported bounding box formats.
  • backbone: keras.Model. If the default feature_pyramid is used, the backbone must implement the pyramid_level_inputs property with keys "P3", "P4", and "P5" and layer names as values. A sensible backbone in many cases is keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet").
  • anchor_generator: (Optional) A keras_cv.layers.AnchorGenerator. If provided, the anchor generator is passed to both the label_encoder and the prediction_decoder. Only to be used when label_encoder and prediction_decoder are both None. Defaults to an anchor generator with the parameterization: strides=[2**i for i in range(3, 8)], scales=[2**x for x in [0, 1 / 3, 2 / 3]], sizes=[32.0, 64.0, 128.0, 256.0, 512.0], and aspect_ratios=[0.5, 1.0, 2.0].
  • label_encoder: (Optional) A keras.Layer whose call() method accepts an image Tensor, a bounding box Tensor, and a bounding box class Tensor, and returns RetinaNet training targets. By default, a standard KerasCV RetinaNetLabelEncoder is created and used. The results of this object's call() method are passed to the loss objects for box_loss and classification_loss as the y_true argument.
  • prediction_decoder: (Optional) A keras.layers.Layer that is responsible for transforming RetinaNet predictions into usable bounding box Tensors. If not provided, the default is a keras_cv.layers.MultiClassNonMaxSuppression layer, which prunes boxes with non-max suppression.
  • feature_pyramid: (Optional) A keras.layers.Layer that produces a list of 4D feature maps (batch dimension included) when called on the pyramid-level outputs of the backbone. If not provided, the reference implementation from the paper will be used.
  • classification_head: (Optional) A keras.Layer that performs classification of the bounding boxes. If not provided, a simple ConvNet with 3 layers will be used.
  • box_head: (Optional) A keras.Layer that performs regression of the bounding boxes. If not provided, a simple ConvNet with 3 layers will be used.
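
To make the default anchor_generator parameterization listed above more concrete, here is a minimal pure-Python sketch that enumerates the anchor shapes it implies per pyramid level. It is an illustration only, not the library's implementation. The helper anchor_shapes is hypothetical, and it assumes that an aspect ratio means width / height and that every anchor keeps the area (size * scale) ** 2; the actual layer may define these differently.

```python
import math

# Default parameterization quoted in the anchor_generator argument.
strides = [2**i for i in range(3, 8)]
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
sizes = [32.0, 64.0, 128.0, 256.0, 512.0]
aspect_ratios = [0.5, 1.0, 2.0]


def anchor_shapes(size, scales, aspect_ratios):
    """Return (width, height) pairs for one pyramid level.

    Assumption: ratio = width / height, and each anchor keeps the
    area (size * scale) ** 2. This helper is hypothetical.
    """
    shapes = []
    for scale in scales:
        side = size * scale
        for ratio in aspect_ratios:
            width = side * math.sqrt(ratio)
            height = side / math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


# One anchor per (scale, aspect ratio) pair at every feature-map location.
anchors_per_location = len(scales) * len(aspect_ratios)

# Pair each stride with its base anchor size, one entry per pyramid level.
level_shapes = {
    stride: anchor_shapes(size, scales, aspect_ratios)
    for stride, size in zip(strides, sizes)
}
```

Under these assumptions, each of the five levels (strides 8 through 128) carries nine anchor shapes per location: three scales times three aspect ratios.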

Example Usage

Pretrained RetinaNet model

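import keras_hub
import numpy as np

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "retinanet_resnet50_fpn_coco"
)

# Run the detector on a random batch of two 224x224 RGB images.
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)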
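
The Arguments section notes that the default prediction decoder prunes overlapping boxes with non-max suppression. As a rough illustration of that idea, the sketch below implements greedy single-class NMS in plain Python. It is not the library's multi-class layer; the function names, the (x_min, y_min, x_max, y_max) box layout, and the 0.5 threshold are all assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box
    does not exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of 81/119, so it is suppressed, while the third box does not overlap anything and is kept.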

Fine-tune the pre-trained model

import keras_hub

# CLASSES is assumed to be a list of the class names in your dataset.
backbone = keras_hub.models.Backbone.from_preset(
    "retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "retinanet_resnet50_fpn_coco"
)
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
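
The snippet above uses len(CLASSES) without defining CLASSES. One possible way to set it up, sketched below, is a plain list of label names mapped to integer ids in [0, num_classes), as the num_classes argument requires. The label names and the validate_class_ids helper are hypothetical and only illustrate the constraint.

```python
# Hypothetical label names; replace with the classes in your dataset.
# The background class is not included.
CLASSES = ["cat", "dog", "bird"]

# Map each class name to an integer id in [0, num_classes).
class_to_id = {name: index for index, name in enumerate(CLASSES)}
num_classes = len(CLASSES)


def validate_class_ids(class_ids, num_classes):
    """Raise ValueError if any id falls outside [0, num_classes)."""
    for class_id in class_ids:
        if not (isinstance(class_id, int) and 0 <= class_id < num_classes):
            raise ValueError(
                f"class id {class_id!r} is outside [0, {num_classes})"
            )
    return True
```

Running a check like this over the dataset's labels before training can surface off-by-one label schemes, such as ids that start at 1 or reserve 0 for background.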

Custom training the model

image_converter = keras_hub.layers.RetinaNetImageConverter(
    scale=1/255
)

preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor(
    image_converter=image_converter
)
# Load a pre-trained ResNet50 model.
# This will serve as the base for extracting image features.
image_encoder = keras_hub.models.Backbone.from_preset(
    "resnet_50_imagenet"
)

# Build the RetinaNet Feature Pyramid Network (FPN) on top of the ResNet50
# backbone. The FPN creates multi-scale feature maps for better object detection
# at different sizes.
backbone = keras_hub.models.RetinaNetBackbone(
    image_encoder=image_encoder,
    min_level=3,
    max_level=5,
    use_p5=False
)
# CLASSES is assumed to be a list of the class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
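
To give a feel for what min_level and max_level control, the sketch below computes the spatial size of each pyramid level's feature map for a given input size. It assumes that level l has stride 2**l, which matches the default strides listed under anchor_generator, and that sizes round up; both are assumptions, and pyramid_feature_sizes is a hypothetical helper rather than part of any library.

```python
import math


def pyramid_feature_sizes(image_size, min_level, max_level):
    """Map level name to (height, width) of its feature map.

    Assumption: level l downsamples the input by 2**l and rounds up.
    """
    height, width = image_size
    return {
        f"P{level}": (
            math.ceil(height / 2**level),
            math.ceil(width / 2**level),
        )
        for level in range(min_level, max_level + 1)
    }


# Levels 3 to 5, as in the custom-training snippet.
feature_sizes = pyramid_feature_sizes((512, 512), min_level=3, max_level=5)
```

For a 512x512 input this yields 64x64, 32x32, and 16x16 maps for P3, P4, and P5, so lower levels keep finer spatial detail for small objects.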

Example Usage with Hugging Face URI

Pretrained RetinaNet model

object_detector = keras_hub.models.ImageObjectDetector.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)

input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
object_detector(input_data)

Fine-tune the pre-trained model

backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
preprocessor = keras_hub.models.RetinaNetObjectDetectorPreprocessor.from_preset(
    "hf://keras/retinanet_resnet50_fpn_coco"
)
# CLASSES is assumed to be a list of the class names in your dataset.
model = keras_hub.models.RetinaNetObjectDetector(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor
)
