# RTDETR Model on COCO8 Dataset

This is an RTDETR model: a **Vision Transformer** (ViT) based model for object detection and tracking, trained on the **COCO8** dataset.

## Model Details

- **Model Type**: RTDETR (a Vision Transformer based object detection and tracking model)
- **Trained On**: COCO8 dataset (people with and without coats)
- **Training Epochs**: 100
- **Input Size**: 640x640 pixels
- **Output**: Detected objects, tracked across the frames of an input video

## How to Use

You can use this model directly from the Hugging Face Hub. Below is an example of how to use it for inference on your images.
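
As one possible starting point, here is a minimal sketch of single-image inference. It assumes the weights were exported in Ultralytics format and can be loaded with the `ultralytics` `RTDETR` class, and that `huggingface_hub` is installed. The repository ID `your-username/rtdetr-coco8` and the file name `best.pt` are hypothetical placeholders, not values taken from this model card, so replace them with the real ones.

```python
# Hypothetical placeholders: replace with the actual repository ID and weights file name.
REPO_ID = "your-username/rtdetr-coco8"
WEIGHTS_FILE = "best.pt"


def load_model(repo_id: str = REPO_ID, filename: str = WEIGHTS_FILE):
    """Download the weights from the Hub and wrap them in an Ultralytics RTDETR model."""
    # Imported here so the helpers below can be used without these packages installed.
    from huggingface_hub import hf_hub_download
    from ultralytics import RTDETR

    # Assumption: the repository stores a single Ultralytics-format checkpoint.
    weights_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return RTDETR(weights_path)


def detect(model, source: str, conf: float = 0.25):
    """Run inference and return (class_name, confidence, [x1, y1, x2, y2]) tuples."""
    # imgsz=640 matches the 640x640 input size listed under Model Details.
    results = model.predict(source, imgsz=640, conf=conf)
    detections = []
    for result in results:
        for box in result.boxes:
            class_id = int(box.cls[0])
            detections.append(
                (result.names[class_id], float(box.conf[0]), box.xyxy[0].tolist())
            )
    return detections


def main(image_path: str = "image.jpg"):
    """Load the model and print one line per detection in the given image."""
    model = load_model()
    for name, score, xyxy in detect(model, image_path):
        print(f"{name}: {score:.2f} at {xyxy}")
```

Calling `main("path/to/your/image.jpg")` downloads the checkpoint on first use and prints each detected class with its confidence and box corners. The `conf=0.25` threshold is only an illustrative default and may need tuning for your images.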