# RTDETR Model on COCO8 Dataset

This model is a **Vision Transformer** (ViT) based object detection and tracking model, trained on the **COCO8** dataset.

## Model Details

- **Model Type**: RTDETR (a Vision Transformer based object detection and tracking model)
- **Trained On**: COCO8 dataset (people with and without coats)
- **Training Epochs**: 100
- **Input Size**: 640x640 pixels
- **Output**: Detects objects and tracks them across the frames of an input video

## How to Use

You can use this model directly from the Hugging Face Hub. Below is an example of how to use it for inference on your images.
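
As a minimal sketch of single-image inference, the snippet below assumes the checkpoint has already been downloaded from the Hub and is loaded with the `RTDETR` class from the `ultralytics` package. The card does not name the weights file, so `WEIGHTS_PATH` is a placeholder, and `detect` and `keep_confident` are illustrative helper names rather than part of this model's API.

```python
from typing import List, Tuple

# Placeholder path: the actual checkpoint filename is not stated in this card.
WEIGHTS_PATH = "path/to/weights.pt"

# (class label, confidence, (x1, y1, x2, y2) in pixels of the original image)
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def keep_confident(detections: List[Detection], min_conf: float = 0.5) -> List[Detection]:
    """Drop detections below min_conf and sort the rest, highest confidence first."""
    kept = [d for d in detections if d[1] >= min_conf]
    return sorted(kept, key=lambda d: d[1], reverse=True)


def detect(image_path: str, weights_path: str = WEIGHTS_PATH, imgsz: int = 640) -> List[Detection]:
    """Run the detector on one image and return plain-Python detection tuples."""
    # Imported lazily so the helper above works without ultralytics installed.
    from ultralytics import RTDETR

    model = RTDETR(weights_path)
    # predict() returns one Results object per input image.
    result = model.predict(image_path, imgsz=imgsz)[0]

    detections: List[Detection] = []
    for box in result.boxes:
        label = result.names[int(box.cls[0])]
        conf = float(box.conf[0])
        x1, y1, x2, y2 = (float(v) for v in box.xyxy[0])
        detections.append((label, conf, (x1, y1, x2, y2)))
    return detections


if __name__ == "__main__":
    # "image.jpg" is a placeholder for one of your own images.
    for label, conf, xyxy in keep_confident(detect("image.jpg")):
        print(f"{label}: {conf:.2f} at {xyxy}")
```

For video input, the same `model` object also exposes a `track` method in Ultralytics (for example `model.track(source="video.mp4", persist=True)`). Whether this checkpoint was validated with a particular tracker configuration is not stated here.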