---
title: Object Detection in Live YouTube Streams
emoji: 🎥
colorFrom: blue
colorTo: green
sdk: gradio
python_version: '3.10'
sdk_version: 4.44.0
app_file: app.py
tags:
  - live-video
  - object-detection
  - YOLO
  - YouTube
  - gradio
models:
  - yolo11n
datasets:
  - HuggingFaceM4/COCO
---

Object Detection in Live YouTube Streams

Installation

To install and run this project, follow these steps:

  1. Clone the repository from Hugging Face:
    git clone https://huggingface.co/spaces/aai521-group6/youtube-object-detection
    
  2. Navigate to the project directory:
    cd youtube-object-detection
    
  3. Ensure Python 3.10 or higher is installed on your machine.
  4. Install required dependencies using:
    pip install -r requirements.txt
    
  5. Run the application:
    python app.py
    

Objective

The primary goal of this project is to apply computer vision and machine learning to real-time, accurate object detection in live YouTube streams. This capability supports applications such as security surveillance, urban management, and traffic analysis. Our objective is to develop a robust system that identifies and classifies objects in diverse streaming environments and adapts to varying conditions with high precision and reliability.

Methods Used

  • Computer Vision and Object Detection: Implemented advanced computer vision techniques and object detection models to identify and classify objects in live video feeds.
  • Machine Learning and Deep Learning: Leveraged machine learning, especially deep learning, to interpret complex visual data from video streams.
  • Asynchronous Processing: Integrated asynchronous processing to improve the performance and responsiveness of the application.
  • Data Streaming: Employed efficient data streaming methods to handle live video feeds from online sources.
  • User Interface Design: Designed an enhanced, user-friendly interface with Gradio, enabling simple interaction with the system, including video input and result visualization.
  • API Integration for Video Retrieval: Utilized API solutions, such as youtube-search-python and Streamlink, to retrieve live video content from popular online platforms (see the sketch below).
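
A minimal sketch of that retrieval step, assuming Streamlink's Python API and OpenCV for frame capture (the app itself may read frames via Imageio/FFmpeg instead); the example URL and quality setting are placeholders:

    import cv2
    import streamlink

    def get_stream_url(youtube_url: str, quality: str = "best") -> str:
        """Resolve a live YouTube page URL to a direct stream URL."""
        streams = streamlink.streams(youtube_url)
        if quality not in streams:
            raise ValueError(f"No '{quality}' stream found for {youtube_url}")
        return streams[quality].url

    def grab_frame(stream_url: str):
        """Read a single frame from the resolved stream."""
        capture = cv2.VideoCapture(stream_url)
        ok, frame = capture.read()
        capture.release()
        return frame if ok else None

    if __name__ == "__main__":
        url = get_stream_url("https://www.youtube.com/watch?v=PLACEHOLDER")
        frame = grab_frame(url)
        print("Frame shape:", None if frame is None else frame.shape)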

Technologies

  • Python: Primary programming language for the project's development.
  • Git: Version Control System for tracking and managing changes in the codebase.
  • GitHub: Platform for code hosting, collaboration, and version control.
  • YouTube API and Libraries: Data source for accessing live YouTube streams, using libraries like youtube-search-python.
  • Ultralytics YOLOv8: Object detection model for real-time video analysis (a usage sketch follows this list).
  • OpenCV: Library for image and video processing tasks.
  • Gradio: Framework for building an interactive and user-friendly interface.
  • Streamlink: Tool for extracting live stream URLs.
  • Imageio: Used for reading frames from live streams using FFmpeg.
  • Asyncio: Enables asynchronous processing to improve application performance.
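
To illustrate how the detection model fits in, here is a minimal per-frame sketch using the Ultralytics API; the checkpoint name (yolov8n.pt), confidence threshold, and file names are assumptions rather than the Space's actual settings:

    import cv2
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # small pretrained COCO checkpoint

    def detect_objects(frame):
        """Run detection on one BGR frame and return an annotated copy."""
        results = model(frame, conf=0.25, verbose=False)
        return results[0].plot()  # draws boxes and class labels onto the frame

    if __name__ == "__main__":
        image = cv2.imread("example.jpg")  # placeholder input image
        annotated = detect_objects(image)
        cv2.imwrite("annotated.jpg", annotated)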

Project Description

This project centers on the creation and deployment of an object detection system tailored for live YouTube streams. Built on the Ultralytics YOLOv8 model, the system identifies, classifies, and tracks objects in real time within dynamic streaming environments.

Recent updates to the project include:

  • Comprehensive Code Refactoring: Improved efficiency and maintainability of the codebase by restructuring and optimizing code.
  • Maintained Docstring Style: Ensured consistent and detailed documentation throughout the code for better readability and understanding.
  • Enhanced User Interface and Experience: The user interface has been reworked with Gradio, offering a more intuitive and engaging experience with modern theming, improved layout, and clear instructions (a minimal interface sketch appears at the end of this section).
  • Asynchronous Processing: Implemented asynchronous functions where applicable using asyncio, improving the performance and responsiveness of the application, especially during network operations and long-running tasks (see the sketch after this list).
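
The asynchronous pattern can be sketched as follows, with blocking work (frame capture, inference) pushed onto a background executor so the event loop stays responsive; the function names and stream URLs are illustrative, not the app's actual API:

    import asyncio

    def grab_and_detect(stream_url: str):
        """Placeholder for the blocking capture-plus-inference step."""
        ...

    async def process_stream(stream_url: str):
        loop = asyncio.get_running_loop()
        # Offload the heavy call so other coroutines keep running.
        return await loop.run_in_executor(None, grab_and_detect, stream_url)

    async def main():
        results = await asyncio.gather(
            process_stream("https://www.youtube.com/watch?v=STREAM_A"),
            process_stream("https://www.youtube.com/watch?v=STREAM_B"),
        )
        print(f"Processed {len(results)} streams")

    if __name__ == "__main__":
        asyncio.run(main())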

A key aspect of the project is handling variable lighting conditions, object occlusions, and diverse environmental settings, so the system remains effective and accurate in real-world applications. We also aim to optimize the system for speed and efficiency, keeping real-time processing latency minimal.

The project not only represents a significant advancement in computer vision but also offers a versatile tool with wide-ranging applications, from urban planning and public safety to traffic management and surveillance.
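
As a rough illustration of the Gradio interface described above, here is a minimal sketch with a URL textbox in and an annotated image out; the handler body and labels are placeholders, and the actual app.py layout is considerably richer:

    import gradio as gr

    def detect_in_stream(youtube_url: str):
        """Placeholder handler: resolve the stream, grab a frame, run detection."""
        ...  # return an annotated frame (NumPy array) for display

    demo = gr.Interface(
        fn=detect_in_stream,
        inputs=gr.Textbox(label="Live YouTube stream URL"),
        outputs=gr.Image(label="Detected objects"),
        title="Object Detection in Live YouTube Streams",
    )

    if __name__ == "__main__":
        demo.launch()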

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Professor Roozbeh Sadeghian: Our advisor, for invaluable guidance and mentorship.
  • Professor Ebrahim Tarshizi: The Academic Director for the Applied Artificial Intelligence (AAI) program, for contributions to program structure and academic enrichment.
  • The Applied Artificial Intelligence Program at the University of San Diego: For essential support and resources.