---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- gpt-omni/mini-omni
tags:
- code
---
# ATP Tennis Match Analysis and Anomaly Detection

This project analyzes ATP tennis match data with a deep learning model that uses joint embeddings. Its goal is to detect anomalies in the draws of professional men's tennis tournaments. The project uses PyTorch to build and train the neural network, Optuna for hyperparameter optimization, and DBSCAN for anomaly detection.

## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Setup](#setup)
- [Usage](#usage)
- [Model Architecture](#model-architecture)
- [Hyperparameter Optimization](#hyperparameter-optimization)
- [Anomaly Detection](#anomaly-detection)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)

## Overview

The project aims to identify irregularities in tennis matches by examining patterns and discrepancies in player rankings, ages, and other match-related features. This analysis can help detect potential biases or unusual outcomes in tournament draws.

## Features

- Data Loading and Preprocessing: Loads ATP match data from multiple years and preprocesses it, including encoding categorical features and handling missing values.
- Feature Engineering: Creates new features such as the age difference and rank difference between players.
- Joint Embedding Neural Network: A PyTorch model that combines categorical and numerical features to predict match outcomes.
- Hyperparameter Tuning: Uses Optuna to optimize the model's hyperparameters efficiently.
- Anomaly Detection: Applies DBSCAN clustering to the embeddings generated by the model to identify anomalies in player performance.
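The feature-engineering step could look something like the sketch below. It assumes the column names used in the publicly available `atp_matches_<year>.csv` files (`winner_age`, `loser_age`, `winner_rank`, `loser_rank`, `surface`); this README does not list the actual columns, and the imputation choices (mean age for missing ages, a placeholder rank of 2000 for unranked players) are illustrative assumptions, not the project's own preprocessing. `engineer_features` and `UNRANKED_RANK` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

UNRANKED_RANK = 2000.0  # hypothetical placeholder rank for players without a ranking


def to_float(value: Optional[str]) -> Optional[float]:
    """Parse a CSV cell as a float; blank or missing cells become None."""
    if value is None or value.strip() == "":
        return None
    return float(value)


def engineer_features(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, float]], Dict[str, int]]:
    """Compute age/rank differences and an integer surface code for each match row."""
    # Mean of all observed ages, used to fill in missing ages (assumed strategy).
    ages = [
        age
        for row in rows
        for age in (to_float(row.get("winner_age")), to_float(row.get("loser_age")))
        if age is not None
    ]
    mean_age = sum(ages) / len(ages) if ages else 0.0

    surface_codes: Dict[str, int] = {}  # label encoding: category -> integer index
    features = []
    for row in rows:
        w_age = to_float(row.get("winner_age"))
        l_age = to_float(row.get("loser_age"))
        w_rank = to_float(row.get("winner_rank"))
        l_rank = to_float(row.get("loser_rank"))
        # Impute missing values before computing the differences.
        w_age = mean_age if w_age is None else w_age
        l_age = mean_age if l_age is None else l_age
        w_rank = UNRANKED_RANK if w_rank is None else w_rank
        l_rank = UNRANKED_RANK if l_rank is None else l_rank
        surface = (row.get("surface") or "").strip() or "Unknown"
        features.append({
            "age_diff": w_age - l_age,
            "rank_diff": w_rank - l_rank,  # negative: the winner was ranked higher
            "surface_id": surface_codes.setdefault(surface, len(surface_codes)),
        })
    return features, surface_codes
```

Integer codes such as `surface_id` are the kind of index an embedding layer looks up, which is why label encoding rather than one-hot encoding fits a joint embedding model.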

## Setup

### Prerequisites

- Python 3.8 or later
- PyTorch
- Optuna
- scikit-learn
- Matplotlib
- Pandas
- NumPy

### Installation

1. Clone the repository:

```bash
git clone https://github.com/yourusername/atp-tennis-analysis.git
cd atp-tennis-analysis
```

2. Install the dependencies:

```bash
pip install -r requirements.txt
```

3. Download the ATP match data files and place them in the project directory. The files must be named in the format `atp_matches_<year>.csv` (e.g., `atp_matches_2000.csv`).

## Usage

Run the main script to load the data, preprocess it, and train the model:

```bash
python main.py
```
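`main.py` is not shown in this README, so the following is only a sketch of how the yearly files could be discovered and combined using the standard library. `load_matches` is a hypothetical helper that relies solely on the `atp_matches_<year>.csv` naming convention described above.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List

# Matches names such as atp_matches_2000.csv and captures the four-digit year.
FILE_PATTERN = re.compile(r"^atp_matches_(\d{4})\.csv$")


def load_matches(data_dir: str) -> List[Dict[str, str]]:
    """Read every atp_matches_<year>.csv in data_dir, oldest year first.

    Each row is returned as a dict and tagged with a 'year' field taken from the file name.
    """
    files = []
    for path in Path(data_dir).iterdir():
        match = FILE_PATTERN.match(path.name)
        if match:
            files.append((int(match.group(1)), path))

    rows: List[Dict[str, str]] = []
    for year, path in sorted(files):  # sort by year so rows come out chronologically
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["year"] = str(year)
                rows.append(row)
    return rows
```

Files that do not follow the naming pattern are simply ignored. With Pandas, the equivalent is usually a `pd.concat` over one `pd.read_csv` call per matching file.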

### Model Training

The script trains the model on the preprocessed data, tunes its hyperparameters with Optuna, and saves the best-performing model.

### Running Anomaly Detection

The model's predictions are then used to detect anomalies, flagging unusual matches or player performances.

### Viewing Results

Results, including anomaly plots and metrics, are saved to the output directory. The script also generates CSV files that summarize the anomalies per player, year, and tournament.
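A per-player, per-year, or per-tournament summary amounts to counting anomalous matches by one column. The sketch below shows one way to do that with the standard library; `summarize_anomalies`, the column names (`year`, `tourney_name`), and the output file names are hypothetical, not taken from the project.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_anomalies(anomalies: Iterable[Dict[str, str]], key: str, out_path: str) -> List[Tuple[str, int]]:
    """Count anomalous matches per value of `key` and write the counts to a CSV file.

    Returns (value, count) pairs, most frequent first; ties keep first-seen order.
    """
    counts = Counter(row[key] for row in anomalies)
    ranked = counts.most_common()
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([key, "anomaly_count"])
        writer.writerows(ranked)
    return ranked
```

Calling it once per grouping column, for example `summarize_anomalies(anomalies, "year", "anomalies_per_year.csv")`, would produce one summary file per dimension.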

## Model Architecture

The `JointEmbeddedModel` consists of:

- **Embeddings for categorical features:** each categorical variable (e.g., player IDs, tournament IDs) is embedded into a dense vector.
- **Fully connected layers:** these combine the embeddings with the numerical features to predict match outcomes.
- **Dropout layers:** these reduce overfitting and improve generalization.
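To make the data flow through these layers concrete, the sketch below traces one forward pass through an architecture of this shape in plain Python, so it runs without PyTorch. `TinyJointEmbedding` is a hypothetical stand-in, not the project's `JointEmbeddedModel`: its weights are random, it is never trained, and it returns a single raw score (a sigmoid would turn that score into a win probability). In PyTorch the same pieces correspond to `nn.Embedding`, `nn.Linear`, and `nn.Dropout`.

```python
import random
from typing import Dict, List, Sequence


class TinyJointEmbedding:
    """Framework-free illustration of a joint embedding forward pass (untrained)."""

    def __init__(self, cardinalities: Dict[str, int], embedding_dim: int, num_numerical: int,
                 hidden_size: int, dropout: float, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.dropout = dropout
        # One lookup table per categorical feature: `size` rows of `embedding_dim` values.
        self.tables = {
            name: [[self._weight() for _ in range(embedding_dim)] for _ in range(size)]
            for name, size in cardinalities.items()
        }
        # The first fully connected layer sees all embeddings plus the numerical features.
        self.input_size = embedding_dim * len(cardinalities) + num_numerical
        self.w1 = [[self._weight() for _ in range(self.input_size)] for _ in range(hidden_size)]
        self.b1 = [0.0] * hidden_size
        self.w2 = [self._weight() for _ in range(hidden_size)]
        self.b2 = 0.0

    def _weight(self) -> float:
        return self.rng.gauss(0.0, 0.1)

    def forward(self, categorical: Dict[str, int], numerical: Sequence[float],
                training: bool = False) -> float:
        # 1. Look up each categorical ID and concatenate with the numerical features.
        x: List[float] = []
        for name, table in self.tables.items():
            x.extend(table[categorical[name]])
        x.extend(numerical)
        # 2. Fully connected layer followed by ReLU.
        h = [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        # 3. Inverted dropout, active only in training mode; kept units are rescaled.
        if training and self.dropout > 0:
            keep = 1.0 - self.dropout
            h = [v / keep if self.rng.random() < keep else 0.0 for v in h]
        # 4. Linear output layer producing one raw score.
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2
```

Because dropout is skipped when `training=False`, repeated evaluation calls on the same input return the same score, mirroring how `model.eval()` disables dropout in PyTorch.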

## Hyperparameter Optimization

The project uses Optuna to search automatically for the best combination of model hyperparameters, including:

- Embedding dimension
- Hidden layer size
- Learning rate
- Batch size
- Dropout rate

## Anomaly Detection

Anomalies are detected with DBSCAN clustering by comparing the expected and actual rank differences in each match. An anomaly can indicate an unexpected match outcome, a potential bias, or an error in player rankings.
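The README does not spell out how matches are turned into points for DBSCAN. One plausible reading, assumed here, is to treat each match as an (expected, actual) rank-difference pair and flag the points DBSCAN labels as noise. The project presumably relies on scikit-learn's `sklearn.cluster.DBSCAN`; the minimal from-scratch version below (a hypothetical `dbscan` helper) follows the same conventions, labelling noise as `-1` and counting a point toward its own `min_samples`. The sample points are made up.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # the label scikit-learn also uses for noise points


def dbscan(points: Sequence[Tuple[float, float]], eps: float, min_samples: int) -> List[int]:
    """Minimal O(n^2) DBSCAN returning one cluster label per point (-1 = noise)."""

    def neighbours(i: int) -> List[int]:
        # The point itself counts toward min_samples.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
        cluster += 1
    return [int(label) for label in labels]


# Made-up (expected_rank_diff, actual_rank_diff) pairs; the last one sits far from the rest.
matches = [(0, 0), (1, 1), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11), (50, -40)]
labels = dbscan(matches, eps=1.5, min_samples=3)
anomalies = [m for m, label in zip(matches, labels) if label == NOISE]
```

On real ranking data `eps` has to match the scale of the features, so the columns are usually standardized (for example with scikit-learn's `StandardScaler`) before clustering.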

## Results

- **Positive anomalies:** matches where the predicted rank difference was significantly lower than expected.
- **Negative anomalies:** matches where the predicted rank difference was significantly higher than expected.

The results are visualized with t-SNE plots and saved as images and CSV files.
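The positive/negative split follows directly from the sign of the gap between the predicted and expected rank differences. In the sketch below, the `threshold` parameter is a hypothetical stand-in for "significantly", since the README does not say how significance is judged, and `classify_anomaly` is an illustrative name.

```python
from typing import Optional


def classify_anomaly(predicted_diff: float, expected_diff: float, threshold: float) -> Optional[str]:
    """Label a match using the README's definitions of positive and negative anomalies.

    'positive': predicted rank difference significantly lower than expected.
    'negative': predicted rank difference significantly higher than expected.
    Returns None when the gap is within the (assumed) threshold.
    """
    deviation = predicted_diff - expected_diff
    if deviation < -threshold:
        return "positive"
    if deviation > threshold:
        return "negative"
    return None
```

For instance, with a threshold of 20, a predicted difference of 10 against an expected 50 is labelled positive, while 55 against 50 is not flagged at all.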

## Contributing

Contributions are welcome! Please submit a pull request or open an issue for any improvements or bugs you encounter.

## License

This project is licensed under the MIT License. See the LICENSE file for more details.