PJM Energy Consumption Forecast

This repository contains machine learning models trained to forecast energy consumption for the PJM Interconnection, one of the largest grid operators in the United States. Models are provided for both 24-hour and 7-day forecast horizons.

Models Overview

We've trained and compared multiple models for energy consumption forecasting:

Available Models

  • SARIMA: Statistical approach, captures temporal dependencies
  • Random Forest: Ensemble method, good at capturing non-linear relationships
  • XGBoost: Gradient boosting, typically best for structured data (see the illustrative fit below)
  • LSTM: Deep learning approach, specialized for sequential data
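
As a quick illustration of how the tree-based models are fit, the sketch below trains an XGBoost regressor on the engineered features. It is a minimal sketch, assuming `X_train`, `y_train`, and `X_test` come from the chronological split described under Model Training; the hyperparameters are placeholders, not the tuned values behind the reported results.

```python
# Illustrative XGBoost fit; variable names and hyperparameters are
# placeholders, not the exact configuration used for this model card.
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=6)
model.fit(X_train, y_train)      # X_train/y_train: engineered features and load target
y_pred = model.predict(X_test)   # forecasts for the held-out test period
```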

Performance Metrics

The models were evaluated using multiple metrics (a computation sketch follows the list):

  • MAE (Mean Absolute Error)
  • RMSE (Root Mean Square Error)
  • MAPE (Mean Absolute Percentage Error)
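
These metrics can be reproduced with scikit-learn and NumPy. The snippet below is a minimal sketch, assuming `y_true` and `y_pred` are aligned arrays of actual and forecast load values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def evaluate(y_true, y_pred):
    """Return MAE, RMSE and MAPE for aligned actual/forecast arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    # MAPE assumes strictly positive actuals, which holds for PJM load.
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}
```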

Trade-offs

  • SARIMA: Simple, interpretable, but less flexible
  • ML Models: More complex, better performance, require more data
  • LSTM: Best for capturing long-term dependencies, but most computationally intensive

Feature Sets

24-hour Prediction Features

The models use various features, including the following (a sketch of the lag and rolling-statistics features appears after the list):

  • Weather data (temperature, wind speed, precipitation)
  • Temporal features (hour, day, month, weekday)
  • Lag features (24h, 48h, 72h, 96h, 120h, 144h)
  • Rolling statistics
  • Holiday indicators
  • Seasonal components
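
As noted above, here is a minimal sketch of the lag and rolling-statistics features, assuming an hourly DataFrame `df` with a `load` column. Column names are illustrative; the trained models expect the exact 89-feature layout described under Data Preparation.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the 24h-144h lag features and simple rolling statistics."""
    df = df.copy()
    for lag in (24, 48, 72, 96, 120, 144):             # lag features, in hours
        df[f"load_lag_{lag}h"] = df["load"].shift(lag)
    # Rolling statistics over the previous day and week, shifted by one step
    # so the window never includes the value being predicted.
    for window in (24, 168):
        rolled = df["load"].shift(1).rolling(window)
        df[f"load_roll_mean_{window}h"] = rolled.mean()
        df[f"load_roll_std_{window}h"] = rolled.std()
    return df
```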

Data Preparation

  • Training set size: 28,780 samples
  • Test set size: 7,195 samples
  • Total features: 89 (24h prediction)

Usage Requirements

Dependencies

  • pandas
  • numpy
  • scikit-learn
  • xgboost
  • tensorflow
  • statsmodels

Input Features Required

Key feature categories (a sketch of the temporal and calendar encodings follows the list):

  1. Weather Features:
    • Average wind speed
    • Precipitation
    • Temperature (avg, max, min)
    • Weather data from multiple cities (Chicago, Washington, Pittsburgh, Columbus)
  2. Temporal Features:
    • Year, hour, day, month
    • Day of week
    • Cyclical encodings (hour_sin, hour_cos, etc.)
    • Time of day indicators (morning, afternoon, evening, night)
  3. Historical Load:
    • Previous day consumption
    • Weekly lags
    • Rolling statistics
  4. Calendar Features:
    • Holidays
    • Seasonal indicators
    • Weekend flags
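
The temporal and calendar encodings listed above can be derived from the timestamp alone. The sketch below assumes `df` has a `DatetimeIndex`; the bucket boundaries for the time-of-day indicators are an assumption, not taken from the training code.

```python
import numpy as np
import pandas as pd

def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add cyclical, time-of-day and weekend features from a DatetimeIndex."""
    df = df.copy()
    idx = df.index
    df["hour"] = idx.hour
    df["day_of_week"] = idx.dayofweek
    df["month"] = idx.month
    # Cyclical encodings keep hour 23 and hour 0 adjacent in feature space.
    df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
    df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)
    df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
    # Time-of-day indicators (illustrative bucket boundaries).
    df["is_morning"] = df["hour"].between(6, 11).astype(int)
    df["is_afternoon"] = df["hour"].between(12, 17).astype(int)
    df["is_evening"] = df["hour"].between(18, 22).astype(int)
    df["is_night"] = (~df["hour"].between(6, 22)).astype(int)
    return df
```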

Model Training

The models were trained using a time series split approach (sketched after this list):

  • 80% training data
  • 20% test data
  • Careful feature selection to avoid future data leakage
  • Multiple evaluation metrics for comprehensive performance assessment
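
As referenced above, the split is chronological rather than random, so the test period always follows the training period. A minimal sketch, assuming `df` is sorted by timestamp and `feature_cols` names the model inputs (both names are illustrative):

```python
# Chronological 80/20 split; no shuffling, so no future data leaks into training.
split = int(len(df) * 0.8)
X_train, X_test = df[feature_cols].iloc[:split], df[feature_cols].iloc[split:]
y_train, y_test = df["load"].iloc[:split], df["load"].iloc[split:]
```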

Limitations and Considerations

  1. Data Requirements: Models need extensive historical data and weather information
  2. Computational Resources: LSTM models require more computational power
  3. Feature Availability: Real-time predictions require access to current weather data
  4. Update Frequency: Models should be periodically retrained with new data