---
license: mit
language:
- en
---

# PJM Energy Consumption Forecast

This repository contains machine learning models trained to forecast energy consumption for the PJM Interconnection, one of the largest grid operators in the United States. The models predict energy consumption over two horizons: 24 hours and 7 days.

## Models Overview

We trained and compared several models for energy consumption forecasting.

### Available Models

- SARIMA (statistical approach)
- Random Forest (ensemble method)
- XGBoost (gradient boosting)
- LSTM (deep learning)

### Model Characteristics

- **SARIMA**: Statistical approach; captures autocorrelation and seasonal patterns in the series
- **Random Forest**: Ensemble of decision trees; good at capturing non-linear relationships
- **XGBoost**: Gradient-boosted trees; often among the strongest performers on structured (tabular) data
- **LSTM**: Recurrent neural network designed for sequential data

## Performance Metrics

The models were evaluated with three metrics:

- MAE (Mean Absolute Error)
- RMSE (Root Mean Square Error)
- MAPE (Mean Absolute Percentage Error)

### Trade-offs

- **SARIMA**: Simple and interpretable, but less flexible
- **ML models**: More complex and need more data, but generally perform better
- **LSTM**: Best suited to capturing long-term dependencies, but the most computationally intensive

## Feature Sets

### 24-hour Prediction Features

The models use features including:

- Weather data (temperature, wind speed, precipitation)
- Temporal features (hour, day, month, weekday)
- Lag features (24h, 48h, 72h, 96h, 120h, 144h)
- Rolling statistics
- Holiday indicators
- Seasonal components

### Data Preparation

- Training set size: 28,780 samples
- Test set size: 7,195 samples
- Total features: 89 (24-hour prediction)

## Usage Requirements

### Dependencies

- pandas
- numpy
- scikit-learn
- xgboost
- tensorflow
- statsmodels

### Input Features Required

Key feature categories:

1. **Weather Features**:
   - Average wind speed
   - Precipitation
   - Temperature (average, maximum, minimum)
   - Weather data from multiple cities (Chicago, Washington, Pittsburgh, Columbus)
2. **Temporal Features**:
   - Year, hour, day, month
   - Day of week
   - Cyclical encodings (`hour_sin`, `hour_cos`, etc.)
   - Time-of-day indicators (morning, afternoon, evening, night)
3. **Historical Load**:
   - Previous-day consumption
   - Weekly lags
   - Rolling statistics
4. **Calendar Features**:
   - Holidays
   - Seasonal indicators
   - Weekend flags

## Model Training

The models were trained using a time-series split, in which the test period follows the training period:

- 80% training data
- 20% test data
- Features restricted to information available at forecast time, to avoid leaking future data
- Multiple evaluation metrics (MAE, RMSE, MAPE) for a comprehensive performance assessment

## Limitations and Considerations

1. **Data Requirements**: The models need extensive historical load data and weather information.
2. **Computational Resources**: LSTM models require more computational power than the other models.
3. **Feature Availability**: Real-time predictions require access to current weather data.
4. **Update Frequency**: The models should be retrained periodically with new data.
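
The lag and rolling features listed under Feature Sets and Historical Load can be sketched with pandas as follows. This is a minimal illustration, not the code used to train these models. It assumes an hourly frame with a regular `DatetimeIndex` and a load column named `load` (a hypothetical name). The helper `add_load_lag_features` and the 24-hour rolling window are also assumptions.

```python
import numpy as np
import pandas as pd

# Lag horizons (hours) taken from the model card. The smallest lag is 24h, so
# every lag value is already observed when forecasting 24 hours ahead.
LAG_HOURS = [24, 48, 72, 96, 120, 144]


def add_load_lag_features(df: pd.DataFrame, target: str = "load") -> pd.DataFrame:
    """Return a copy of an hourly frame with lagged and rolling load features.

    Hypothetical helper: assumes a regular hourly DatetimeIndex and a numeric
    `target` column.
    """
    out = df.copy()
    for lag in LAG_HOURS:
        out[f"{target}_lag_{lag}h"] = out[target].shift(lag)

    # Shift by 24h before rolling so each window ends 24h before the target
    # hour and never uses unknown future load. The 24h window is an assumption.
    past = out[target].shift(24)
    out[f"{target}_roll_mean_24h"] = past.rolling(window=24).mean()
    out[f"{target}_roll_std_24h"] = past.rolling(window=24).std()

    # The first rows lack enough history for the longest lag; drop them.
    return out.dropna()


# Usage on synthetic data: ten days of a steadily increasing hourly series.
idx = pd.date_range("2018-01-01", periods=24 * 10, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"load": np.arange(len(idx), dtype=float)}, index=idx)
features = add_load_lag_features(demo)
print(features.shape)  # (96, 9): 240 rows minus the 144 without a 144h lag
```

The rolling statistics are computed on the series shifted by 24 hours. Without that shift, a rolling mean would include the hour being predicted, which is exactly the future-data leakage the Model Training section warns against.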
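
The Temporal Features and Calendar Features described above can all be derived from the timestamp. The sketch below rests on several assumptions. The helper name, any column names other than `hour_sin`/`hour_cos`, and the time-of-day cut points are hypothetical. Using pandas' `USFederalHolidayCalendar` as the holiday source is also an assumption; the card does not document its actual setup.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def add_calendar_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of an hourly frame with timestamp-derived features.

    Hypothetical helper: assumes a timezone-naive DatetimeIndex.
    """
    out = df.copy()
    ts = out.index
    out["year"] = ts.year
    out["month"] = ts.month
    out["day"] = ts.day
    out["hour"] = ts.hour
    out["dayofweek"] = ts.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (ts.dayofweek >= 5).astype(int)

    # Cyclical encodings place hour 23 next to hour 0 (and December next to
    # January) instead of at opposite ends of a numeric scale.
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    out["month_sin"] = np.sin(2 * np.pi * (out["month"] - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (out["month"] - 1) / 12)

    # Time-of-day indicators; these cut points are assumptions.
    hour = out["hour"]
    out["is_morning"] = hour.between(6, 11).astype(int)
    out["is_afternoon"] = hour.between(12, 17).astype(int)
    out["is_evening"] = hour.between(18, 21).astype(int)
    out["is_night"] = ((hour >= 22) | (hour <= 5)).astype(int)

    # Holiday flag from pandas' built-in US federal calendar (an assumed source).
    holidays = USFederalHolidayCalendar().holidays(
        start=ts.min().normalize(), end=ts.max().normalize()
    )
    out["is_holiday"] = ts.normalize().isin(holidays).astype(int)
    return out


# Usage: Sunday 2023-07-02 through Tuesday 2023-07-04 (Independence Day).
idx = pd.date_range("2023-07-02", periods=72, freq=pd.Timedelta(hours=1))
cal = add_calendar_features(pd.DataFrame(index=idx))
print(cal[["is_weekend", "is_holiday"]].sum().tolist())  # [24, 24]
```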
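
The 80/20 time-series split under Model Training differs from a shuffled split. Every test row must come after every training row, so the evaluation never sees the future. Below is a minimal sketch, assuming rows are already sorted by time. The helper `chronological_split` is hypothetical. The card's train and test sizes sum to 35,975 rows, and applying the sketch to that many rows reproduces the reported 28,780 / 7,195 sizes.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-ordered frame into earlier (train) and later (test) parts.

    Hypothetical helper; unlike a shuffled split, no test row precedes a
    training row, matching a forecast that only has past data available.
    """
    if not df.index.is_monotonic_increasing:
        raise ValueError("rows must be sorted by time before splitting")
    n_train = round(len(df) * train_frac)
    return df.iloc[:n_train], df.iloc[n_train:]


# Usage: 35,975 hourly rows, the sum of the card's train and test sizes.
idx = pd.date_range("2015-01-01", periods=28_780 + 7_195, freq=pd.Timedelta(hours=1))
train, test = chronological_split(pd.DataFrame({"load": range(len(idx))}, index=idx))
print(len(train), len(test))  # 28780 7195
```

For tuning hyperparameters inside the training portion, scikit-learn's `TimeSeriesSplit` produces expanding-window folds that follow the same ordering rule.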
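
The three metrics listed under Performance Metrics can be computed with NumPy as shown below. This is a reference sketch with hypothetical function names, not the project's evaluation code. MAPE divides by the actual load, so it is undefined when an actual value is zero. The sketch rejects zeros rather than guessing how the original evaluation handled them.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean Absolute Error, in the units of the load (e.g. MW)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root Mean Square Error; squaring weights large misses more than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Usage on a toy example (not model results).
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 330.0]
print(round(mae(actual, forecast), 3))   # 16.667
print(round(rmse(actual, forecast), 3))  # 19.149
print(round(mape(actual, forecast), 3))  # 8.333
```

RMSE is always at least as large as MAE. A wide gap between the two indicates that a few large misses, rather than uniformly small errors, dominate the total error.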