---
license: mit
language:
- en
---
# PJM Energy Consumption Forecast

This repository contains machine learning models trained to forecast energy consumption for the PJM Interconnection, one of the largest grid operators in the United States. The models can predict energy consumption for both 24-hour and 7-day horizons.

## Models Overview

We trained and compared four models for energy consumption forecasting:

- **SARIMA**: Statistical approach that captures temporal dependencies
- **Random Forest**: Ensemble method, good at capturing non-linear relationships
- **XGBoost**: Gradient boosting, typically best for structured data
- **LSTM**: Deep learning approach, specialized for sequential data
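
As an illustration of how the tree-based models are used (a hypothetical toy dataset, not the repository's actual training script):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: predict load (MW) from hour of day and temperature.
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=500)
temp = rng.normal(15, 8, size=500)
load = 30_000 + 5_000 * np.sin(2 * np.pi * hours / 24) + 200 * temp

X = np.column_stack([hours, temp])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, load)

# Predict load at noon with a temperature of 20 degrees
pred = model.predict([[12, 20.0]])
```

The same feature matrix `X` can be passed to an `xgboost.XGBRegressor` with an identical `fit`/`predict` interface.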

## Performance Metrics

The models were evaluated using multiple metrics:
- MAE (Mean Absolute Error)
- RMSE (Root Mean Square Error)
- MAPE (Mean Absolute Percentage Error)

### Trade-offs
- **SARIMA**: Simple and interpretable, but less flexible
- **ML models**: More complex with better performance, but require more data
- **LSTM**: Best at capturing long-term dependencies, but the most computationally intensive
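
As a minimal sketch of how these three metrics can be computed (assuming actuals and forecasts as NumPy arrays, with no zero actuals for MAPE):

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute MAE, RMSE, and MAPE for a forecast."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100  # assumes y_true has no zeros
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}
```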

## Feature Sets

### 24-hour Prediction Features
The models use various features including:
- Weather data (temperature, wind speed, precipitation)
- Temporal features (hour, day, month, weekday)
- Lag features (24h, 48h, 72h, 96h, 120h, 144h)
- Rolling statistics
- Holiday indicators
- Seasonal components

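The lag and rolling-statistic features can be sketched in pandas as follows (the column name `load_mw` is illustrative, not necessarily the repository's schema):

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame, target: str = "load_mw") -> pd.DataFrame:
    """Add hourly lag and rolling-statistic features to an hourly-indexed frame."""
    out = df.copy()
    for lag in (24, 48, 72, 96, 120, 144):  # lags in hours
        out[f"{target}_lag_{lag}h"] = out[target].shift(lag)
    # 24-hour rolling statistics, shifted by one step so the current
    # (to-be-predicted) value never leaks into its own features
    rolling = out[target].shift(1).rolling(window=24)
    out[f"{target}_roll_mean_24h"] = rolling.mean()
    out[f"{target}_roll_std_24h"] = rolling.std()
    return out
```
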
### Data Preparation
- Training set size: 28,780 samples
- Test set size: 7,195 samples
- Total features: 89 (24h prediction)

## Usage Requirements

### Dependencies
- pandas
- numpy
- scikit-learn
- xgboost
- tensorflow
- statsmodels

### Input Features Required
Key feature categories:
1. **Weather Features**:
   - Average wind speed
   - Precipitation
   - Temperature (avg, max, min)
   - Weather data from multiple cities (Chicago, Washington, Pittsburgh, Columbus)

2. **Temporal Features**:
   - Year, hour, day, month
   - Day of week
   - Cyclical encodings (hour_sin, hour_cos, etc.)
   - Time of day indicators (morning, afternoon, evening, night)

3. **Historical Load**:
   - Previous day consumption
   - Weekly lags
   - Rolling statistics

4. **Calendar Features**:
   - Holidays
   - Seasonal indicators
   - Weekend flags

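The cyclical encodings listed above map periodic features onto sine/cosine pairs so that, for example, hour 23 sits next to hour 0 in feature space. A sketch assuming an hourly `DatetimeIndex`:

```python
import numpy as np
import pandas as pd

def add_cyclical_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode hour of day and month of year as sine/cosine pairs."""
    out = df.copy()
    hour = out.index.hour
    month = out.index.month
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["month_sin"] = np.sin(2 * np.pi * (month - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (month - 1) / 12)
    return out
```
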
## Model Training

The models were trained using a time series split approach:
- 80% training data
- 20% test data
- Careful feature selection to avoid future data leakage
- Multiple evaluation metrics for comprehensive performance assessment
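
An 80/20 time series split keeps chronological order (the test period strictly follows the training period, with no shuffling), which can be sketched as:

```python
import pandas as pd

def time_series_split(df: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-ordered frame into train/test sets without shuffling."""
    cutoff = int(len(df) * train_frac)
    return df.iloc[:cutoff], df.iloc[cutoff:]
```

For cross-validation over multiple chronological folds, scikit-learn's `TimeSeriesSplit` follows the same principle.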
95
+
96
+ ## Limitations and Considerations
97
+
98
+ 1. **Data Requirements**: Models need extensive historical data and weather information
99
+ 2. **Computational Resources**: LSTM models require more computational power
100
+ 3. **Feature Availability**: Real-time predictions require access to current weather data
101
+ 4. **Update Frequency**: Models should be periodically retrained with new data