ayethuzar commited on
Commit
5cbdb2a
·
unverified ·
1 Parent(s): d5ba983

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -41,6 +41,8 @@ Optuna optimized LGBM's RMSE: 28329
41
 
42
  Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
43
 
 
 
44
  **Data Processing and Feature Selection:**
45
 
46
  For the feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice. I then dropped columns with low variances (< 1). After that, I checked the correlation matrix between columns to drop selected columns that have a correlation greater than 0.5 but with consideration for domain knowledge. After that, I checked for NAs in the numerical columns. Then, based on the result, I used domain knowledge to fill the NAs with appropriate values. In this case, I used 0 to fill the NAs as it was the most relevant value. As for the categorical NAs, they were replaced with ‘None’. Once, all the NAs were taken care of, I used LabelEncoder to encode the categorical values. I, then, checked for a correlation between columns and dropped them based on domain knowledge.
 
41
 
42
  Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
43
 
44
+ **milestone-1:**
45
+
46
  **Data Processing and Feature Selection:**
47
 
48
  For the feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice. I then dropped columns with low variances (< 1). After that, I checked the correlation matrix between columns to drop selected columns that have a correlation greater than 0.5 but with consideration for domain knowledge. After that, I checked for NAs in the numerical columns. Then, based on the result, I used domain knowledge to fill the NAs with appropriate values. In this case, I used 0 to fill the NAs as it was the most relevant value. As for the categorical NAs, they were replaced with ‘None’. Once, all the NAs were taken care of, I used LabelEncoder to encode the categorical values. I, then, checked for a correlation between columns and dropped them based on domain knowledge.