Milestone-3 notebook: https://colab.research.google.com/drive/17-7A0RkGcwqcJw0Ic

Hugging Face App:

***********

Results:

***********

XGBoost Model's RMSE: 28986 (Milestone-2)

Baseline LGBM's RMSE: 26233

min_child_samples : 1
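
For reference, a minimal sketch of how the LGBM score could be reproduced. It assumes the preprocessed `X_train`/`X_test`/`y_train`/`y_test` split described in the Documentation section below; `min_child_samples=1` is the one hyperparameter shown above, and everything else is left at LightGBM defaults, so this is illustrative rather than the exact training code.

```python
# Illustrative sketch of scoring an LGBM regressor with RMSE.
# Assumes X_train, X_test, y_train, y_test from the 80/20 split described below.
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

lgbm = LGBMRegressor(min_child_samples=1)  # other settings: library defaults
lgbm.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, lgbm.predict(X_test)))
print(f"LGBM RMSE: {rmse:.0f}")
```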
***********
Documentation

***********

Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview

Data Processing and Feature Selection:

For feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice, then dropped columns with low variance (< 1). Next, I examined the correlation matrix between the remaining columns and, keeping domain knowledge in mind, dropped selected columns with a pairwise correlation greater than 0.5. I then checked the numerical columns for NAs and filled them with the most appropriate value based on domain knowledge, which in this case was 0; categorical NAs were replaced with 'None'. Once all the NAs were handled, I used LabelEncoder to encode the categorical values and did a final check for correlated columns, again dropping them based on domain knowledge.
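
A minimal sketch of these filtering and NA-handling steps, assuming the Kaggle train.csv is in the working directory. The thresholds come from the description above; the domain-knowledge drops are manual and not automated here, and the encoding and 80/20 split are sketched after the feature list below.

```python
# Illustrative sketch of the filtering and NA handling described above.
import pandas as pd

df = pd.read_csv("train.csv")  # Kaggle house-prices training data

# 1. Drop numerical columns whose correlation with SalePrice is below 0.4.
corr = df.select_dtypes(include="number").corr()["SalePrice"].abs()
df = df.drop(columns=corr[corr < 0.4].index)

# 2. Drop remaining numerical columns with low variance (< 1), keeping the target.
variances = df.select_dtypes(include="number").drop(columns=["SalePrice"]).var()
df = df.drop(columns=variances[variances < 1].index)

# 3. Fill NAs: 0 for numerical columns, 'None' for categorical ones.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(0)
df[cat_cols] = df[cat_cols].fillna("None")
```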
Here are the 10 features I selected:

'OverallQual',
'YearBuilt',
'TotalBsmtSF',
'GrLivArea',
'MasVnrArea',
'BsmtFinType1',
'Neighborhood',
'GarageType',
'SaleCondition',
'BsmtExposure'

All the attributes are encoded and normalized before splitting into train and test sets (80% train, 20% test).
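
A hedged sketch of that encoding, normalization, and split, continuing from the filtered DataFrame in the previous sketch. LabelEncoder, the 10 features, and the 80/20 split come from the text above; StandardScaler and random_state=42 are only assumptions, since the exact normalization and seed are not stated.

```python
# Illustrative: encode the selected features, normalize, and split 80/20.
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

features = ['OverallQual', 'YearBuilt', 'TotalBsmtSF', 'GrLivArea', 'MasVnrArea',
            'BsmtFinType1', 'Neighborhood', 'GarageType', 'SaleCondition', 'BsmtExposure']

X = df[features].copy()   # df from the preprocessing sketch above
y = df['SalePrice']

# LabelEncoder on each categorical column, as described above.
for col in X.select_dtypes(exclude="number").columns:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))

X = StandardScaler().fit_transform(X)  # assumption: "normalized" = standard scaling

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```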
**Milestone 2:**

For Milestone 2, I ran an XGBoost model with objective="reg:squarederror" and max_depth=3. The RMSE score is 28986.
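
A minimal sketch of that model, reusing the X_train/X_test split from the preprocessing sketches above. Only objective and max_depth are taken from the text; everything else is left at XGBoost defaults, so this is a sketch rather than the exact Milestone-2 code.

```python
# Illustrative sketch of the Milestone-2 XGBoost model.
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

xgb = XGBRegressor(objective="reg:squarederror", max_depth=3)
xgb.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, xgb.predict(X_test)))
print(f"XGBoost RMSE: {rmse:.0f}")  # reported above as 28986
```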
**Milestone 3:**

Reference:

https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main