ayethuzar committed
Commit f1a8520 · unverified · 1 Parent(s): d07a3f6

Update README.md

Files changed (1):
  1. README.md +35 -0
README.md CHANGED
@@ -4,8 +4,12 @@ Milestone-3 notebook: https://colab.research.google.com/drive/17-7A0RkGcwqcJw0Ic
 
 Hugging Face App:
 
+***********
+
 Results:
 
+***********
+
 XGBoost Model's RMSE: 28986 (Milestone-2)
 
 Baseline LGBM's RMSE: 26233
@@ -44,6 +48,37 @@ min_child_samples : 1
 
 ***********
 
+Documentation
+
+***********
+
+Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
+
+Data Processing and Feature Selection:
+
+For the feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice, then dropped columns with low variance (< 1). Next, I checked the correlation matrix between columns and, with consideration for domain knowledge, dropped selected columns whose pairwise correlation was greater than 0.5. I then checked for NAs in the numerical columns and, based on the result, used domain knowledge to fill them with appropriate values; here, 0 was the most relevant value. The categorical NAs were replaced with 'None'. Once all the NAs were taken care of, I used LabelEncoder to encode the categorical values, then checked the correlations between columns again and dropped further columns based on domain knowledge. (A hedged code sketch of these steps follows the diff.)
+
+Here are the 10 features I selected:
+
+'OverallQual',
+'YearBuilt',
+'TotalBsmtSF',
+'GrLivArea',
+'MasVnrArea',
+'BsmtFinType1',
+'Neighborhood',
+'GarageType',
+'SaleCondition',
+'BsmtExposure'
+
+All the attributes are encoded and normalized before splitting into train (80%) and test (20%) sets.
+
+**Milestone 2:**
+
+For Milestone 2, I ran an XGBoost model with objective="reg:squarederror" and max_depth=3. The RMSE score is 28986. (A sketch follows the diff.)
+
+**Milestone 3:**
+
 Reference:
 
 https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main
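
The Data Processing and Feature Selection section above maps onto a short pandas/scikit-learn sketch. This is a minimal illustration, not the author's notebook: the filename train.csv, the use of MinMaxScaler for normalization, and random_state are assumptions, and the domain-knowledge-guided drops of column pairs with correlation > 0.5 were manual steps that are only noted in a comment.

```python
# Minimal sketch of the README's preprocessing steps. Thresholds (0.4, 1)
# come from the README; everything else is illustrative.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # Kaggle house-prices training data (assumed filename)

# 1) Drop numeric columns with low correlation (< 0.4) to SalePrice.
num = df.select_dtypes("number")
low_corr = num.columns[num.corrwith(num["SalePrice"]).abs() < 0.4]
df = df.drop(columns=low_corr)

# 2) Drop numeric columns with low variance (< 1).
num = df.select_dtypes("number").drop(columns=["SalePrice"])
df = df.drop(columns=num.columns[num.var() < 1])

# 3) (Manual in the README) inspect the correlation matrix and drop one of
#    each pair with correlation > 0.5, guided by domain knowledge.

# 4) Fill NAs: 0 for numeric columns, 'None' for categorical columns.
for col in df.columns:
    df[col] = df[col].fillna(0 if df[col].dtype.kind in "if" else "None")

# 5) Label-encode the categorical columns.
for col in df.select_dtypes("object"):
    df[col] = LabelEncoder().fit_transform(df[col])

# 6) Keep the 10 selected features, normalize, then split 80/20.
features = ['OverallQual', 'YearBuilt', 'TotalBsmtSF', 'GrLivArea',
            'MasVnrArea', 'BsmtFinType1', 'Neighborhood', 'GarageType',
            'SaleCondition', 'BsmtExposure']
X = MinMaxScaler().fit_transform(df[features])
y = df["SalePrice"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # random_state is an assumption
```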
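The Milestone 2 model is stated above as XGBoost with objective="reg:squarederror" and max_depth=3; here is a sketch under those two settings. All other parameters are library defaults, so the reported RMSE of 28986 will not reproduce exactly.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# X_train, X_test, y_train, y_test come from the preprocessing sketch above.
model = XGBRegressor(objective="reg:squarederror", max_depth=3)
model.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"XGBoost RMSE: {rmse:.0f}")
```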
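For the Milestone 3 baseline LGBM (RMSE 26233 in the Results above), only min_child_samples : 1 is visible in the diff's hunk context; the sketch below assumes library defaults for every other parameter and is not the author's tuned configuration.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error

# X_train, X_test, y_train, y_test come from the preprocessing sketch above.
lgbm = LGBMRegressor(min_child_samples=1)  # only parameter visible in the diff
lgbm.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, lgbm.predict(X_test)))
print(f"Baseline LGBM RMSE: {rmse:.0f}")
```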