---
title: HousePricePredictionApp
emoji: 🏠
colorFrom: pink
colorTo: yellow
sdk: streamlit
sdk_version: 1.21.0
app_file: app.py
pinned: false
---
# CS634Project
Milestone-3 notebook: [GitHub](https://github.com/aye-thuzar/CS634Project/blob/milestone-3/CS634Project_Milestone3_AyeThuzar.ipynb) | [Colab](https://colab.research.google.com/drive/17-7A0RkGcwqcJw0IcSvkniDmhbn5SuXe) | [Colab](https://colab.research.google.com/drive/1BeoZ4Dxhgd6OcUwPhk6rKCeFnDFMUCmt#scrollTo=TZ4Ci-YXOSl6)
Hugging Face App: https://huggingface.co/spaces/ayethuzar/HousePricePredictionApp
App Demonstration Video:
***********
Results
***********
- XGBoost model RMSE (Milestone-2): 28986
- Baseline LGBM RMSE: 26233
- Optuna-optimized LGBM RMSE: 13799.28
***********
Hyperparameter Tuning with Optuna
***********
- Total number of trials: 120
- Best RMSE score on validation data: 12338.67
**Best params:**

- boosting_type: goss
- reg_alpha: 3.9731274536451826
- reg_lambda: 0.8825276525195174
- colsample_bytree: 1.0
- subsample: 1.0
- learning_rate: 0.05
- max_depth: 6
- num_leaves: 48
- min_child_samples: 1
***********
## Documentation for Milestone 4
***********
Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
**Data Processing and Feature Selection:**
For feature selection, I started by dropping columns with low correlation (< 0.4) with SalePrice, then dropped columns with low variance (< 1). Next, I examined the correlation matrix between columns and, guided by domain knowledge, dropped selected columns whose pairwise correlation exceeded 0.5. I then checked for NAs in the numerical columns and, based on domain knowledge, filled them with 0, the most appropriate value in this case; categorical NAs were replaced with 'None'. Once all the NAs were handled, I used LabelEncoder to encode the categorical values, and finally re-checked the correlations between columns, dropping further columns based on domain knowledge.
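The correlation and variance filtering steps can be sketched as follows. This is a minimal illustration on a toy DataFrame, not the notebook's exact code; the real pipeline ran on the Kaggle `train.csv`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training data with one strong, one weak,
# and one near-constant column.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"SalePrice": rng.normal(180000, 50000, n)})
df["GrLivArea"] = df["SalePrice"] / 100 + rng.normal(0, 50, n)  # strongly correlated
df["MiscVal"] = rng.normal(0, 100, n)                           # weakly correlated
df["LowVar"] = 0.5                                              # near-zero variance

# 1. Drop numeric columns weakly correlated with SalePrice (|r| < 0.4).
corr = df.corr(numeric_only=True)["SalePrice"].abs()
df = df.drop(columns=corr[corr < 0.4].index)

# 2. Drop low-variance columns (variance < 1), keeping the target.
var = df.var(numeric_only=True).drop("SalePrice")
df = df.drop(columns=var[var < 1].index)

# 3. Remaining pairs with correlation > 0.5 are reviewed by hand,
#    using domain knowledge to decide which column of a pair to drop.
```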
Here are the 10 features I selected:

- 'OverallQual': Overall material and finish quality
- 'YearBuilt': Original construction date
- 'TotalBsmtSF': Total square feet of basement area
- 'GrLivArea': Above grade (ground) living area square feet
- 'MasVnrArea': Masonry veneer area in square feet
- 'BsmtFinType1': Quality of basement finished area
- 'Neighborhood': Physical locations within Ames city limits
- 'GarageType': Garage location
- 'SaleCondition': Condition of sale
- 'BsmtExposure': Walkout or garden-level basement walls
All the attributes are encoded and normalized before splitting into train (80%) and test (20%) sets.
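The NA filling, encoding, normalization, and 80/20 split can be sketched like this. A tiny toy frame stands in for the cleaned dataset, and MinMaxScaler is an assumed choice of normalizer:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy frame standing in for the cleaned dataset with selected features.
df = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 5, 6],
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown", "CollgCr", "OldTown"],
    "SalePrice": [130000, 175000, 200000, 250000, 140000, 160000],
})

# Fill NAs: 0 for numeric columns, 'None' for categorical ones,
# then label-encode the categorical columns.
for col in df.columns:
    if df[col].dtype == object:
        df[col] = LabelEncoder().fit_transform(df[col].fillna("None"))
    else:
        df[col] = df[col].fillna(0)

X = df.drop(columns="SalePrice")
y = df["SalePrice"]
X = MinMaxScaler().fit_transform(X)          # normalize features to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)    # 80/20 split
```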
**Milestone 2:**
For Milestone 2, I used an XGBoost model with objective="reg:squarederror" and max_depth=3, which achieved an RMSE of 28986.
**Milestone 3:**
For Milestone 3, I used a Light Gradient Boosting Machine (LGBM) with default parameters as the baseline, and an Optuna hyperparameter-tuned LGBM as the optimized model. The results are stated at the beginning of this README.
**References:**
https://towardsdatascience.com/analysing-interactions-with-shap-8c4a2bc11c2a
https://towardsdatascience.com/introduction-to-shap-with-python-d27edc23c454
https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/
https://www.kaggle.com/code/rnepal2/lightgbm-optuna-housing-prices-regression/notebook
https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c
https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main