Spaces:
Sleeping
Sleeping
File size: 4,232 Bytes
37b5724 4bb40bd ed508e1 37b5724 4bb40bd 37b5724 4bfc986 53393c7 ddd5765 7800474 cd278ae 6b7e6bd c84c181 40971d2 c84c181 d81a187 f1a8520 cba8d18 5afa440 f1a8520 23831aa 74184b2 d01e3c7 74184b2 54402a8 5afa440 ca097b8 7800474 ba10016 d5ba983 f1a8520 f592bc7 5cbdb2a 5c4b7f5 f1a8520 34b64a8 cba8d18 f1a8520 eaa59b4 f1a8520 5457fd4 f1a8520 eaa59b4 54402a8 f1a8520 0e00ad4 54402a8 59bfa2f 0e00ad4 aef68bd c0dca0c 5c4b7f5 5b2fe2a 7800474 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 |
---
title: HousePricePredictionApp
emoji: 🏠
colorFrom: pink
colorTo: yellow
sdk: streamlit
sdk_version: 1.21.0
app_file: app.py
pinned: false
---
# CS634Project - Real Estate House Price Prediction
Milestone-3 notebook: https://github.com/aye-thuzar/CS634Project/blob/main/CS634Project_Milestone3_Final_AyeThuzar.ipynb
Hugging Face App: https://huggingface.co/spaces/ayethuzar/HousePricePredictionApp
Landing Page for the App: https://sites.google.com/view/cs634-realestatehousepricepred/home
App Demonstration Video: https://www.youtube.com/watch?v=jYB1xpeikYQ&t=13s
***********
Results
***********
XGBoost Model's RMSE: 28986 (Milestone-2)
Optuna optimized XGBoost's RMSE: 28047
Baseline LGBM's RMSE: 34110
Optuna optimized LGBM's RMSE: 28329
***********
## Summary
***********
Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
**milestone-1:** https://github.com/aye-thuzar/CS634Project/blob/main/milestone-1/README_milestone1.md
**Data Processing and Feature Selection:**
For the feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice. I then dropped columns with low variances (< 1). After that, I checked the correlation matrix between columns to drop selected columns that have a correlation greater than 0.5 but with consideration for domain knowledge. After that, I checked for NAs in the numerical columns. Then, based on the result, I used domain knowledge to fill the NAs with appropriate values. In this case, I used 0 to fill the NAs as it was the most relevant value. As for the categorical NAs, they were replaced with ‘None’. Once, all the NAs were taken care of, I used LabelEncoder to encode the categorical values. I, then, checked for a correlation between columns and dropped them based on domain knowledge.
Here are the 10 features I selected:
'OverallQual': Overall material and finish quality
'YearBuilt': Original construction date
'TotalBsmtSF': Total square feet of basement area
'GrLivArea': Above grade (ground) living area square feet
'MasVnrArea': Masonry veneer area in square feet
'BsmtFinType1': Quality of basement finished area
'Neighborhood': Physical locations within Ames city limits
'GarageType': Garage location
'SaleCondition': Condition of sale
'BsmtExposure': Walkout or garden-level basement walls
All the attributes are encoded and normalized before splitting into train and test with 80% train and 20% test.
**Milestone 2:**
For milestone 2, I used an XGBoost Model with objective="reg:squarederror" and max_depth=3. The RMSE score is 28986.
**Milestone 3:**
For milestone 3, I used a light gradient boosting machine (LGBM) with default parameters for baseline and hyperparameter-tuned with Optuna for the optimized model. The results are stated at the beginning of my readme file. I also hyperparameter-tuned my milestone-2 XGBoost model.
I tested the pickled models in this notebook: https://github.com/aye-thuzar/CS634Project/blob/main/CS634Project_Milestone3_AyeThuzar_Testing.ipynb
For the sliders of the categorical features in the app, the numbers and the corresponding meanings are described here: https://github.com/aye-thuzar/CS634Project/edit/main/docs.md
**Milestone 4:**
Please see Milestone4Documentation.md: https://github.com/aye-thuzar/CS634Project/blob/main/Milestone4Documentation.md
Here is the landing page for my app: https://sites.google.com/view/cs634-realestatehousepricepred/home
**References:**
https://towardsdatascience.com/analysing-interactions-with-shap-8c4a2bc11c2a
https://towardsdatascience.com/introduction-to-shap-with-python-d27edc23c454
https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/
https://www.kaggle.com/code/rnepal2/lightgbm-optuna-housing-prices-regression/notebook
https://www.kaggle.com/code/rnepal2/lightgbm-optuna-housing-prices-regression/notebook
https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/
https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c
https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main
|