
Credit Risk Modelling

About

An interactive tool demonstrating credit risk modelling.

Emphasis on:

  • Building models
  • Comparing techniques
  • Interpreting results

Built With

Hardware the tool was initially built on:

Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2803 MHz, 4 Core(s), 8 Logical Processor(s)

Memory (RAM): 16GB
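To compare your own machine with the one above, a small standard-library sketch like the following can print comparable figures. The function name describe_hardware is hypothetical. The RAM lookup relies on os.sysconf, which is available on Linux and typically macOS but not on Windows, so it may report None there.

```python
import os
import platform


def describe_hardware() -> dict:
    """Collect basic hardware details to compare against the original build machine."""
    info = {
        # platform.processor() can be an empty string on some Linux systems
        "processor": platform.processor() or platform.machine(),
        # Number of logical processors (8 on the original build machine)
        "logical_cpus": os.cpu_count(),
        "ram_gb": None,
    }
    # Total physical memory via sysconf; not available on Windows
    if hasattr(os, "sysconf"):
        try:
            pages = os.sysconf("SC_PHYS_PAGES")
            page_size = os.sysconf("SC_PAGE_SIZE")
            info["ram_gb"] = round(pages * page_size / 1024**3, 1)
        except (ValueError, OSError):
            pass  # the sysconf name is unsupported on this platform
    return info


if __name__ == "__main__":
    for key, value in describe_hardware().items():
        print(f"{key}: {value}")
```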

Local setup

Obtain the repo locally and open its root folder

To potentially contribute

git clone https://github.com/pkiage/tool-credit-risk-modelling.git

or

gh repo clone pkiage/tool-credit-risk-modelling

Just to deploy locally

Download ZIP

(Optional) Set up a virtual environment:

python -m venv venv

(Optional) Activate the virtual environment:

If using a Unix-based OS, run the following in the terminal:

source venv/bin/activate

If using Windows, run the following in the terminal:

.\venv\Scripts\activate
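One way to confirm that the activation worked is to compare the interpreter's prefixes. Inside a venv, sys.prefix points at the venv folder while sys.base_prefix still points at the base installation. The sketch below assumes a hypothetical script name such as check_venv.py.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the venv folder; sys.base_prefix is the base Python
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter:", sys.executable)
```

Running python check_venv.py after activation should report True and an interpreter path inside the venv folder.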

Install the required packages by running the following in the terminal:

pip install -r requirements.txt
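After installation, you can quickly check whether key packages are importable. The package list below is an assumption: it only names streamlit and xgboost because this README mentions them. The authoritative list is requirements.txt.

```python
import importlib.util

# Hypothetical subset of dependencies; requirements.txt is the real source of truth
PACKAGES = ["streamlit", "xgboost"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```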

Complete the Graphviz installation:

https://graphviz.org/download/
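Tools such as pydeps render graphs through Graphviz's dot executable, so it needs to be on your PATH. The following sketch checks for it using only the standard library. The function name graphviz_dot_path is illustrative.

```python
import shutil
import subprocess


def graphviz_dot_path():
    """Return the path of Graphviz's `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


if __name__ == "__main__":
    dot = graphviz_dot_path()
    if dot is None:
        print("Graphviz 'dot' not found on PATH; see https://graphviz.org/download/")
    else:
        # `dot -V` prints its version banner to stderr
        result = subprocess.run([dot, "-V"], capture_output=True, text=True)
        print(result.stderr.strip() or result.stdout.strip())
```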

Build and install the local package:

python setup.py build
python setup.py install

Run the Streamlit app (src/app.py) by running the following in the terminal from the repository root folder:

streamlit run src/app.py

Deployed setup details

For faster model building and testing (particularly with XGBoost), running the app locally or on a server more powerful than the free Heroku dyno type is recommended (tutorials on servers for data science & ML).

The app was deployed on the free Heroku dyno type:

Memory (RAM): 512 MB

CPU Share: 1x

Compute: 1x-4x

Dedicated: no

Sleeps: yes

Automatic deploys from GitHub were enabled.

Manual deploy to Heroku

From main branch:

heroku login

git push heroku main

From a branch other than main:

heroku login

git push heroku branch_name:main
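The two push variants above differ only in the refspec. A branch other than main is pushed as branch_name:main so that Heroku receives it on its main branch. The sketch below builds that command. The helper name heroku_push_command is hypothetical, and the script assumes heroku login has been run and a heroku git remote exists.

```python
import subprocess
import sys


def heroku_push_command(branch: str = "main") -> list:
    """Build the git command that deploys a local branch to the Heroku remote's main."""
    # Pushing main needs no refspec; any other branch is mapped onto Heroku's main
    refspec = "main" if branch == "main" else f"{branch}:main"
    return ["git", "push", "heroku", refspec]


if __name__ == "__main__":
    branch = sys.argv[1] if len(sys.argv) > 1 else "main"
    # Assumes `heroku login` was run and the `heroku` remote is configured
    subprocess.run(heroku_push_command(branch), check=True)
```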

Roadmap

To view or submit ideas, or to contribute, please see the issues.

Docs creation

pydeps: Python module dependency visualization

Delete __init__.py and __main__.py, then run the following:

App and clusters

pydeps src/app.py --max-bacon=5 --cluster --rankdir BT -o docs/module-dependency-graph/src-app-clustered.svg

App and links

Features, models, & visualization links:

pydeps src/app.py --only features models visualization --max-bacon=4 --rankdir BT -o docs/module-dependency-graph/src-feature-model-visualization.svg

Only features

pydeps src/app.py --only features --max-bacon=5 --cluster --max-cluster-size=3 --rankdir BT -o docs/module-dependency-graph/src-features.svg

Only models

pydeps src/app.py --only models --max-bacon=5 --cluster --max-cluster-size=15 --rankdir BT -o docs/module-dependency-graph/src-models.svg
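The four pydeps commands above share the entry point, the --rankdir BT option and the output folder. The sketch below regenerates them in one run. It passes exactly the flags listed above and adds none. The script itself and its names are hypothetical, and it assumes pydeps and Graphviz are installed.

```python
import subprocess
from pathlib import Path

OUT_DIR = Path("docs/module-dependency-graph")

# Each entry mirrors one pydeps command above: (output file, extra arguments)
GRAPHS = [
    ("src-app-clustered.svg", ["--max-bacon=5", "--cluster"]),
    ("src-feature-model-visualization.svg",
     ["--only", "features", "models", "visualization", "--max-bacon=4"]),
    ("src-features.svg",
     ["--only", "features", "--max-bacon=5", "--cluster", "--max-cluster-size=3"]),
    ("src-models.svg",
     ["--only", "models", "--max-bacon=5", "--cluster", "--max-cluster-size=15"]),
]


def pydeps_commands(entry: str = "src/app.py") -> list:
    """Return one pydeps argument list per dependency graph."""
    return [
        ["pydeps", entry, *args, "--rankdir", "BT", "-o", str(OUT_DIR / name)]
        for name, args in GRAPHS
    ]


if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for cmd in pydeps_commands():
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```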

code2flow: call graphs giving a pretty good estimate of the project structure

Logistic

code2flow src/models/logistic_train_model.py -o docs/call-graph/logistic_train_model.svg
code2flow src/models/logistic_model.py -o docs/call-graph/logistic_model.svg

XGBoost

code2flow src/models/xgboost_train_model.py -o docs/call-graph/xgboost_train_model.svg
code2flow src/models/xgboost_model.py -o docs/call-graph/xgboost_model.svg

Utils

code2flow src/models/util_test.py -o docs/call-graph/util_test.svg
code2flow src/models/util_predict_model_threshold.py -o docs/call-graph/util_predict_model_threshold.svg
code2flow src/models/util_predict_model.py -o docs/call-graph/util_predict_model.svg
code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comparison.svg
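Every code2flow command above follows the same naming convention: src/models/<module>.py is rendered to docs/call-graph/<module>.svg. The sketch below generates the same eight commands from the module names. It is a hypothetical helper script and assumes code2flow is installed.

```python
import subprocess
from pathlib import Path

# Modules listed in the code2flow commands above
MODULES = [
    "logistic_train_model", "logistic_model",
    "xgboost_train_model", "xgboost_model",
    "util_test", "util_predict_model_threshold",
    "util_predict_model", "util_model_comparison",
]
SRC_DIR = Path("src/models")
OUT_DIR = Path("docs/call-graph")


def code2flow_commands(modules=MODULES) -> list:
    """Build one code2flow call per module; the SVG reuses the module's name."""
    return [
        ["code2flow", str(SRC_DIR / f"{m}.py"), "-o", str(OUT_DIR / f"{m}.svg")]
        for m in modules
    ]


if __name__ == "__main__":
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    for cmd in code2flow_commands():
        subprocess.run(cmd, check=True)
```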

References

Inspiration:

Credit Risk Modeling in Python by DataCamp

  • General Methodology
  • Data

A Gentle Introduction to Threshold-Moving for Imbalanced Classification

  • Selecting optimal threshold using Youden's J statistic
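Youden's J statistic is J = TPR - FPR (sensitivity + specificity - 1). The optimal threshold is the one that maximises J over the ROC curve. The NumPy sketch below illustrates the idea. It is not the app's own implementation, and the function name youden_threshold is hypothetical. y_score would typically be a predicted probability of default, for example model.predict_proba(X)[:, 1] for a scikit-learn-style classifier.

```python
import numpy as np


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising Youden's J = TPR - FPR.

    A sample is predicted positive (default) when its score >= threshold.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("y_true must contain both classes")

    best_t, best_j = None, -np.inf
    # Every distinct score is a candidate threshold (checked in ascending order)
    for t in np.unique(y_score):
        pred = y_score >= t
        tpr = (pred & y_true).sum() / n_pos   # sensitivity
        fpr = (pred & ~y_true).sum() / n_neg  # 1 - specificity
        j = tpr - fpr
        if j > best_j:  # strict ">" keeps the lowest threshold on ties
            best_t, best_j = float(t), float(j)
    return best_t, best_j
```

For example, youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) returns (0.35, 0.5). Both 0.35 and 0.8 reach J = 0.5, and the strict comparison keeps the lower threshold.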

Cookiecutter Data Science

  • Project structure

GraphViz Buildpack

  • Buildpack used for Heroku deployment