# Credit Risk Modelling

# About

An interactive tool demonstrating credit risk modelling.

Emphasis on:

- Building models
- Comparing techniques
- Interpreting results

## Built With

- [Streamlit](https://streamlit.io/)

#### Hardware initially built on:

Processor: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz, 2803 MHz, 4 Core(s), 8 Logical Processor(s)

Memory (RAM): 16GB

## Local setup

### Obtain the repo locally and open its root folder

#### To potentially contribute

```shell
git clone https://github.com/pkiage/tool-credit-risk-modelling.git
```

or

```shell
gh repo clone pkiage/tool-credit-risk-modelling
```

#### Just to deploy locally

Download ZIP

### (optional) Set up virtual environment:

```shell
python -m venv venv
```

### (optional) Activate virtual environment:

#### If using a Unix-based OS, run the following in terminal:

```shell
source venv/bin/activate
```

#### If using Windows, run the following in terminal:

```shell
.\venv\Scripts\activate
```
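To confirm that activation took effect on either OS, a quick check from Python can help. This is a minimal sketch and not part of the repository; the function name `in_virtualenv` is hypothetical. It relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `venv`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


in_venv = in_virtualenv()
print("virtual environment active" if in_venv else "using the system interpreter")
```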

### Install requirements by running the following in terminal:

#### Required packages

```shell
pip install -r requirements.txt
```

#### Complete Graphviz installation

https://graphviz.org/download/
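After installing Graphviz from the link above, it can be worth checking that its `dot` executable is reachable. The sketch below is not part of the repository and assumes the system install places `dot` on the `PATH`; the helper name `graphviz_dot_path` is hypothetical.

```python
import shutil


def graphviz_dot_path():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = graphviz_dot_path()
if dot_path:
    print(f"Graphviz found at {dot_path}")
else:
    print("dot not found on PATH; complete the Graphviz installation first")
```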

## Build and install local package

```shell
python setup.py build
```

```shell
python setup.py install
```

### Run the streamlit app (app.py) by running the following in terminal (from repository root folder):

```shell
streamlit run src/app.py
```

## Deployed setup details

For faster model building and testing (particularly XGBoost), a local setup or a server more powerful than the free Heroku dyno type is recommended ([tutorials on servers for data science & ML](https://course.fast.ai)).

The app was deployed on the [free Heroku dyno type](https://devcenter.heroku.com/articles/dyno-types), which has the following specifications:

- Memory (RAM): 512 MB
- CPU Share: 1x
- Compute: 1x-4x
- Dedicated: no
- Sleeps: yes

[Enabled Autodeploy from GitHub](https://devcenter.heroku.com/articles/github-integration)

[Manual deploy to Heroku](https://devcenter.heroku.com/articles/git#deploy-your-code)

From main branch:
```shell
heroku login

git push heroku main
```

From a branch other than main:

```shell
heroku login

git push heroku branch_name:main
```

# Roadmap

To view or submit ideas, or to contribute, please see the repository's issues.

# Docs creation

## [pydeps](https://github.com/thebjorn/pydeps) Python module dependency visualization

Delete `__init__.py` and `__main__.py`, then run the following.

### App and clusters

```shell
pydeps src/app.py --max-bacon=5 --cluster --rankdir BT -o docs/module-dependency-graph/src-app-clustered.svg
```

### App and links

Features, models, & visualization links:

```shell
pydeps src/app.py --only features models visualization --max-bacon=4 --rankdir BT -o docs/module-dependency-graph/src-feature-model-visualization.svg
```

### Only features

```shell
pydeps src/app.py  --only features --max-bacon=5 --cluster --max-cluster-size=3  --rankdir BT -o docs/module-dependency-graph/src-features.svg
```

### Only models

```shell
pydeps src/app.py  --only models --max-bacon=5 --cluster --max-cluster-size=15  --rankdir BT -o docs/module-dependency-graph/src-models.svg
```

## [code2flow](https://github.com/scottrogowski/code2flow) Call graphs for a pretty good estimate of project structure

### Logistic

```shell
code2flow src/models/logistic_train_model.py -o docs/call-graph/logistic_train_model.svg
```

```shell
code2flow src/models/logistic_model.py -o docs/call-graph/logistic_model.svg
```

### XGBoost

```shell
code2flow src/models/xgboost_train_model.py -o docs/call-graph/xgboost_train_model.svg
```

```shell
code2flow src/models/xgboost_model.py -o docs/call-graph/xgboost_model.svg
```

### Utils

```shell
code2flow src/models/util_test.py -o docs/call-graph/util_test.svg
```

```shell
code2flow src/models/util_predict_model_threshold.py -o docs/call-graph/util_predict_model_threshold.svg
```

```shell
code2flow src/models/util_predict_model.py -o docs/call-graph/util_predict_model.svg
```

```shell
code2flow src/models/util_model_comparison.py -o docs/call-graph/util_model_comparison.svg
```

# References

## Inspiration:

[Credit Risk Modeling in Python by DataCamp](https://www.datacamp.com/courses/credit-risk-modeling-in-python)

- General Methodology
- Data

[A Gentle Introduction to Threshold-Moving for Imbalanced Classification](https://machinelearningmastery.com/threshold-moving-for-imbalanced-classification/)

- Selecting optimal threshold using Youden's J statistic
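For readers unfamiliar with the statistic named above: Youden's J is defined as J = sensitivity + specificity - 1, which equals TPR - FPR, and the chosen threshold is the one that maximises it. The following is a minimal pure-Python sketch of that idea, not the code used in this tool; the function name `youden_j_threshold` and the sample data are made up for illustration.

```python
def youden_j_threshold(y_true, y_prob):
    """Return (threshold, J) maximising J = TPR - FPR over the observed probabilities."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_threshold, best_j = None, float("-inf")
    # Each distinct predicted probability is tried as a candidate threshold.
    for threshold in sorted(set(y_prob)):
        # A sample is predicted positive when its probability is >= threshold.
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
        j = tp / positives - fp / negatives
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Illustrative data only: 1 = default, 0 = non-default.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
threshold, j = youden_j_threshold(y_true, y_prob)
print(threshold, j)  # 0.35 0.5
```

On this toy data both 0.35 and 0.8 reach J = 0.5; the sketch keeps the lower one, which is a design choice of the example rather than part of the statistic.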

[Cookiecutter Data Science](https://drivendata.github.io/cookiecutter-data-science/)

- Project structure

[GraphViz Buildpack](https://github.com/weibeld/heroku-buildpack-graphviz)

- Buildpack used for Heroku deployment