pbc_complication_model / RandomForestClassifier_Pipeline_explanation.txt
michalisG
The Pipeline uses SimpleImputer to impute the missing values in the dataset before passing the data to the model.
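As a rough illustration of the imputation step, here is a minimal sketch using scikit-learn's SimpleImputer on toy data. The data and the strategy="mean" setting are assumptions for illustration only; this explanation does not state which imputation strategy the Pipeline actually uses.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy numeric data with missing entries (illustrative, not the real dataset).
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# strategy="mean" is an assumption; the Pipeline's actual strategy is not stated.
imputer = SimpleImputer(strategy="mean")

# Each NaN is replaced by the mean of the observed values in its column.
X_imputed = imputer.fit_transform(X)
```

With this toy input, the missing value in the first column becomes 2.0 and the one in the second column becomes 15.0.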
The Pipeline uses one-hot encoding to encode the categorical values of the dataset before passing them to the model. Most models require numerical input, so one-hot encoding converts each categorical column into a set of binary (0/1) columns, one per category.
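The encoding step might look like the following sketch, which uses scikit-learn's OneHotEncoder on a hypothetical single categorical column. The column values and the handle_unknown="ignore" setting are assumptions, not taken from the Pipeline itself.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with two categories.
X_cat = np.array([["F"], ["M"], ["F"]], dtype=object)

# handle_unknown="ignore" is an assumption: categories unseen during fit
# are encoded as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")

# The encoder returns a sparse matrix by default; convert it to a dense array.
X_encoded = encoder.fit_transform(X_cat).toarray()

# One output column per category, in sorted order: "F", then "M".
categories = list(encoder.categories_[0])
```

Each row ends up with a single 1 in the column of its category and 0 elsewhere.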
Many machine learning algorithms perform better or converge faster when features are on a relatively similar scale and/or close to normally distributed. This Pipeline uses StandardScaler, which standardises each numeric column to have mean 0 and standard deviation 1, i.e. each value is transformed by subtracting the column mean and dividing by the column standard deviation. Note that this rescales the values but does not change the shape of their distribution, so a column that was not normally distributed before scaling will not be normally distributed afterwards.
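The standardisation can be checked by hand. The sketch below, on a toy column, compares scikit-learn's StandardScaler with the manual formula z = (x - mean) / std; the data is illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy numeric column (illustrative only).
X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Manual equivalent: subtract the column mean, divide by the column
# standard deviation. StandardScaler uses the population standard
# deviation (ddof=0), which is also NumPy's default.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
```

After scaling, the column has mean 0 and standard deviation 1, and both computations agree.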
This Pipeline uses a RandomForestClassifier model. This model was selected because the user chose the "Accuracy" option and the machine learning problem is classification.
Grid Search hyper-parameter tuning was used in this Pipeline because the parameter list number was 9 or less, which is small enough for an exhaustive Grid Search to be run.
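To show how the steps described above could fit together, here is a minimal sketch that chains imputation, scaling, one-hot encoding and a RandomForestClassifier, then tunes it with an exhaustive grid search scored on accuracy. It is not the actual Pipeline: the synthetic data, the column names "age" and "sex", the imputation strategy, the parameter grid and the cv=3 setting are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; "age" and "sex" are hypothetical column names.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "age": rng.normal(50, 10, n),
    "sex": rng.choice(["F", "M"], n),
    "target": rng.integers(0, 2, n),
})
df.loc[::10, "age"] = np.nan  # inject some missing values

# The target column is removed from the training features.
X = df.drop(columns=["target"])
y = df["target"]

# Numeric columns: impute, then standardise (strategy is an assumption).
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid: 2 x 2 = 4 combinations, all of which are evaluated.
param_grid = {
    "model__n_estimators": [10, 20],
    "model__max_depth": [2, None],
}
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=3)
search.fit(X, y)
best_params = search.best_params_
```

Because the grid search is exhaustive, its cost grows with the product of the value counts for each parameter, which is why it is only practical for small grids like this one.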
Columns that have been removed from the training:
This is the target column: target