Spaces: beemabee / Diabetic_Predictor

Andika Atmanegara Putra committed on May 25, 2023

Commit 6bbca31 • 1 Parent(s): 4dda477

add all files

Files changed (13)

New Text Document.txt +0 -0
ab_model.pkl +3 -0
app.py +10 -0
diabetes.png +0 -0
diabetes2.png +0 -0
diabetes_prediction_dataset.csv +0 -0
eda.py +125 -0
num_cols_nsc.txt +1 -0
num_cols_sc.txt +1 -0
prediction.py +101 -0
requirements.txt +9 -0
scale_feat.pkl +3 -0
winsoriser.pkl +3 -0

New Text Document.txt ADDED Viewed

File without changes

ab_model.pkl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:454753be2ee87a8b266b8a06de5f11e9b478ee727b22fb6803d3adc0ee988eb0
+size 27691
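Note: the three added lines above are a Git LFS pointer, not the pickle itself. If a checkout leaves the pointer in place of the real binary, `pickle.load` in prediction.py will fail on it. A minimal sketch of a check, assuming the pointer layout shown above; the helper name `parse_lfs_pointer` is hypothetical and not part of this commit:

```python
from typing import Optional

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(raw: bytes) -> Optional[dict]:
    """Return {'oid': ..., 'size': ...} if raw looks like a Git LFS pointer, else None."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary payload, for example a real pickle
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    # Remaining lines are "key value" pairs such as "oid sha256:..." and "size 27691".
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    if "oid" not in fields or "size" not in fields:
        return None
    try:
        size = int(fields["size"])
    except ValueError:
        return None
    return {"oid": fields["oid"].strip(), "size": size}
```

Reading the first few hundred bytes of `ab_model.pkl` and passing them to this function before unpickling would make it possible to show a clear "LFS object not fetched" message instead of an unpickling traceback.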

app.py ADDED Viewed

	@@ -0,0 +1,10 @@

+import streamlit as st
+import eda
+import prediction
+navigation = st.sidebar.selectbox('Select page: ', ('Explore', 'Prediction'))
+if navigation == 'Explore':
+    eda.run()
+else:
+    prediction.run()
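Note: the `if`/`else` above sends every value other than 'Explore' to the prediction page. One alternative is a label-to-function mapping, sketched here with stand-in page functions so the snippet runs without Streamlit. The names `dispatch`, `explore_page`, `prediction_page`, and `PAGES` are hypothetical:

```python
from typing import Callable, Dict


def dispatch(choice: str, pages: Dict[str, Callable[[], object]]) -> object:
    """Call the page function registered under `choice`; reject unknown labels."""
    if choice not in pages:
        raise KeyError(f"unknown page: {choice!r}")
    return pages[choice]()


# Stand-ins for eda.run and prediction.run, used only for illustration.
def explore_page() -> str:
    return "explore"


def prediction_page() -> str:
    return "prediction"


PAGES = {"Explore": explore_page, "Prediction": prediction_page}
```

In app.py the mapping would presumably hold `eda.run` and `prediction.run`, and `tuple(PAGES)` could supply the selectbox options so that labels and handlers cannot drift apart.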

diabetes.png ADDED Viewed

diabetes2.png ADDED Viewed

diabetes_prediction_dataset.csv ADDED Viewed

The diff for this file is too large to render. See raw diff

eda.py ADDED Viewed

	@@ -0,0 +1,125 @@

+import streamlit as st
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+import plotly.express as px
+from PIL import Image
+st.set_page_config(
+    page_title='Diabetes Prediction',
+    layout='wide',
+    initial_sidebar_state='expanded'
+)
+def run():
+    # title
+    st.title('Diabetes Exploration')
+    st.subheader('Explore The Diabetes Metrics & Dataset')
+    # add pic
+    image = Image.open('diabetes.png')
+    st.image(image)
+    st.markdown('---')
+    markdown_text = '''
+    ## Background
+    Firstly, diabetes is a prevalent and chronic health condition that affects a significant portion of the population worldwide.
+    By providing a prediction model for diabetes, it can contribute to early detection and intervention, which is crucial in
+    managing the disease and preventing complications. Secondly, the integration of a diabetes prediction model in the web project
+    aims to enhance user experience and provide personalized health insights. Users can input their relevant health data, such as BMI,
+    blood glucose levels, and other factors, to obtain a prediction of their likelihood of having diabetes.
+    This information can empower individuals to make informed decisions about their health, seek appropriate medical attention
+    if necessary, and adopt preventive measures to reduce the risk of diabetes. Overall, the inclusion of a diabetes prediction
+    feature aligns with the objective of promoting health awareness and enabling users to take proactive steps towards their
+    well-being.
+    ## Problem Statement
+    Using a dataset obtained from Kaggle, the goal is to build a predictive model that determines whether
+    individuals with specific characteristics are likely to have diabetes or not.
+    ## Objective
+    The objectives of this project are to preprocess the dataset, explore its features, analyze the data,
+    implement four different algorithms for predicting the target variable, and perform Hyperparameter Tuning
+    to optimize the models' performance.
+    ## About Dataset
+    | Variable            | Description                                                                                                        |
+    |---------------------|--------------------------------------------------------------------------------------------------------------------|
+    | gender              | Gender refers to the biological sex of the individual                                                              |
+    | age                 | Age is an important factor as diabetes is more commonly diagnosed in older adults                                  |
+    | hypertension        | Hypertension is a medical condition in which the blood pressure in the arteries is persistently elevated (1 = True, 0 = False) |
+    | heart_disease       | Heart disease is another medical condition that is associated with an increased risk of developing diabetes       |
+    | smoking_history     | Smoking history is also considered a risk factor for diabetes                                                      |
+    | bmi                 | BMI (Body Mass Index) is a measure of body fat based on weight and height                                          |
+    | HbA1c_level         | HbA1c (Hemoglobin A1c) level is a measure of a person's average blood sugar level over the past 2-3 months         |
+    | blood_glucose_level | Blood glucose level refers to the amount of glucose in the bloodstream at a given time                             |
+    | diabetes            | Diabetes is the target variable being predicted (1 = True, 0 = False)                                              |
+    '''
+    st.markdown(markdown_text)
+    st.markdown('---')
+    st.subheader('Exploratory Data Analysis')
+    st.markdown('---')
+    st.write('### Patient Information')
+    # show dataframe
+    data = pd.read_csv('diabetes_prediction_dataset.csv')
+    st.dataframe(data)
+    st.markdown('---')
+    # Distribution of diabetes patients
+    fig, ax = plt.subplots()
+    plt.pie(data['diabetes'].value_counts(),
+            labels=['non-diabetic', 'diabetic'],
+            autopct='%1.1f%%',
+            colors=['Grey', 'red'],
+            startangle=25,
+            explode=[0.05, 0.05])
+    plt.title('Diabetes Distribution')
+    plt.axis('equal')
+    st.pyplot(fig)
+    st.markdown('''
+    Based on the chart above, about 91.5% of the 100,000 patients do
+    not have diabetes and only **8.5%** of patients **do have diabetes**.
+    The health factors in the dataset are analyzed to predict whether
+    a patient is likely to have diabetes or not.
+    ''')
+    st.markdown('---')
+    # density plot of a user-selected numeric column
+    st.subheader('Chart Based on User Input')
+    st.markdown('---')
+    choice = st.selectbox('Pick Numeric Columns: ', ('age',
+                                            'heart_disease',
+                                            'bmi',
+                                            'HbA1c_level', 'blood_glucose_level'))
+    fig, ax = plt.subplots(figsize=(15, 10))
+    sns.kdeplot(data[choice], fill=True, ax=ax)
+    ax.set_title(choice.capitalize() + ' Distribution')
+    st.pyplot(fig)
+    st.markdown('---')
+    # visual 2: categorical data plot
+    pilihan_kategori = st.selectbox('Pick Category Column : ', ('gender','hypertension','smoking_history','diabetes'))
+    fig = plt.figure(figsize=(8, 6))
+    sns.countplot(data=data, x=pilihan_kategori, hue='diabetes', palette='Set2')
+    plt.xlabel(pilihan_kategori.capitalize())
+    plt.ylabel('Count')
+    plt.title(pilihan_kategori.capitalize()+' Ratio')
+    plt.legend(title='Diabetes')
+    st.pyplot(fig)
+if __name__ == '__main__':
+    run()
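Note: the pie chart in eda.py passes `value_counts()` together with hard-coded labels, so the labels are only correct while class 0 is the more frequent class (`value_counts` orders by frequency). A minimal sketch of computing the shares with labels bound to class values, in plain Python so it runs without pandas. `LABELS` and `class_shares` are hypothetical names, and the mapping is taken from the dataset table (1 = True, 0 = False):

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed mapping from the dataset description: 0 = no diabetes, 1 = diabetes.
LABELS = {0: "non-diabetic", 1: "diabetic"}


def class_shares(values: Iterable[int]) -> Dict[str, float]:
    """Return the percentage of rows per class, keyed by a readable label."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS.values()}
    # Iterate over LABELS, not over counts, so the order never depends on frequency.
    return {LABELS[k]: 100.0 * counts.get(k, 0) / total for k in LABELS}
```

With pandas, the same ordering guarantee could be obtained by reindexing the result, for example `data['diabetes'].value_counts().reindex([0, 1])`, before passing it to the pie chart.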

num_cols_nsc.txt ADDED Viewed

	@@ -0,0 +1 @@


+["gender", "hypertension", "heart_disease"]

num_cols_sc.txt ADDED Viewed

	@@ -0,0 +1 @@


+["age", "bmi", "hemoglobin_level", "blood_glucose_level"]
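Note: the two text files above hold JSON arrays that prediction.py uses to select DataFrame columns. The name `hemoglobin_level` differs from the dataset column `HbA1c_level` described in eda.py; presumably the column was renamed during training, but that is an assumption. A minimal sketch of checking an input row against both lists before selecting columns (`missing_columns` is a hypothetical helper):

```python
import json
from typing import Dict, List

# Contents copied from num_cols_sc.txt and num_cols_nsc.txt.
NUM_COLS_SC = json.loads('["age", "bmi", "hemoglobin_level", "blood_glucose_level"]')
NUM_COLS_NSC = json.loads('["gender", "hypertension", "heart_disease"]')


def missing_columns(row: Dict[str, object], *column_lists: List[str]) -> List[str]:
    """Return the listed column names that are absent from `row`, in list order."""
    return [col for cols in column_lists for col in cols if col not in row]
```

For example, a row keyed by the raw dataset name `HbA1c_level` would be reported as missing `hemoglobin_level`, which is the mismatch that would otherwise surface as a `KeyError` during column selection.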

prediction.py ADDED Viewed

	@@ -0,0 +1,101 @@

+import streamlit as st
+import pandas as pd
+import numpy as np
+import pickle
+import json
+from PIL import Image
+# load all files
+with open('ab_model.pkl', 'rb') as file_1:
+    ab_model = pickle.load(file_1)
+# Pre-processing
+with open('scale_feat.pkl', 'rb') as file_2:
+    scale_feat = pickle.load(file_2)
+with open('winsoriser.pkl', 'rb') as file_3:
+    winsoriser = pickle.load(file_3)
+# List Numeric & Category
+with open('num_cols_sc.txt', 'r') as file_4:
+    num_cols_sc = json.load(file_4)
+with open('num_cols_nsc.txt', 'r') as file_5:
+    num_cols_nsc = json.load(file_5)
+def run():
+    with st.form(key='form_diabetes'):
+        st.title('Prediction Page')
+        # sub header
+        st.subheader('We use your health metrics to predict diabetes')
+        # add pic
+        image = Image.open('diabetes2.png')
+        st.image(image)
+        st.write('The fields below are the parameters used to predict whether a patient has diabetes or not.')
+        st.write('*`Please fill in the fields below to predict`*')
+        gender = st.selectbox('Gender', [0,1], help='0 = Female, 1 = Male')
+        age = st.number_input('Age', min_value=25, max_value=80,
+                            value=45, step=1, help='Patient age')
+        hypertension = st.number_input('Hypertension', min_value=0, max_value=1 , value=0,
+                                       step=1, help='have hypertension?')
+        heart_disease = st.number_input('Heart Disease', min_value=0, max_value=1 , value=0,
+                                       step=1, help='have heart disease?')
+        bmi = st.number_input('Body Mass Index', min_value=5, max_value=80,
+                                value=30, step=5, help='Amount of BMI')
+        HbA1c_level = st.number_input('HbA1c (Hemoglobin A1c) Level', min_value= 3, max_value= 10,
+                                      value= 6, help='HbA1c level, 3-10')
+        blood_glucose_level = st.slider('Glucose Level', 0, 400, 150, step=10,
+                                        help='Glucose amount in blood stream')
+        st.markdown('---')
+        submitted = st.form_submit_button('Predict')
+        data_inf = {
+                    'age': age,
+                    'bmi': bmi,
+                    'hemoglobin_level': HbA1c_level,
+                    'blood_glucose_level': blood_glucose_level,
+                    'gender': gender,
+                    'hypertension': hypertension,
+                    'heart_disease': heart_disease,
+                }
+        data_inf = pd.DataFrame([data_inf])
+        st.dataframe(data_inf)
+        if submitted:
+            # split into columns to scale and columns to pass through unchanged
+            data_inf_sc = data_inf[num_cols_sc]
+            data_inf_nsc = data_inf[num_cols_nsc].reset_index(drop=True)
+            # scaling
+            data_inf_sc = scale_feat.transform(data_inf_sc)
+            data_inf_sc = pd.DataFrame(data_inf_sc, columns=num_cols_sc)
+            # combine: scaled columns first, then unscaled columns
+            data_final = pd.concat([data_inf_sc, data_inf_nsc], axis=1)
+            # modeling
+            y_pred_inf = ab_model.predict(data_final)
+            if y_pred_inf[0] == 1:
+                st.write('# **`Prediction: You Have Diabetes`**')
+            else:
+                st.write('# **`Prediction: You Do Not Have Diabetes`**')
+if __name__ == '__main__':
+    run()
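Note: the inference path in prediction.py scales the columns listed in num_cols_sc.txt, leaves those in num_cols_nsc.txt untouched, and concatenates them in that order before calling `ab_model.predict`. The same flow can be sketched in plain Python. `build_feature_row` is a hypothetical helper, and the `scale` argument is a stand-in for `scale_feat.transform`, whose actual parameters are not visible in this commit:

```python
from typing import Callable, Dict, List, Sequence


def build_feature_row(
    inputs: Dict[str, float],
    scaled_cols: Sequence[str],
    unscaled_cols: Sequence[str],
    scale: Callable[[List[float]], List[float]],
) -> List[float]:
    """Scale `scaled_cols`, pass `unscaled_cols` through, and keep that column order."""
    scaled = list(scale([float(inputs[c]) for c in scaled_cols]))
    if len(scaled) != len(scaled_cols):
        raise ValueError("scaler changed the number of columns")
    unscaled = [float(inputs[c]) for c in unscaled_cols]
    return scaled + unscaled


def halve(values: List[float]) -> List[float]:
    """Placeholder scaler for illustration only; not the fitted scaler."""
    return [v / 2 for v in values]
```

The column order matters because the model is given positions rather than meanings, so any change to the two column lists would presumably need to match what was used when the model was fitted.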

requirements.txt ADDED Viewed

	@@ -0,0 +1,9 @@

+streamlit
+pandas
+seaborn
+matplotlib
+plotly
+Pillow
+catboost
+feature-engine
+scikit-learn==1.2.2
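Note: only scikit-learn is pinned above, while the `.pkl` files generally need to be loaded with library versions compatible with the ones that wrote them. A minimal sketch of comparing installed versions against pins at startup, using only the standard library. `installed_version` and `pin_mismatches` are hypothetical helpers, and which other packages would need pinning is not stated in this commit:

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for packages that differ from their pin."""
    found = {name: installed_version(name) for name in pins}
    return {name: ver for name, ver in found.items() if ver != pins[name]}
```

Calling `pin_mismatches({'scikit-learn': '1.2.2'})` before unpickling would return an empty dict when the environment matches the pin and the offending version (or `None`) otherwise.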

scale_feat.pkl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24ecface8c267c8b39e51c68eee345f48bb24f6a2febfc4cafc5d8b2824fc2fa
+size 783

winsoriser.pkl ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c663972e73c566cd5440a11e16d3c077ce5a0bea17e80054f90dc9e9fa71891e
+size 452