Spaces:
Runtime error
Runtime error
import streamlit as st | |
import pandas as pd | |
import seaborn as sns | |
import matplotlib.pyplot as plt | |
import plotly.express as px | |
from PIL import Image | |
st.set_page_config( | |
page_title='Diabetes Prediction', | |
layout='wide', | |
initial_sidebar_state='expanded' | |
) | |
def run(): | |
# title | |
st.title('Diabetes Exploration') | |
st.subheader('Explore The Diabetes Metrics & Dataset') | |
# add pic | |
image = Image.open('diabetes.png') | |
st.image(image) | |
st.markdown('---') | |
markdown_text = ''' | |
## Backgorund | |
Firstly, diabetes is a prevalent and chronic health condition that affects a significant portion of the population worldwide. | |
By providing a prediction model for diabetes, it can contribute to early detection and intervention, which is crucial in | |
managing the disease and preventing complications. Secondly, the integration of a diabetes prediction model in the web project | |
aims to enhance user experience and provide personalized health insights. Users can input their relevant health data, such as BMI, | |
blood glucose levels, and other factors, to obtain a prediction of their likelihood of having diabetes. | |
This information can empower individuals to make informed decisions about their health, seek appropriate medical attention | |
if necessary, and adopt preventive measures to reduce the risk of diabetes. Overall, the inclusion of a diabetes prediction | |
feature aligns with the objective of promoting health awareness and enabling users to take proactive steps towards their | |
well-being. | |
## Problem Statement | |
Using a dataset obtained from Kaggle, the goal is to build a predictive model that determines whether | |
individuals with specific characteristics are likely to have diabetes or not. | |
## Objective | |
The objectives of this project are to preprocess the dataset, explore its features, analyze the data, | |
implement four different algorithms for predicting the target variable, and perform Hyperparameter Tuning | |
to optimize the models' performance. | |
## About Dataset | |
| Variable | Description | | |
|-------------------------|-----------------------------------------------------------------------------------------------| | |
| Gender | Gender refers to the biological sex of the individual | | |
| Age | Age is an important factor as diabetes is more commonly diagnosed in older adults | | |
| hypertension | Hypertension is a medical condition in which the blood pressure in the arteries is | | |
| | persistently elevated (1 = True, 0 = False) | | |
| heart_disease | Heart disease is another medical condition that is associated with an increased risk of | | |
| | developing diabetes | | |
| smoking_history | Smoking history is also considered a risk factor for diabetes. | | |
| bmi | BMI (Body Mass Index) is a measure of body fat based on weight and height | | |
| HbA1c_level | HbA1c (Hemoglobin A1c) level is a measure of a person's average blood sugar level over the | | |
| | past 2-3 months | | |
| blood_glucose_level | Blood glucose level refers to the amount of glucose in the bloodstream at a given time | | |
| diabetes | Diabetes is the target variable being predicted (1 = True, 0 = False) | | |
''' | |
st.markdown(markdown_text) | |
st.markdown('---') | |
st.subheader('Data Exploratory') | |
st.markdown('---') | |
st.write('### Patient Information') | |
# show dataframe | |
data = pd.read_csv('diabetes_prediction_dataset.csv') | |
st.dataframe(data) | |
st.markdown('---') | |
# Distribusi Penderita Diabetes | |
fig, ax = plt.subplots() | |
plt.pie(data['diabetes'].value_counts(), | |
labels=['non-diabetic', 'diabetic'], | |
autopct='%1.1f%%', | |
colors=['Grey', 'red'], | |
startangle=25, | |
explode=[0.05, 0.05]) | |
plt.title('Diabetes Distribution') | |
plt.axis('equal') | |
st.pyplot(fig) | |
''' | |
Based on the chart above, around 91.5% of the total 100,000 patients do | |
not suffer from diabetes and only **8.5%** of patients **do have diabetes**. | |
91.5% of total non-diabetic patients will be analyzed with health factors | |
to predict whether the patient or others can get diabetes or not | |
''' | |
st.markdown('---') | |
# visual barplot | |
st.subheader('Chart Based on User Input ') | |
st.markdown('---') | |
choice = st.selectbox('Pick Numeric Columns: ', ('age', | |
'heart_disease', | |
'bmi', | |
'HbA1c_level', 'blood_glucose_level')) | |
fig,ax = plt.subplots(figsize=(15,10)) | |
sns.kdeplot(data[choice], fill=True) | |
ax.set_title(choice.capitalize()+' Ratio') | |
st.pyplot(fig) | |
st.markdown('---') | |
# visual 2 | |
## Categorical Data Plot | |
pilihan_kategori = st.selectbox('Pick Category Column : ', ('gender','hypertension','smoking_history','diabetes')) | |
fig= plt.figure(figsize=(8, 6)) | |
sns.countplot(data=data, x=pilihan_kategori, hue='diabetes', palette='Set2') | |
plt.xlabel(pilihan_kategori.capitalize()) | |
plt.ylabel('Count') | |
plt.title(pilihan_kategori.capitalize()+' Ratio') | |
plt.legend(title='Diabetes') | |
st.pyplot(fig) | |
if __name__ == '__main__': | |
run() |