Add Figures and Analysis
greencatted committed · Commit 724a5e6 · 1 Parent(s): ebebc7e
- app.py +15 -8
- figures/FrequencyFinalCallTypeCounts.png +0 -0
app.py
CHANGED
@@ -10,39 +10,46 @@ if 'sample_data' not in st.session_state:
 client = Socrata("data.cityofnewyork.us", None)

 query = "INCIDENT_DATETIME >= '2024-03-01T00:00:00' AND INCIDENT_DATETIME < '2024-04-01T00:00:00'"
-results = client.get("76xm-jjuj", where=query, limit=
+results = client.get("76xm-jjuj", where=query, limit=200)

 data = pd.DataFrame.from_records(results)
 data.columns = data.columns.str.upper()
-data.dropna(inplace=True)
+data.dropna(inplace=True, ignore_index=True)
+
+data.drop(labels=['CAD_INCIDENT_ID'], axis=1, inplace=True)
 st.session_state.sample_data = data

 sample_data = st.session_state.sample_data

 st.title('EMS Call Classifier')
-st.write("This project aims to improve the accuracy of predicting the nature of emergency calls in NYC, thereby improving emergency response times. It utilizes historical EMS dispatch data
+st.write("This project aims to improve the accuracy of predicting the nature of emergency calls in NYC, thereby improving emergency response times. It utilizes historical EMS dispatch data to predict call types, ultimately allowing New Yorkers to get the help they need even faster.")

 st.header('The Data')
 st.write('Provided through the NYC Open Data.')
-
+
+t1 = st.tabs(['EMS Incident Dispatch'])[0]
 t1.markdown(
 """The [EMS Incident Dispatch Data](https://data.cityofnewyork.us/Public-Safety/EMS-Incident-Dispatch-Data/76xm-jjuj/about_data) is generated by the EMS Computer Aided Dispatch System, and covers information about the incident as it relates to the assignment of resources and the Fire Department’s response to the emergency.

 The 6GB of data spans from April 2008 to October 2024 and employs the use of over 140 distinct call types (e.g. “Eye Injury” or “Cardiac Arrest”).
 """)

-
-
-t1.subheader('Sample of First 100 Incidents from March 2024')
+t1.subheader(f'Sample of First {sample_data.shape[0]-1} Incidents from March 2024')
 t1.dataframe(sample_data, use_container_width=True)

 st.header("Analysis")
-st.write("
+st.write("When analyzing our data, we realized that around half our types rarely were ever used for typing a case. Around 20 types alone were only represented once in the dataset.")
+
+st.image("figures/FrequencyFinalCallTypeCounts.png", caption='Most types have very rare occurrences in our data.', use_column_width=True)
+
+st.write("We determined that types with less than 0.05% frequency were insignificant and would only throw off our model. Thus, entries with these rare types were dropped.")

 st.header("Our Model")
 st.markdown(
 """
 Our model utilizes a Random Forest Multiclassifier enhanced with Gradient Boosting. We selected key features like time of day, day of week, borough, police precinct, and zip code, which proved most relevant in predicting the nature of an EMS dispatch incident. Initial call type and initial severity level were also included to provide a baseline for our predictions.
+
+In our training, we used a May 2024 to June 2024 data subset with over 270k incidents after dropping around 5k rows of incomplete entries and rare types.
 """
 )

figures/FrequencyFinalCallTypeCounts.png
ADDED
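
The Analysis text added in this commit says that entries whose call type has less than 0.05% frequency were dropped, but the filtering code itself is not part of this diff. A minimal sketch of how such a filter might look is given below. It assumes records arrive as a list of dicts (as `client.get` returns them) and that the label column is named `FINAL_CALL_TYPE`, which is inferred from the figure filename. The function name `drop_rare_types` and its parameters are hypothetical and are not taken from the repository.

```python
from collections import Counter


def drop_rare_types(records, key="FINAL_CALL_TYPE", min_share=0.0005):
    """Keep only records whose call type makes up at least min_share of all records.

    Hypothetical helper: `key` and `min_share` (0.05% written as a fraction)
    are assumptions based on the commit's Analysis text, not the project's code.
    """
    # Count how often each call type occurs across all records.
    counts = Counter(record[key] for record in records)
    total = len(records)
    # Call types that meet the frequency threshold.
    frequent = {call_type for call_type, n in counts.items() if n / total >= min_share}
    # Preserve the original record order while dropping rare types.
    return [record for record in records if record[key] in frequent]
```

With 10,000 records, a call type that appears once has a share of 0.01% and would be dropped, while one that appears 100 times (1%) would be kept. The same idea could be expressed on a DataFrame with `value_counts(normalize=True)`; the plain-Python form is used here only to keep the sketch dependency-free.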