dnirfana committed on
Commit
c0decd1
1 Parent(s): 97de353

Update all EDA, adding Loading Feature for each Plot

Files changed (1)
  1. eda.py +38 -4
eda.py CHANGED
@@ -295,14 +295,48 @@ def app():
     plt.title('Amount Balance vs New Balance Origin')
     st.pyplot(fig)
     st.write('The scatter plot shows the relationship between New Balance Origin and Amount Balance. Similar to the previous plot, the data points highlight how most transactions cluster around lower values for both balances.')
-    st.markdown('- **High Density at Lower Values** The majority of the data points are concentrated on the origin (0,0), indicating that most transactions involve smaller amounts for both New Balance Origin and Amount Balance.')
-    st.markdown('- **Vertical Distribution** There are points with higher New Balance Origin spread vertically, mostly associated with lower Amount Balance values.')
-    st.markdown('- **Horizontal Distribution** Some points with higher Amount Balance values spread horizontally but are typically associated with low New Balance Origin values.')
 
     st.divider()
 
     # Multivariate analysis
     st.header('Multivariate Analysis')
-
 if __name__ == '__main__':
     app()

     plt.title('Amount Balance vs New Balance Origin')
     st.pyplot(fig)
     st.write('The scatter plot shows the relationship between New Balance Origin and Amount Balance. Similar to the previous plot, the data points highlight how most transactions cluster around lower values for both balances.')
+    st.markdown('- **High Density at Lower Values**: The majority of the data points are concentrated on the origin (0,0), indicating that most transactions involve smaller amounts for both New Balance Origin and Amount Balance.')
+    st.markdown('- **Vertical Distribution**: There are points with higher New Balance Origin spread vertically, mostly associated with lower Amount Balance values.')
+    st.markdown('- **Horizontal Distribution**: Some points with higher Amount Balance values spread horizontally but are typically associated with low New Balance Origin values.')
 
     st.divider()
 
     # Multivariate analysis
     st.header('Multivariate Analysis')
+
+    # Heatmap to visualize relationships
+    st.subheader('Heatmap of Correlation between Numeric Variables')
+    with st.spinner('Loading...'):
+        correlation_matrix = data[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']].corr()
+        fig, ax = plt.subplots(figsize=(10, 8))
+        sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt=".2f", ax=ax)
+        st.pyplot(fig)
+    st.write("""
+    The heatmap provides a visual representation of the correlation matrix for the numeric variables: `amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, and `newbalanceDest`.
+
+    1. **Strong Correlations:**
+       - There is a near-perfect correlation (1.00 after rounding) between `oldbalanceOrg` and `newbalanceOrig`, indicating that the origin account balances before and after the transaction are almost perfectly linearly related.
+       - Similarly, `oldbalanceDest` and `newbalanceDest` have a very high correlation (0.98), showing that the destination account balances before and after the transaction are very closely related.
+
+    2. **Moderate Correlations:**
+       - `amount` shows moderate correlations with `oldbalanceDest` (0.29) and `newbalanceDest` (0.46). This indicates that the transaction amount has a moderate positive relationship with the balances in the destination account.
+
+    3. **Weak or No Correlations:**
+       - `amount` has very weak or no correlation with `oldbalanceOrg` (-0.00) and `newbalanceOrig` (-0.01), suggesting that the transaction amount is not significantly related to the balances in the origin account.
+       - Other correlations, such as between `oldbalanceOrg` and `oldbalanceDest` (0.07), are also weak, indicating minimal linear relationships between these variables.
+    """)
+
+    # Pairplot to visualize relationships
+    st.subheader('Pairplot of Numeric Variables')
+    with st.spinner('Loading...'):
+        fig = sns.pairplot(df[['amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']])
+        st.pyplot(fig)
+    st.write('The pair plot provides a detailed view of the relationships between the numeric variables: `amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, and `newbalanceDest`.')
+    st.markdown('- **Strong Linear Relationships**: There are clear linear relationships between `oldbalanceOrg` and `newbalanceOrig`, as well as between `oldbalanceDest` and `newbalanceDest`. This indicates that the balances before and after a transaction are highly correlated.')
+    st.markdown('- **Clustered Data Points**: Most data points are clustered near the lower end of the scales, especially for `amount` and the balance variables, suggesting a high frequency of small-value transactions.')
+    st.markdown('- **Diagonal Subplots**: The diagonal subplots show a histogram of each variable, reflecting its individual distribution.')
+    st.markdown('- **Scattered Points**: There are noticeable outliers and scattered points in the relationships between `amount` and the balance variables, indicating some transactions involve significantly higher amounts than the majority.')
+
+
 if __name__ == '__main__':
     app()
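
Note on the heatmap commentary: pandas `DataFrame.corr()` computes the Pearson coefficient by default, which measures how linear a relationship is, not whether two columns hold the same values. The following is a minimal, standard-library-only sketch of that point. The `pearson` helper and the toy balances are hypothetical illustrations; they are not part of `eda.py` and are not drawn from the real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy balances (assumption: made up for illustration only).
old_balance = [100.0, 250.0, 400.0, 1000.0, 5000.0]
# Every account loses 10% of its balance, so no value stays the same,
# yet the relationship between the two columns is exactly linear.
new_balance = [0.9 * b for b in old_balance]

r = pearson(old_balance, new_balance)
print(f"correlation: {r:.2f}")
print("balances identical:", old_balance == new_balance)
```

In this sketch the two columns never match, yet the coefficient is 1 up to floating-point error. A 1.00 cell in the heatmap is therefore consistent with origin balances that change on every transaction, as long as they change in a nearly linear way.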