hathawayj committed on
Commit
d3b39b6
1 Parent(s): b609185

small edits

Files changed (5)
  1. README.md +15 -75
  2. challenge.md +14 -0
  3. remark_slides.md +2 -7
  4. requirements.txt +1 -2
  5. scripts_build/01_read.py +4 -2
README.md CHANGED
@@ -10,98 +10,38 @@ app_port: 8501
10
  ---
11
 
12
 
13
- ## Introduction to Data Science with Python
14
 
15
  ## Overview
16
 
17
- Location: Accra, Ghana When: July 31 and August 1, 2023
18
 
19
- This material focuses on [Polars](https://pola-rs.github.io/polars-book/user-guide/), [Parquet files](https://parquet.apache.org/docs/), [Plotly Express](https://plotly.com/python/plotly-express/), and [Streamlit](https://streamlit.io/) to introduce the data science process.
20
 
21
  ## Installing the tools
22
 
23
  We will need [Visual Studio Code](https://code.visualstudio.com/download) and [Python](https://www.python.org/downloads/) installed for this short course. Each tool has additional packages/extensions that we will need to install as well.
24
 
25
-
26
  ### Visual Studio Code Extensions
27
 
28
- You can use [Managing Extensions in Visual Studio Code](https://code.visualstudio.com/docs/editor/extension-marketplace) to learn about how to install extensions. We will use [Python - Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-python.python) extension heavily. [Managing Extensions in Visual Studio Code](https://code.visualstudio.com/docs/editor/extension-marketplace) provides more background on extensions if needed.
29
-
30
- #### VS Code Interactive Python Window
31
-
32
- An open-source project called [Jupyter](http://jupyter-notebook.readthedocs.io/en/latest/) is the standard method for interactive Python use for data science or scientific computing. However, there are [some issues](https://towardsdatascience.com/5-reasons-why-jupyter-notebooks-suck-4dc201e27086) with its use in a development environment. VS Code provides a way for us to have the best of Python and Jupyter Notebooks with their [Python Interactive Window](https://code.visualstudio.com/docs/python/jupyter-support-py).
33
-
34
- VS Code is fairly intelligent in responding to your needs. If you open a `.py` file, it should pop up a window asking whether you would like to set up your Python environment. You will need to install the [jupyter python package](https://jupyter.readthedocs.io/en/latest/install.html) for the interactive Python window to work. If VS Code doesn't install it, you can install it with `pip` or `pip3`.
35
-
36
- Using the VS Code functionality, you will work with a standard `.py` file instead of the `.ipynb` extension typically used with Jupyter notebooks. The Python extension in VS Code will recognize `# %%` as a cell or chunk of Python code and add notebook options to ‘Run Cell’ as well as other actions. The code example below, followed by an image of the view in VS Code, shows this. [Microsoft’s documentation](https://code.visualstudio.com/docs/python/jupyter-support-py) goes into more detail.
37
-
38
- To make the interactive window more functional, press `ctrl + ,` (or `cmd + ,` on a Mac) to open the settings. From there, search for **‘Send Selection to Interactive Window’** and make sure the box is checked. Now you will be able to use `shift + return` to send a selected chunk of code or an entire cell.
39
-
40
- ```python
41
- # %%
42
- msg = "Hello World"
43
- print(msg)
44
-
45
- # %%
46
- msg = "Hello again"
47
- print(msg)
48
- ```
49
-
50
- ![img](img/vscode-code-cells-01.png)
51
-
52
- ### Python Packages
53
-
54
- #### `pip` overview
55
-
56
- *The standard command* - `pip install polars[all] plotly streamlit` is executed in your Terminal, Command Window, or by using the `New Terminal` under `Terminal` in VS Code. If you are using a Mac you most likely will use `pip3 install polars[all] plotly streamlit`. In your interactive Python environment in VS Code (Jupyter server) you can run `!pip install polars[all] plotly streamlit` as explained [here](https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/#How-to-use-Pip-from-the-Jupyter-Notebook). Finally, you could use the following Python code snippet.
57
-
58
- Either of the following two commands can be used in the interactive Python window in VS Code to install packages.
59
-
60
- ```python
61
- !pip install polars[all] plotly streamlit
62
- ```
63
-
64
- or
65
-
66
- ```python
67
- import sys
68
- !{sys.executable} -m pip install polars[all] plotly streamlit
69
- ```
70
-
71
- #### `pip` commands
72
-
73
- - `pip install polars[all] plotly streamlit` should install all needed packages.
74
-
75
- You could install them individually using the following commands.
76
-
77
- - `pip install polars[all]` for [Polars](https://pola-rs.github.io/polars-book/user-guide/installation/)
78
- - `pip install streamlit` for [Streamlit](https://docs.streamlit.io/library/get-started/main-concepts)
79
- - `pip install plotly` for [plotly in Python](https://plotly.com/python/getting-started/)
80
 
81
  ## Repo Navigation
82
 
83
  ### `guides` folder
84
 
85
- The `guides` folder will allow us to explore these packages if the internet connection is down during our course.
86
-
87
- - PDF Files: The pdf files should have most of the commands we will need during the course. The `polars_website.pdf` is a full pdf build of their website guide as of July 2023.
88
- - `streamlit_md` folder: This folder has the markdown files used to build their website guide. It is a little harder to navigate.
89
- - `polars_site` folder: This folder has the fully built website for the polars package as of July 2023. From your OS file explorer open the `index.html` file to see the full site.
90
-
91
- ### `data` folder
92
-
93
- This folder has the data we will be using for the short course. Read more about [the data folder](https://file+.vscode-resource.vscode-cdn.net/Users/hathawayj/git/hathawayj/ghana_datascience/data/readme.md).
94
-
95
- ### Scripts folder
96
-
97
- The scripts folder has the starting scripts for each of the activities we will complete during the short course.
98
 
99
- ### Markdown links
100
 
101
- - [plotly.md](https://file+.vscode-resource.vscode-cdn.net/Users/hathawayj/git/hathawayj/ghana_datascience/plotly.md): links to the primary functions we will use as we create charts with Plotly Express
102
- - [polars.md](https://file+.vscode-resource.vscode-cdn.net/Users/hathawayj/git/hathawayj/ghana_datascience/polars.md): links to the key methods we will leverage for data import and munging.
103
- - [streamlit.md](https://file+.vscode-resource.vscode-cdn.net/Users/hathawayj/git/hathawayj/ghana_datascience/streamlit.md): links to the dashboard functions and concepts we will use with Streamlit
104
 
105
- ## Slides
106
 
107
- The [HTML Slides](https://hathawayj.github.io/ghana_datascience/) and [pdf slides](https://github.com/hathawayj/ghana_datascience/blob/slides/slides.pdf) are available online.
 
 
 
 
 
 
 
10
  ---
11
 
12
 
13
+ ## Introduction to Streamlit with Docker
14
 
15
  ## Overview
16
 
17
+ Location: Rexburg, Idaho When: July 16, 2024
18
 
19
+ This material uses [Polars](https://pola-rs.github.io/polars-book/user-guide/) and focuses on [Streamlit](https://streamlit.io/) and dashboarding to introduce the data science app development process.
20
 
21
  ## Installing the tools
22
 
23
  We will need [Visual Studio Code](https://code.visualstudio.com/download) and [Python](https://www.python.org/downloads/) installed for this short course. Each tool has additional packages/extensions that we will need to install as well.
24
 
 
25
  ### Visual Studio Code Extensions
26
 
27
+ You can use [Managing Extensions in Visual Studio Code](https://code.visualstudio.com/docs/editor/extension-marketplace) to learn how to install extensions and for more background on them if needed. We will use the [Python - Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-python.python) extension heavily.
28
 
29
  ## Repo Navigation
30
 
31
  ### `guides` folder
32
 
33
+ The `guides` folder has cheat sheets for Polars and Streamlit.
34
 
35
+ ### `scripts_build` folder
36
 
37
+ The `scripts_build` folder has the munging scripts that built the data for the app we will explore.
 
 
38
 
39
+ ### Other key files
40
 
41
+ - The [slides.html](slides.html) is a Remark Slides presentation on Dashboarding. You can read more at [remark_slides.md](remark_slides.md). The slides are embedded in the default Streamlit app for this repository.
42
+ - [Dockerfile](Dockerfile) is the build script for our Docker image.
43
+ - [docker-compose.yml](docker-compose.yml) provides an easy way to start our docker container. [Docker Compose](https://docs.docker.com/compose/#:~:text=It%20is%20the%20key%20to,single%2C%20comprehensible%20YAML%20configuration%20file.) is _'the key to unlocking a streamlined and efficient development and deployment experience.'_
44
+ - [requirements.txt](requirements.txt) is used by the [Dockerfile](Dockerfile) to install the needed Python packages.
45
+ - [README.md](README.md) is this file. The `YAML` at the top is necessary for the Streamlit app to work correctly. Specifically, the `app_port: 8501` entry is needed. All other information can and should be customized.
46
+ - [streamlit.py](streamlit.py) is our Streamlit app.
47
+ - The remaining files are data files.
challenge.md ADDED
@@ -0,0 +1,14 @@
1
+ # Learning Challenge
2
+
3
+ - Add the ability to filter the chart to a specified year range with [st.date_input()](https://docs.streamlit.io/develop/api-reference/widgets/st.date_input)
4
+ - Add [Dataframes - st.data_editor()](https://docs.streamlit.io/develop/concepts/design/dataframes) to allow the user to pick which variables are displayed in the drop down.
5
+ - Add a few metrics to your dashboard using [st.metric()](https://docs.streamlit.io/develop/api-reference/data/st.metric)
6
+   - Report the year range of data available for the variable selected over all countries
7
+   - Add the percent growth from 2000 to the latest available year
8
+   - Add the country with the highest value in the latest year.
9
+ - Give the user of your app the ability to take a picture using [st.camera_input()](https://docs.streamlit.io/develop/api-reference/widgets/st.camera_input).
10
+ - Try to use a third-party extension, [streamlit-drawable-canvas](https://github.com/andfanilo/streamlit-drawable-canvas?tab=readme-ov-file), to allow the user to draw on the picture taken with the camera.
11
+ - Now organize your application using
12
+   - [st.set_page_config()](https://docs.streamlit.io/develop/api-reference/configuration/st.set_page_config)
13
+   - [st.columns()](https://docs.streamlit.io/develop/api-reference/layout/st.columns)
14
+
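
The three metrics named in the challenge reduce to small calculations over long-format data. The block below is a minimal sketch in plain Python, not the app's actual code: the `rows` records, the `(country, year, value)` layout, and the helper names `year_range`, `percent_growth`, and `top_country` are all assumptions made for illustration. Growth is computed per country here, which is one possible reading of the challenge.

```python
# Hypothetical long-format records: (country, year, value).
rows = [
    ("Ghana", 2000, 10.0),
    ("Ghana", 2020, 15.0),
    ("Kenya", 2000, 8.0),
    ("Kenya", 2021, 14.0),
    ("Togo", 2021, None),  # missing value, ignored by every helper
]


def year_range(rows):
    """Return (first, last) year that has at least one non-missing value."""
    years = [year for _, year, value in rows if value is not None]
    return min(years), max(years)


def percent_growth(rows, country, base_year=2000):
    """Percent change from base_year to the country's latest available year."""
    series = {y: v for c, y, v in rows if c == country and v is not None}
    latest = max(series)
    return (series[latest] - series[base_year]) / series[base_year] * 100


def top_country(rows):
    """Country with the highest value in the latest year that has data."""
    _, latest = year_range(rows)
    candidates = [(v, c) for c, y, v in rows if y == latest and v is not None]
    return max(candidates)[1]


first, last = year_range(rows)
growth = percent_growth(rows, "Ghana")
leader = top_country(rows)
```

In a Streamlit app, each result could then be passed as the `value` argument of `st.metric()`, for example with the year range formatted as a string.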
remark_slides.md CHANGED
@@ -3,13 +3,8 @@
3
  This template is made from [Remark](https://github.com/gnab/remark), an open source tool to help create and display slideshows from markdown. For questions, see [Remark's documentation](https://github.com/gnab/remark). I have added a GitHub Action to convert the slides to a pdf in the `slides` branch.
4
 
5
  The most important things to know are:
6
- - Enable GitHub Pages from `master` for the slides to work
7
- - Once enabled, the slides will be visible at `https://USERNAME.github.io/REPOSITORY-NAME/#1`, like https://brianamarie.github.io/slideshow-on-pages/#1
8
- - Edit the `index.html` file to edit the slides
9
  - Slides are separated by `----`
10
  - Presenter notes after `???` within one slide
11
  - Toggle presenter notes during presentation with `P`
12
- - Read the full guide to [remark markdown](https://github.com/gnab/remark/wiki)
13
- - Press `C` to clone a display; then press `P` to switch to presenter mode. Open help menu with `h`
14
-
15
- Fork this repository to get started!
 
3
  This template is made from [Remark](https://github.com/gnab/remark), an open source tool to help create and display slideshows from markdown. For questions, see [Remark's documentation](https://github.com/gnab/remark). I have added a GitHub Action to convert the slides to a pdf in the `slides` branch.
4
 
5
  The most important things to know are:
6
+ - Edit the `slides.html` file to edit the slides
 
 
7
  - Slides are separated by `----`
8
  - Presenter notes after `???` within one slide
9
  - Toggle presenter notes during presentation with `P`
10
+ - Read the full guide to [remark markdown](https://github.com/gnab/remark/wiki)
 
 
 
requirements.txt CHANGED
@@ -4,5 +4,4 @@ pandas
4
  streamlit
5
  scikit-learn
6
  numpy
7
- plotly
8
- lets-plot
 
4
  streamlit
5
  scikit-learn
6
  numpy
7
+ plotly
 
scripts_build/01_read.py CHANGED
@@ -2,7 +2,7 @@
2
  import polars as pl
3
  # Notice that the world health data leaves missing values as blanks in the csv. We need to tell Polars that blanks aren't strings but missing values.
4
 
5
- dat = pl.read_csv("../data/API_Download_DS2_en_csv_v2_5657328.csv", skip_rows=4, null_values = "")
6
  # We don't like the World Bank's wide format. Let's clean it up to long format.
7
  dat_long = dat.melt(id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"])
8
  # Now we need to fix the year column and give it a better name.
@@ -44,6 +44,8 @@ dat_final = dat_final.select(name_order)
44
 
45
  # %%
46
  # write data
47
- dat_final.write_parquet("../data/dat_munged.parquet")
48
 
49
 
 
 
 
2
  import polars as pl
3
  # Notice that the world health data leaves missing values as blanks in the csv. We need to tell Polars that blanks aren't strings but missing values.
4
 
5
+ dat = pl.read_csv("../API_Download_DS2_en_csv_v2_5657328.csv", skip_rows=4, null_values = "")
6
  # We don't like the World Bank's wide format. Let's clean it up to long format.
7
  dat_long = dat.melt(id_vars=["Country Name", "Country Code", "Indicator Name", "Indicator Code"])
8
  # Now we need to fix the year column and give it a better name.
 
44
 
45
  # %%
46
  # write data
47
+ dat_final.write_csv("../dat_munged.csv")
48
 
49
 
50
+
51
+ # %%
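
The wide-to-long reshape that `01_read.py` performs with `melt` can be pictured without Polars. The block below is a minimal standard-library sketch of the same idea, not the script's implementation: the two-year sample text, the `wide_to_long` helper, and the output keys `year` and `value` are assumptions made for illustration. Blank cells become `None`, mirroring the intent of `null_values = ""`.

```python
import csv
import io

# Hypothetical miniature of the wide layout: id columns, then one column per year.
raw = (
    "Country Name,Country Code,Indicator Name,Indicator Code,2000,2001\n"
    "Ghana,GHA,Population,SP.POP.TOTL,19.0,\n"
)

ID_VARS = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]


def wide_to_long(text):
    """Turn one row per indicator into one row per indicator and year."""
    long_rows = []
    for record in csv.DictReader(io.StringIO(text)):
        ids = {key: record[key] for key in ID_VARS}
        for column, cell in record.items():
            if column in ID_VARS:
                continue
            # A blank cell is a missing value, not an empty string.
            value = float(cell) if cell != "" else None
            long_rows.append({**ids, "year": int(column), "value": value})
    return long_rows


long_rows = wide_to_long(raw)
```

Each input row yields one output row per year column, so the sample produces two records for Ghana, the second with a missing value.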