{ "cells": [ { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import os\n", "from datetime import datetime, timedelta\n", "import numpy as np\n", "from tqdm import tqdm\n", "import calendar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this project, we needed to get two datasets from trusted sources. For the first dataset, we downloaded data from 2021 - 2023 from the US DoT's [On-Time : Reporting Carrier On-Time Performance (1987-present)](https://transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ&QO_fu146_anzr=b0-gvzr) dataset for the time preformance of every sceduled flight past three years. The second dataset was from the ASOS Network [ASOS-AWOS-METAR Data Download](https://www.mesonet.agron.iastate.edu/request/download.phtml?network=VA_ASOS) to gather hourly weather data using the 30 most popular airports in the US, according to the FAA based on this doctument https://www.faa.gov/sites/faa.gov/files/2022-09/cy21-commercial-service-enplanements.pdf ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Airport Data Cleaning\n", "\n", "To get the data from the US DoT, we needed to download every month individually. Requesting custom data was slow, so the full dataset was downloaded for each month, then cleaned. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define what columns should be kept for our use case and the folder names we are using" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "kept_cols = ['Year', 'Month', 'DayofMonth', 'Reporting_Airline', 'Origin', 'Dest', \n", "'CRSDepTime', 'DepDelayMinutes', 'Cancelled', 'CancellationCode',\n", "'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay', ]\n", "\n", "raw_flight_data_folder = 'flight_data_raw'\n", "cleaned_flight_data_folder = 'flight_data'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(raw_flight_data_folder):\n", " for database in os.listdir(os.path.join(raw_flight_data_folder, year)):\n", " pd.read_csv(os.path.join(raw_flight_data_folder, year, database)).filter(kept_cols).to_csv(os.path.join(cleaned_flight_data_folder, year, database))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time in the raw dataset is separated by year, month, day, and hour, so we need to combine these columns into a single datetime column. Furthermore, we round to the nearest hour to match the weather data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calculate_time(entry):\n", " time = entry['CRSDepTime']\n", "\n", " time_str = f\"{time:04d}\"\n", "\n", " time_str = time_str[:2] + \":\" + time_str[2:]\n", "\n", " date_str = f\"{entry['Year']}-{entry['Month']}-{entry['DayofMonth']}\"\n", "\n", " time = datetime.strptime(date_str + \" \" + time_str, \"%Y-%m-%d %H:%M\")\n", "\n", " time = time.replace(second=0, microsecond=0, minute=0, hour=time.hour) + timedelta(hours=time.minute//30)\n", "\n", " return(time)\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "flight_data_with_time = \"flight_data_with_time\"\n", "date_columns_to_drop = ['Year', 'Month', 'DayofMonth', 'CRSDepTime']\n", "auto_generated_columns = ['Unnamed: 0', 'Unnamed: 0.1']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(cleaned_flight_data_folder):\n", " for month_index, month in enumerate(os.listdir(os.path.join(cleaned_flight_data_folder, year))):\n", " #print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(cleaned_flight_data_folder, year, month))\n", " \n", " database['Time'] = database.apply(calculate_time, axis=1)\n", "\n", " database.drop(columns=date_columns_to_drop, inplace=True)\n", " database.to_csv(os.path.join(flight_data_with_time, year, month))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " #print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " try:\n", " database.drop(columns=auto_generated_columns, inplace=True)\n", " except:\n", " pass\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should create a simplified version of our targets, so instead of predicting a number lets predict if the flight was delayed or canceled, we can predict true or false. We will define a flight as delayed if it was more than 1 minute late." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " #print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database['Cancelled'] = database['Cancelled'].astype(bool)\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " #print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database['Delayed'] = database['DepDelayMinutes'] > 0\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cancellation are an internal code that we will need to decode to understand the reason for the cancellation. We will create a new column with the reason for the cancellation. Cancellation codes are defined in the US DoT's document above under the \"Get Lookup Table\" link for \"CancellationCode\"." ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'A': 'Carrier', 'B': 'Weather', 'C': 'National Air System', 'D': 'Security'}\n" ] } ], "source": [ "cancellation_table_df = pd.read_csv('L_CANCELLATION.csv')\n", "cancellation_table = cancellation_table_df.set_index('Code')['Description'].to_dict()\n", "print(cancellation_table)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database['CancellationReason'] = database['CancellationCode'].map(cancellation_table)\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database.drop('CancellationCode', axis=1, inplace=True)\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also need to decode the carrier code to get the carrier name. Carrier codes are defined in the US DoT's document above under the \"Get Lookup Table\" link for \"Reporting_Airline\"." ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "carriers_table_df = pd.read_csv('L_UNIQUE_CARRIERS.csv')\n", "carriers_table = carriers_table_df.set_index('Code')['Description'].to_dict()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database['Carrier'] = database['Reporting_Airline'].map(carriers_table)\n", " database.drop('Reporting_Airline', axis=1, inplace=True)\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's order the columns to make logical sense." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "airport_columns_order = ['Time', 'Origin', 'Dest', 'Carrier', 'Cancelled', 'CancellationReason', 'Delayed', 'DepDelayMinutes', \n", " 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " print(f\"Year: {year}, Month: {month_index}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database = database[airport_columns_order]\n", "\n", " database.to_csv(os.path.join(flight_data_with_time, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Weather Cleaning\n", "\n", "To get the weather data, we selected the 30 most popular airports in the US. To get the data, go to the link above, select the state that the airport is in, and add the station with the same code as the airport. For example, if you want to get the weather data for the Atlanta airport, you would select Georgia and add the station starting with the coe \"\\[ATL\\]\". Once again, selecting all the data was more efficient than selecting custom data, then the date range was set from Jan 1st 2021 to Dec 31st 2023. The appropriate timezone for the airport was selected, and the data was downloaded to the computer." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define names of folders we are using" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "weather_data = 'weather_data'\n", "weather_raw = 'weather_raw'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The names of this data are not very clear, so we will need to rename the columns to make them more understandable. We only rename the columns that are relevant to our project, so we will drop the rest later." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "new_names = {'station': 'Origin', 'tmpf': 'Temperature', 'sknt': 'Wind_Speed', 'p01i': 'Precipitation', \n", " 'alti': 'Altimeter_Pressure', 'mslp': 'Sea_Level_Pressure', 'vsby': 'Visibility', 'gust': 'Wind_Gust', \n", " 'feel': 'Feels_Like_Temperature', 'ice_accretion_3hr': 'Ice_Accretion_3hr', 'snowdepth': 'Snow_Depth'}" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\652326846.py:2: DtypeWarning: Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_raw, station))\n", "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\652326846.py:2: DtypeWarning: Columns (2,3,4,5,6,7,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_raw, station))\n", "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\652326846.py:2: DtypeWarning: Columns (2,3,4,6,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_raw, station))\n" ] } ], "source": [ "for station in os.listdir(weather_raw):\n", " data_frame = pd.read_csv(os.path.join(weather_raw, station))\n", "\n", " data_frame = data_frame.rename(columns=new_names)\n", "\n", " data_frame.to_csv(os.path.join(weather_data, station), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time is not properly a datetime object, so we will need to convert it to a datetime object. We will also round the time to the nearest hour to match the flight data." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "def calculate_time_weather(entry):\n", " valid = entry['valid']\n", "\n", " time = datetime.strptime(valid, \"%Y-%m-%d %H:%M\")\n", "\n", " time = time.replace(second=0, microsecond=0, minute=0, hour=time.hour) + timedelta(hours=time.minute//30)\n", "\n", " return(time)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Station: ATL.csv\n", "Station: AUS.csv\n", "Station: BNA.csv\n", "Station: BOS.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\174780951.py:3: DtypeWarning: Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Station: BWI.csv\n", "Station: CLT.csv\n", "Station: DCA.csv\n", "Station: DEN.csv\n", "Station: DFW.csv\n", "Station: DTW.csv\n", "Station: EWR.csv\n", "Station: FLL.csv\n", "Station: IAD.csv\n", "Station: IAH.csv\n", "Station: JFK.csv\n", "Station: LAS.csv\n", "Station: LAX.csv\n", "Station: LGA.csv\n", "Station: MCO.csv\n", "Station: MDW.csv\n", "Station: MIA.csv\n", "Station: MSP.csv\n", "Station: ORD.csv\n", "Station: PHL.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\174780951.py:3: DtypeWarning: Columns (2,3,4,5,6,7,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Station: PHX.csv\n", "Station: SAN.csv\n", "Station: SEA.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\174780951.py:3: DtypeWarning: Columns (2,3,4,6,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Station: SFO.csv\n", "Station: SLC.csv\n", "Station: TPA.csv\n" ] } ], "source": [ "for station in os.listdir(weather_data):\n", " print(f\"Station: {station}\")\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n", " \n", " data_frame['Time'] = data_frame.apply(calculate_time_weather, axis=1)\n", "\n", " data_frame.to_csv(os.path.join(weather_data, station), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter out the columns that are not relevant to our project." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "weather_saved_columns = ['Origin', 'Time', 'Temperature', 'Feels_Like_Temperature', 'Sea_Level_Pressure', 'Altimeter_Pressure', 'Visibility', 'Wind_Speed', 'Wind_Gust', 'Precipitation', 'Ice_Accretion_3hr']" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\318951639.py:2: DtypeWarning: Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.\n", " pd.read_csv(os.path.join(weather_data, station)).filter(weather_saved_columns).to_csv(os.path.join(weather_data, station), index=False)\n", "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\318951639.py:2: DtypeWarning: Columns (2,3,4,5,6,7,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " pd.read_csv(os.path.join(weather_data, station)).filter(weather_saved_columns).to_csv(os.path.join(weather_data, station), index=False)\n", "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\318951639.py:2: DtypeWarning: Columns (2,3,4,6,8,10,27) have mixed types. Specify dtype option on import or set low_memory=False.\n", " pd.read_csv(os.path.join(weather_data, station)).filter(weather_saved_columns).to_csv(os.path.join(weather_data, station), index=False)\n" ] } ], "source": [ "for station in os.listdir(weather_data):\n", " pd.read_csv(os.path.join(weather_data, station)).filter(weather_saved_columns).to_csv(os.path.join(weather_data, station), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset has missing columns represented by \"M\" and if there was a small amount that could not be quantified it became a \"T\". We will average most of the missing values. Ice accretion was always missing if it was not present, so we will fill it with 0." ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "missing_average_value = ['Temperature', 'Wind_Speed', 'Precipitation', 'Altimeter_Pressure', 'Sea_Level_Pressure', 'Visibility', 'Wind_Gust', 'Feels_Like_Temperature']\n", "\n", "missing_zero_value = ['Ice_Accretion_3hr']" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Station: ATL.csv\n", "Station: AUS.csv\n", "Station: BNA.csv\n", "Station: BOS.csv\n", "Station: BWI.csv\n", "Station: CLT.csv\n", "Station: DCA.csv\n", "Station: DEN.csv\n", "Station: DFW.csv\n", "Station: DTW.csv\n", "Station: EWR.csv\n", "Station: FLL.csv\n", "Station: IAD.csv\n", "Station: IAH.csv\n", "Station: JFK.csv\n", "Station: LAS.csv\n", "Station: LAX.csv\n", "Station: LGA.csv\n", "Station: MCO.csv\n", "Station: MDW.csv\n", "Station: MIA.csv\n", "Station: MSP.csv\n", "Station: ORD.csv\n", "Station: PHL.csv\n", "Station: PHX.csv\n", "Station: SAN.csv\n", "Station: SEA.csv\n", "Station: SFO.csv\n", "Station: SLC.csv\n", "Station: TPA.csv\n" ] } ], "source": [ "for station in os.listdir(weather_data):\n", " print(f\"Station: {station}\")\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n", "\n", " for column in missing_average_value:\n", " if data_frame[column].dtype == 'O':\n", " data_frame[column] = data_frame[column].replace('M', np.NaN)\n", " data_frame[column] = data_frame[column].replace('T', '0.001')\n", " data_frame[column] = pd.to_numeric(data_frame[column], errors='coerce')\n", " average_value = round(data_frame[column].mean(), 1)\n", " data_frame[column].fillna(average_value, inplace=True)\n", "\n", " for column in missing_zero_value:\n", " if data_frame[column].dtype == 'O':\n", " data_frame[column] = data_frame[column].replace('M', '0.00')\n", " data_frame[column] = data_frame[column].replace('T', '0.001')\n", " data_frame[column] = pd.to_numeric(data_frame[column], errors='coerce')\n", "\n", " data_frame.to_csv(os.path.join(weather_data, station), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reorder the columns to make logical sense." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "weather_columns_order = ['Time', 'Origin', 'Temperature', 'Feels_Like_Temperature', 'Altimeter_Pressure', 'Sea_Level_Pressure', 'Visibility', 'Wind_Speed', \n", " 'Wind_Gust', 'Precipitation', 'Ice_Accretion_3hr']" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "for station in os.listdir(weather_data):\n", " data_frame = pd.read_csv(os.path.join(weather_data, station))\n", " \n", " data_frame = data_frame[weather_columns_order]\n", "\n", " data_frame.to_csv(os.path.join(weather_data, station), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It will be a lot easier if we combine all of the data into a single dataset for processing. We will simply concatenate all the data into a single dataset. This dataset is smaller then combining the flight data with the weather data, so we will loop over the flight data and keep this as a variable for processing." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "weather_data_frame = pd.DataFrame()\n", "for station in os.listdir(weather_data):\n", " airport_weather_df = pd.read_csv(os.path.join(weather_data, station))\n", " \n", " weather_data_frame = pd.concat([weather_data_frame, airport_weather_df])\n", " \n", "weather_data_frame.to_csv(\"combined_weather.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert to units of measurement for the OpenWeather API. Users will input where they are leaving from and we will hit OpenWeather API to get the weather forecast data for that location. It will be easier to units of measurement now then to convert every time we use the API." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "def miles_to_meters(miles):\n", " meters = round(miles * 1609.34, 2)\n", " return meters\n", "\n", "def knots_to_mph(knots):\n", " mph = round(knots * 1.15078, 2)\n", " return mph\n", "\n", "def inches_to_mm(inches):\n", " mm = round(inches * 25.4, 2)\n", " return mm" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "combined_weather = pd.read_csv(\"combined_weather.csv\")\n", "\n", "combined_weather['Visibility'] = combined_weather['Visibility'].apply(miles_to_meters)\n", "combined_weather['Wind_Speed'] = combined_weather['Wind_Speed'].apply(knots_to_mph)\n", "combined_weather['Wind_Gust'] = combined_weather['Wind_Gust'].apply(knots_to_mph)\n", "combined_weather['Precipitation'] = combined_weather['Precipitation'].apply(inches_to_mm)\n", "combined_weather['Ice_Accretion_3hr'] = combined_weather['Ice_Accretion_3hr'].apply(inches_to_mm)\n", "\n", "combined_weather.to_csv(\"combined_weather.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Combine Data\n", "\n", "Now that everything is properly formatted, we need to append the appropriate weather data to the flight data. We will loop over the flight data and append the weather data for the appropriate airport and time." ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "relevant_airport_codes = [\"ATL\", \"DFW\", \"DEN\", \"ORD\", \"LAX\", \"JFK\", \"LAS\", \"MCO\", \"MIA\", \"CLT\", \"SEA\", \"PHX\", \"EWR\", \"SFO\", \"IAH\", \"BOS\", \"FLL\", \"MSP\", \"LGA\", \"DTW\", \"PHL\", \"SLC\", \"DCA\", \"SAN\", \"BWI\", \"TPA\", \"AUS\", \"IAD\", \"BNA\", \"MDW\"]\n", "\n", "relevant_airport_data = \"relevant_airport_data\"\n", "\n", "combined_weather = pd.read_csv(\"combined_weather.csv\")\n", "\n", "relevant_airlines = ['Endeavor Air Inc.', 'American Airlines Inc.', 'Alaska Airlines Inc.',\n", " 'JetBlue Airways', 'Delta Air Lines Inc.', 'Frontier Airlines Inc.',\n", " 'Allegiant Air', 'Hawaiian Airlines Inc.', 'Envoy Air', 'Spirit Air Lines',\n", " 'PSA Airlines Inc.', 'SkyWest Airlines Inc.', 'Horizon Air',\n", " 'United Air Lines Inc.', 'Southwest Airlines Co.', 'Mesa Airlines Inc.',\n", " 'Republic Airline']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We only care about the 30 largest airports, as they make up over 70% of all air traffic, but every airport is still in the airport dataset. Remove the rest of the airports in the dataset. " ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: 1\n", "Year: 2021, Month: 2\n", "Year: 2021, Month: 3\n", "Year: 2021, Month: 4\n", "Year: 2021, Month: 5\n", "Year: 2021, Month: 6\n", "Year: 2021, Month: 7\n", "Year: 2021, Month: 8\n", "Year: 2021, Month: 9\n", "Year: 2021, Month: 10\n", "Year: 2021, Month: 11\n", "Year: 2021, Month: 12\n", "Year: 2022, Month: 1\n", "Year: 2022, Month: 2\n", "Year: 2022, Month: 3\n", "Year: 2022, Month: 4\n", "Year: 2022, Month: 5\n", "Year: 2022, Month: 6\n", "Year: 2022, Month: 7\n", "Year: 2022, Month: 8\n", "Year: 2022, Month: 9\n", "Year: 2022, Month: 10\n", "Year: 2022, Month: 11\n", "Year: 2022, Month: 12\n", "Year: 2023, Month: 1\n", "Year: 2023, Month: 2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\910276774.py:4: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Year: 2023, Month: 3\n", "Year: 2023, Month: 4\n", "Year: 2023, Month: 5\n", "Year: 2023, Month: 6\n", "Year: 2023, Month: 7\n", "Year: 2023, Month: 8\n", "Year: 2023, Month: 9\n", "Year: 2023, Month: 10\n", "Year: 2023, Month: 11\n", "Year: 2023, Month: 12\n" ] } ], "source": [ "for year in os.listdir(flight_data_with_time):\n", " for month_index, month in enumerate(os.listdir(os.path.join(flight_data_with_time, year))):\n", " print(f\"Year: {year}, Month: {month_index + 1}\")\n", " database = pd.read_csv(os.path.join(flight_data_with_time, year, month))\n", " \n", " database = database[database['Origin'].isin(relevant_airport_codes)]\n", "\n", " database.to_csv(os.path.join(relevant_airport_data, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, now we can combine on the airport code and time. We will use a left join to keep all the flight data and only append the weather data matching the airport code and time." ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: 1\n", "Year: 2021, Month: 2\n", "Year: 2021, Month: 3\n", "Year: 2021, Month: 4\n", "Year: 2021, Month: 5\n", "Year: 2021, Month: 6\n", "Year: 2021, Month: 7\n", "Year: 2021, Month: 8\n", "Year: 2021, Month: 9\n", "Year: 2021, Month: 10\n", "Year: 2021, Month: 11\n", "Year: 2021, Month: 12\n", "Year: 2022, Month: 1\n", "Year: 2022, Month: 2\n", "Year: 2022, Month: 3\n", "Year: 2022, Month: 4\n", "Year: 2022, Month: 5\n", "Year: 2022, Month: 6\n", "Year: 2022, Month: 7\n", "Year: 2022, Month: 8\n", "Year: 2022, Month: 9\n", "Year: 2022, Month: 10\n", "Year: 2022, Month: 11\n", "Year: 2022, Month: 12\n", "Year: 2023, Month: 1\n", "Year: 2023, Month: 2\n", "Year: 2023, Month: 3\n", "Year: 2023, Month: 4\n", "Year: 2023, Month: 5\n", "Year: 2023, Month: 6\n", "Year: 2023, Month: 7\n", "Year: 2023, Month: 8\n", "Year: 2023, Month: 9\n", "Year: 2023, Month: 10\n", "Year: 2023, Month: 11\n", "Year: 2023, Month: 12\n" ] } ], "source": [ "for year in os.listdir(relevant_airport_data):\n", " for month_index, month in enumerate(os.listdir(os.path.join(relevant_airport_data, year))):\n", " print(f\"Year: {year}, Month: {month_index + 1}\")\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", " \n", " database = pd.merge(database, combined_weather, on=['Time', 'Origin'], how='left')\n", "\n", " database.to_csv(os.path.join(relevant_airport_data, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unfortunately, now we have some missing values due to missing hours in the weather dataset" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time 0\n", "Origin 0\n", "Dest 0\n", "Carrier 0\n", "Cancelled 0\n", "CancellationReason 364565\n", "Delayed 0\n", "DepDelayMinutes 2328\n", "CarrierDelay 311019\n", "WeatherDelay 311019\n", "NASDelay 311019\n", "SecurityDelay 311019\n", "LateAircraftDelay 311019\n", "Temperature 123\n", "Feels_Like_Temperature 123\n", "Altimeter_Pressure 123\n", "Sea_Level_Pressure 123\n", "Visibility 123\n", "Wind_Speed 123\n", "Wind_Gust 123\n", "Precipitation 123\n", "Ice_Accretion_3hr 123\n", "dtype: int64\n" ] } ], "source": [ "print(pd.read_csv(\"relevant_airport_data/2021/On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2021_5.csv\").isna().sum())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets just drop values that are missing" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: January\n", "Year: 2021, Month: February\n", "Year: 2021, Month: March\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\2193919038.py:4: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: April\n", "Year: 2021, Month: May\n", "Year: 2021, Month: June\n", "Year: 2021, Month: July\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\2193919038.py:4: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: August\n", "Year: 2021, Month: September\n", "Year: 2021, Month: October\n", "Year: 2021, Month: November\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\2193919038.py:4: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Year: 2021, Month: December\n", "Year: 2022, Month: January\n", "Year: 2022, Month: February\n", "Year: 2022, Month: March\n", "Year: 2022, Month: April\n", "Year: 2022, Month: May\n", "Year: 2022, Month: June\n", "Year: 2022, Month: July\n", "Year: 2022, Month: August\n", "Year: 2022, Month: September\n", "Year: 2022, Month: October\n", "Year: 2022, Month: November\n", "Year: 2022, Month: December\n", "Year: 2023, Month: January\n", "Year: 2023, Month: February\n", "Year: 2023, Month: March\n", "Year: 2023, Month: April\n", "Year: 2023, Month: May\n", "Year: 2023, Month: June\n", "Year: 2023, Month: July\n", "Year: 2023, Month: August\n", "Year: 2023, Month: September\n", "Year: 2023, Month: October\n", "Year: 2023, Month: November\n", "Year: 2023, Month: December\n" ] } ], "source": [ "for year in os.listdir(relevant_airport_data):\n", " for month_index, month in enumerate(os.listdir(os.path.join(relevant_airport_data, year))):\n", " print(f\"Year: {year}, Month: {calendar.month_name[month_index + 1]}\")\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "\n", " database.dropna(subset=['Temperature'], inplace=True)\n", "\n", " database.to_csv(os.path.join(relevant_airport_data, year, month), index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Separation\n", "\n", "Now that we have all the data points cleaned, we need to separate the data by airport and airline. We want to predict specifically for these variables, so we will separate the data into different datasets for each airport and airline and do training on each dataset." ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "origins_folder = \"origins\"\n", "airlines_folder = \"airlines\"" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Airport: SEA, Year: 2021, Month: October: 34%|███▍ | 370/1080 [02:35<04:57, 2.38it/s]\n", "Airport: ATL, Year: 2021, Month: March: 0%| | 3/1080 [00:01<09:51, 1.82it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ATL, Year: 2021, Month: April: 0%| | 4/1080 [00:02<11:22, 1.58it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ATL, Year: 2021, Month: July: 1%| | 7/1080 [00:04<12:14, 1.46it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ATL, Year: 2021, Month: November: 1%| | 11/1080 [00:07<11:22, 1.57it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DFW, Year: 2021, Month: March: 4%|▎ | 39/1080 [00:42<41:26, 2.39s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DFW, Year: 2021, Month: April: 4%|▎ | 40/1080 [00:42<32:57, 1.90s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DFW, Year: 2021, Month: July: 4%|▍ | 43/1080 [00:44<18:21, 1.06s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DFW, Year: 2021, Month: November: 4%|▍ | 47/1080 [00:47<12:37, 1.36it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DEN, Year: 2021, Month: March: 7%|▋ | 75/1080 [01:21<38:40, 2.31s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DEN, Year: 2021, Month: April: 7%|▋ | 76/1080 [01:21<30:48, 1.84s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DEN, Year: 2021, Month: July: 7%|▋ | 79/1080 [01:24<17:49, 1.07s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DEN, Year: 2021, Month: November: 8%|▊ | 83/1080 [01:26<12:40, 1.31it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ORD, Year: 2021, Month: March: 10%|█ | 111/1080 [02:01<36:13, 2.24s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ORD, Year: 2021, Month: April: 10%|█ | 112/1080 [02:01<29:05, 1.80s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ORD, Year: 2021, Month: July: 11%|█ | 115/1080 [02:04<17:04, 1.06s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: ORD, Year: 2021, Month: November: 11%|█ | 119/1080 [02:06<12:14, 1.31it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAX, Year: 2021, Month: March: 14%|█▎ | 147/1080 [02:41<34:39, 2.23s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAX, Year: 2021, Month: April: 14%|█▎ | 148/1080 [02:42<28:28, 1.83s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAX, Year: 2021, Month: July: 14%|█▍ | 151/1080 [02:44<17:45, 1.15s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAX, Year: 2021, Month: November: 14%|█▍ | 155/1080 [02:47<12:21, 1.25it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: JFK, Year: 2021, Month: March: 17%|█▋ | 183/1080 [03:19<26:37, 1.78s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: JFK, Year: 2021, Month: April: 17%|█▋ | 184/1080 [03:20<21:58, 1.47s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: JFK, Year: 2021, Month: July: 17%|█▋ | 187/1080 [03:22<14:27, 1.03it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: JFK, Year: 2021, Month: November: 18%|█▊ | 191/1080 [03:25<10:34, 1.40it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAS, Year: 2021, Month: March: 20%|██ | 219/1080 [03:54<21:08, 1.47s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAS, Year: 2021, Month: April: 20%|██ | 220/1080 [03:55<18:12, 1.27s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAS, Year: 2021, Month: July: 21%|██ | 223/1080 [03:57<12:36, 1.13it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LAS, Year: 2021, Month: November: 21%|██ | 227/1080 [04:00<09:51, 1.44it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MCO, Year: 2021, Month: March: 24%|██▎ | 255/1080 [04:29<21:44, 1.58s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MCO, Year: 2021, Month: April: 24%|██▎ | 256/1080 [04:30<18:34, 1.35s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MCO, Year: 2021, Month: July: 24%|██▍ | 259/1080 [04:32<12:20, 1.11it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MCO, Year: 2021, Month: November: 24%|██▍ | 263/1080 [04:35<09:37, 1.42it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SEA, Year: 2021, Month: March: 34%|███▎ | 363/1080 [05:04<01:12, 9.92it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SEA, Year: 2021, Month: April: 34%|███▎ | 364/1080 [05:05<01:33, 7.63it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SEA, Year: 2021, Month: July: 34%|███▍ | 367/1080 [05:07<01:55, 6.17it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SEA, Year: 2021, Month: November: 34%|███▍ | 371/1080 [05:10<03:17, 3.58it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHX, Year: 2021, Month: March: 37%|███▋ | 399/1080 [05:41<20:43, 1.83s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHX, Year: 2021, Month: April: 37%|███▋ | 400/1080 [05:42<17:32, 1.55s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHX, Year: 2021, Month: July: 37%|███▋ | 403/1080 [05:45<11:51, 1.05s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHX, Year: 2021, Month: November: 38%|███▊ | 407/1080 [05:48<08:45, 1.28it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: EWR, Year: 2021, Month: March: 40%|████ | 435/1080 [06:19<18:21, 1.71s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: EWR, Year: 2021, Month: April: 40%|████ | 436/1080 [06:20<15:37, 1.46s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: EWR, Year: 2021, Month: July: 41%|████ | 439/1080 [06:22<10:03, 1.06it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: EWR, Year: 2021, Month: November: 41%|████ | 443/1080 [06:25<07:26, 1.43it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SFO, Year: 2021, Month: March: 44%|████▎ | 471/1080 [06:53<14:14, 1.40s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SFO, Year: 2021, Month: April: 44%|████▎ | 472/1080 [06:54<12:12, 1.21s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SFO, Year: 2021, Month: July: 44%|████▍ | 475/1080 [06:56<08:25, 1.20it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SFO, Year: 2021, Month: November: 44%|████▍ | 479/1080 [06:59<07:11, 1.39it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAH, Year: 2021, Month: March: 47%|████▋ | 507/1080 [07:27<13:41, 1.43s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAH, Year: 2021, Month: April: 47%|████▋ | 508/1080 [07:28<11:58, 1.26s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAH, Year: 2021, Month: July: 47%|████▋ | 511/1080 [07:31<08:41, 1.09it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAH, Year: 2021, Month: November: 48%|████▊ | 515/1080 [07:34<06:57, 1.35it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BOS, Year: 2021, Month: March: 50%|█████ | 543/1080 [08:04<13:26, 1.50s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BOS, Year: 2021, Month: April: 50%|█████ | 544/1080 [08:04<11:20, 1.27s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BOS, Year: 2021, Month: July: 51%|█████ | 547/1080 [08:06<07:47, 1.14it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BOS, Year: 2021, Month: November: 51%|█████ | 551/1080 [08:09<06:04, 1.45it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: FLL, Year: 2021, Month: March: 54%|█████▎ | 579/1080 [08:38<11:59, 1.44s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: FLL, Year: 2021, Month: April: 54%|█████▎ | 580/1080 [08:39<10:13, 1.23s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: FLL, Year: 2021, Month: July: 54%|█████▍ | 583/1080 [08:41<07:00, 1.18it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: FLL, Year: 2021, Month: November: 54%|█████▍ | 587/1080 [08:44<05:48, 1.41it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MSP, Year: 2021, Month: March: 57%|█████▋ | 615/1080 [09:12<10:07, 1.31s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MSP, Year: 2021, Month: April: 57%|█████▋ | 616/1080 [09:12<08:59, 1.16s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MSP, Year: 2021, Month: July: 57%|█████▋ | 619/1080 [09:15<06:41, 1.15it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MSP, Year: 2021, Month: November: 58%|█████▊ | 623/1080 [09:17<05:22, 1.42it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LGA, Year: 2021, Month: March: 60%|██████ | 651/1080 [09:45<09:58, 1.40s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LGA, Year: 2021, Month: April: 60%|██████ | 652/1080 [09:46<08:31, 1.20s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LGA, Year: 2021, Month: July: 61%|██████ | 655/1080 [09:48<05:54, 1.20it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: LGA, Year: 2021, Month: November: 61%|██████ | 659/1080 [09:50<04:40, 1.50it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DTW, Year: 2021, Month: March: 64%|██████▎ | 687/1080 [10:18<09:39, 1.47s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DTW, Year: 2021, Month: April: 64%|██████▎ | 688/1080 [10:19<08:11, 1.25s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DTW, Year: 2021, Month: July: 64%|██████▍ | 691/1080 [10:21<05:26, 1.19it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DTW, Year: 2021, Month: November: 64%|██████▍ | 695/1080 [10:23<04:17, 1.50it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHL, Year: 2021, Month: March: 67%|██████▋ | 723/1080 [10:51<08:33, 1.44s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHL, Year: 2021, Month: April: 67%|██████▋ | 724/1080 [10:51<07:18, 1.23s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHL, Year: 2021, Month: July: 67%|██████▋ | 727/1080 [10:53<04:56, 1.19it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: PHL, Year: 2021, Month: November: 68%|██████▊ | 731/1080 [10:56<03:49, 1.52it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SLC, Year: 2021, Month: March: 70%|███████ | 759/1080 [11:22<06:32, 1.22s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SLC, Year: 2021, Month: April: 70%|███████ | 760/1080 [11:22<05:47, 1.09s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SLC, Year: 2021, Month: July: 71%|███████ | 763/1080 [11:24<04:10, 1.26it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SLC, Year: 2021, Month: November: 71%|███████ | 767/1080 [11:27<03:25, 1.52it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DCA, Year: 2021, Month: March: 74%|███████▎ | 795/1080 [11:54<06:18, 1.33s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DCA, Year: 2021, Month: April: 74%|███████▎ | 796/1080 [11:54<05:27, 1.15s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DCA, Year: 2021, Month: July: 74%|███████▍ | 799/1080 [11:56<03:46, 1.24it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: DCA, Year: 2021, Month: November: 74%|███████▍ | 803/1080 [11:59<03:03, 1.51it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SAN, Year: 2021, Month: March: 77%|███████▋ | 831/1080 [12:27<05:51, 1.41s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SAN, Year: 2021, Month: April: 77%|███████▋ | 832/1080 [12:27<05:01, 1.22s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SAN, Year: 2021, Month: July: 77%|███████▋ | 835/1080 [12:29<03:20, 1.22it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: SAN, Year: 2021, Month: November: 78%|███████▊ | 839/1080 [12:32<02:37, 1.53it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BWI, Year: 2021, Month: March: 80%|████████ | 867/1080 [12:58<04:15, 1.20s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BWI, Year: 2021, Month: April: 80%|████████ | 868/1080 [12:59<03:43, 1.05s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BWI, Year: 2021, Month: July: 81%|████████ | 871/1080 [13:01<02:39, 1.31it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BWI, Year: 2021, Month: November: 81%|████████ | 875/1080 [13:04<02:41, 1.27it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: TPA, Year: 2021, Month: March: 84%|████████▎ | 903/1080 [13:30<03:36, 1.22s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: TPA, Year: 2021, Month: April: 84%|████████▎ | 904/1080 [13:31<03:15, 1.11s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: TPA, Year: 2021, Month: July: 84%|████████▍ | 907/1080 [13:33<02:16, 1.27it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: TPA, Year: 2021, Month: November: 84%|████████▍ | 911/1080 [13:36<01:51, 1.52it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: AUS, Year: 2021, Month: March: 87%|████████▋ | 939/1080 [14:01<02:41, 1.14s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: AUS, Year: 2021, Month: April: 87%|████████▋ | 940/1080 [14:02<02:23, 1.02s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: AUS, Year: 2021, Month: July: 87%|████████▋ | 943/1080 [14:04<01:52, 1.22it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: AUS, Year: 2021, Month: November: 88%|████████▊ | 947/1080 [14:07<01:30, 1.47it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAD, Year: 2021, Month: March: 90%|█████████ | 975/1080 [14:32<02:03, 1.17s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAD, Year: 2021, Month: April: 90%|█████████ | 976/1080 [14:33<01:48, 1.04s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAD, Year: 2021, Month: July: 91%|█████████ | 979/1080 [14:35<01:18, 1.28it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: IAD, Year: 2021, Month: November: 91%|█████████ | 983/1080 [14:38<01:02, 1.56it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BNA, Year: 2021, Month: March: 94%|█████████▎| 1011/1080 [15:02<01:10, 1.03s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BNA, Year: 2021, Month: April: 94%|█████████▎| 1012/1080 [15:03<01:04, 1.05it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BNA, Year: 2021, Month: July: 94%|█████████▍| 1015/1080 [15:05<00:48, 1.35it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: BNA, Year: 2021, Month: November: 94%|█████████▍| 1019/1080 [15:07<00:39, 1.53it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MDW, Year: 2021, Month: March: 97%|█████████▋| 1047/1080 [15:34<00:39, 1.19s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MDW, Year: 2021, Month: April: 97%|█████████▋| 1048/1080 [15:35<00:34, 1.06s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MDW, Year: 2021, Month: July: 97%|█████████▋| 1051/1080 [15:36<00:22, 1.30it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MDW, Year: 2021, Month: November: 98%|█████████▊| 1055/1080 [15:39<00:16, 1.55it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1298875846.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airport: MDW, Year: 2023, Month: December: 100%|██████████| 1080/1080 [16:03<00:00, 1.12it/s] \n" ] } ], "source": [ "# for every IATA code, get the corresponding airport data from the relevant_airport_data folder\n", "# and save it in a new csv file\n", "\n", "total_iterations = len(relevant_airport_codes) * 3 * 12\n", "\n", "progress_bar = tqdm(total=total_iterations, position=0)\n", "\n", "for airport in relevant_airport_codes:\n", " if not os.path.exists(os.path.join(origins_folder, airport + \".csv\")):\n", " airport_data = pd.DataFrame()\n", " for year in os.listdir(relevant_airport_data):\n", " for month_index, month in enumerate(os.listdir(os.path.join(relevant_airport_data, year))):\n", "\n", " progress_bar.update(1)\n", " progress_bar.set_description(f\"Airport: {airport}, Year: {year}, Month: {calendar.month_name[month_index + 1]}\")\n", "\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", " airport_data = pd.concat([airport_data, database[database['Origin'] == airport]])\n", " \n", " airport_data.to_csv(os.path.join(origins_folder, airport + \".csv\"), index=False)\n", " \n", " else:\n", " progress_bar.update(3*12)\n", "\n", "progress_bar.close()" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Airline: Endeavor Air Inc., Year: 2021, Month: March: 0%| | 3/612 [00:01<05:51, 1.73it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Endeavor Air Inc., Year: 2021, Month: April: 1%| | 4/612 [00:02<06:56, 1.46it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Endeavor Air Inc., Year: 2021, Month: July: 1%| | 7/612 [00:04<06:48, 1.48it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Endeavor Air Inc., Year: 2021, Month: November: 2%|▏ | 11/612 [00:07<06:58, 1.43it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: American Airlines Inc., Year: 2021, Month: March: 6%|▋ | 39/612 [00:36<13:39, 1.43s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: American Airlines Inc., Year: 2021, Month: April: 7%|▋ | 40/612 [00:37<11:51, 1.24s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: American Airlines Inc., Year: 2021, Month: July: 7%|▋ | 43/612 [00:39<08:34, 1.11it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: American Airlines Inc., Year: 2021, Month: November: 8%|▊ | 47/612 [00:42<06:39, 1.41it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Alaska Airlines Inc., Year: 2021, Month: March: 12%|█▏ | 75/612 [01:34<41:40, 4.66s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Alaska Airlines Inc., Year: 2021, Month: April: 12%|█▏ | 76/612 [01:35<31:22, 3.51s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Alaska Airlines Inc., Year: 2021, Month: July: 13%|█▎ | 79/612 [01:37<14:58, 1.69s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Alaska Airlines Inc., Year: 2021, Month: November: 14%|█▎ | 83/612 [01:40<08:14, 1.07it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: JetBlue Airways, Year: 2021, Month: March: 18%|█▊ | 111/612 [02:12<13:59, 1.68s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: JetBlue Airways, Year: 2021, Month: April: 18%|█▊ | 112/612 [02:13<12:33, 1.51s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: JetBlue Airways, Year: 2021, Month: July: 19%|█▉ | 115/612 [02:15<08:16, 1.00it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: JetBlue Airways, Year: 2021, Month: November: 19%|█▉ | 119/612 [02:18<06:33, 1.25it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Delta Air Lines Inc., Year: 2021, Month: March: 24%|██▍ | 147/612 [02:52<14:49, 1.91s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Delta Air Lines Inc., Year: 2021, Month: April: 24%|██▍ | 148/612 [02:53<12:20, 1.59s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Delta Air Lines Inc., Year: 2021, Month: July: 25%|██▍ | 151/612 [02:55<08:05, 1.05s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Delta Air Lines Inc., Year: 2021, Month: November: 25%|██▌ | 155/612 [02:58<06:15, 1.22it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Frontier Airlines Inc., Year: 2021, Month: March: 30%|██▉ | 183/612 [03:50<32:28, 4.54s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Frontier Airlines Inc., Year: 2021, Month: April: 30%|███ | 184/612 [03:51<24:16, 3.40s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Frontier Airlines Inc., Year: 2021, Month: July: 31%|███ | 187/612 [03:53<11:09, 1.57s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Frontier Airlines Inc., Year: 2021, Month: November: 31%|███ | 191/612 [03:56<05:59, 1.17it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Allegiant Air, Year: 2021, Month: March: 36%|███▌ | 219/612 [04:24<08:50, 1.35s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Allegiant Air, Year: 2021, Month: April: 36%|███▌ | 220/612 [04:25<07:37, 1.17s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Allegiant Air, Year: 2021, Month: July: 36%|███▋ | 223/612 [04:27<05:13, 1.24it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Allegiant Air, Year: 2021, Month: November: 37%|███▋ | 227/612 [04:29<04:21, 1.47it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Hawaiian Airlines Inc., Year: 2021, Month: March: 42%|████▏ | 255/612 [04:53<05:05, 1.17it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Hawaiian Airlines Inc., Year: 2021, Month: April: 42%|████▏ | 256/612 [04:54<04:46, 1.24it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Hawaiian Airlines Inc., Year: 2021, Month: July: 42%|████▏ | 259/612 [04:55<03:54, 1.50it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Hawaiian Airlines Inc., Year: 2021, Month: November: 43%|████▎ | 263/612 [04:58<03:37, 1.60it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Envoy Air, Year: 2021, Month: March: 48%|████▊ | 291/612 [05:21<04:10, 1.28it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Envoy Air, Year: 2021, Month: April: 48%|████▊ | 292/612 [05:22<04:07, 1.29it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Envoy Air, Year: 2021, Month: July: 48%|████▊ | 295/612 [05:23<03:33, 1.48it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Envoy Air, Year: 2021, Month: November: 49%|████▉ | 299/612 [05:26<03:16, 1.59it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Spirit Air Lines, Year: 2021, Month: March: 53%|█████▎ | 327/612 [05:53<06:25, 1.35s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Spirit Air Lines, Year: 2021, Month: April: 54%|█████▎ | 328/612 [05:54<05:25, 1.15s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Spirit Air Lines, Year: 2021, Month: July: 54%|█████▍ | 331/612 [05:55<03:37, 1.29it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Spirit Air Lines, Year: 2021, Month: November: 55%|█████▍ | 335/612 [05:58<02:54, 1.59it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: PSA Airlines Inc., Year: 2021, Month: March: 59%|█████▉ | 363/612 [06:26<06:49, 1.64s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: PSA Airlines Inc., Year: 2021, Month: April: 59%|█████▉ | 364/612 [06:27<05:38, 1.36s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: PSA Airlines Inc., Year: 2021, Month: July: 60%|█████▉ | 367/612 [06:29<03:33, 1.15it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: PSA Airlines Inc., Year: 2021, Month: November: 61%|██████ | 371/612 [06:31<02:42, 1.49it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: SkyWest Airlines Inc., Year: 2021, Month: March: 65%|██████▌ | 399/612 [06:57<04:38, 1.31s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: SkyWest Airlines Inc., Year: 2021, Month: April: 65%|██████▌ | 400/612 [06:58<04:04, 1.15s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: SkyWest Airlines Inc., Year: 2021, Month: July: 66%|██████▌ | 403/612 [07:00<02:46, 1.26it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: SkyWest Airlines Inc., Year: 2021, Month: November: 67%|██████▋ | 407/612 [07:03<02:15, 1.51it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Horizon Air, Year: 2021, Month: March: 71%|███████ | 435/612 [07:40<08:16, 2.80s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Horizon Air, Year: 2021, Month: April: 71%|███████ | 436/612 [07:41<06:27, 2.20s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Horizon Air, Year: 2021, Month: July: 72%|███████▏ | 439/612 [07:43<03:17, 1.14s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Horizon Air, Year: 2021, Month: November: 72%|███████▏ | 443/612 [07:45<02:07, 1.33it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: United Air Lines Inc., Year: 2021, Month: March: 77%|███████▋ | 471/612 [08:08<02:08, 1.10it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: United Air Lines Inc., Year: 2021, Month: April: 77%|███████▋ | 472/612 [08:09<01:58, 1.18it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: United Air Lines Inc., Year: 2021, Month: July: 78%|███████▊ | 475/612 [08:11<01:35, 1.43it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: United Air Lines Inc., Year: 2021, Month: November: 78%|███████▊ | 479/612 [08:14<01:27, 1.53it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Southwest Airlines Co., Year: 2021, Month: March: 83%|████████▎ | 507/612 [08:54<05:37, 3.22s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Southwest Airlines Co., Year: 2021, Month: April: 83%|████████▎ | 508/612 [08:55<04:16, 2.46s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Southwest Airlines Co., Year: 2021, Month: July: 83%|████████▎ | 511/612 [08:57<02:07, 1.26s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Southwest Airlines Co., Year: 2021, Month: November: 84%|████████▍ | 515/612 [08:59<01:14, 1.30it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Mesa Airlines Inc., Year: 2021, Month: March: 89%|████████▊ | 543/612 [09:47<04:47, 4.17s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Mesa Airlines Inc., Year: 2021, Month: April: 89%|████████▉ | 544/612 [09:47<03:33, 3.14s/it]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Mesa Airlines Inc., Year: 2021, Month: July: 89%|████████▉ | 547/612 [09:49<01:35, 1.47s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Mesa Airlines Inc., Year: 2021, Month: November: 90%|█████████ | 551/612 [09:52<00:48, 1.25it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Republic Airline, Year: 2021, Month: March: 95%|█████████▍| 579/612 [10:16<00:33, 1.01s/it] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Republic Airline, Year: 2021, Month: April: 95%|█████████▍| 580/612 [10:17<00:29, 1.07it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Republic Airline, Year: 2021, Month: July: 95%|█████████▌| 583/612 [10:19<00:21, 1.37it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Republic Airline, Year: 2021, Month: November: 96%|█████████▌| 587/612 [10:21<00:15, 1.59it/s] C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1200068059.py:17: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", "Airline: Republic Airline, Year: 2023, Month: December: 100%|██████████| 612/612 [10:49<00:00, 1.06s/it] \n" ] } ], "source": [ "# for every airline, get the corresponding airport data from the cleaned flight data\n", "# and save it in a new csv file\n", "\n", "total_iterations = len(relevant_airlines) * 3 * 12\n", "\n", "progress_bar = tqdm(total=total_iterations, position=0)\n", "\n", "for airline in relevant_airlines:\n", " if not os.path.exists(os.path.join(airlines_folder, airline + \".csv\")):\n", " airport_data = pd.DataFrame()\n", " for year in os.listdir(relevant_airport_data):\n", " for month_index, month in enumerate(os.listdir(os.path.join(relevant_airport_data, year))):\n", "\n", " progress_bar.update(1)\n", " progress_bar.set_description(f\"Airline: {airline}, Year: {year}, Month: {calendar.month_name[month_index + 1]}\")\n", "\n", " database = pd.read_csv(os.path.join(relevant_airport_data, year, month))\n", " airport_data = pd.concat([airport_data, database[database['Carrier'] == airline]])\n", " \n", " airport_data.to_csv(os.path.join(airlines_folder, airline + \".csv\"), index=False)\n", " \n", " else:\n", " progress_bar.update(3*12)\n", "\n", "progress_bar.close()" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Origin: LAS: 53%|█████▎ | 16/30 [00:14<00:10, 1.31it/s]C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_19984\\1435418421.py:15: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(origins_folder, origin))\n", "Origin: TPA: 100%|██████████| 30/30 [00:31<00:00, 1.04s/it]\n" ] } ], "source": [ "# for every file in airline, get every row with the CancellationReason == \n", "# and save it in a new csv file\n", "\n", "origins_folder = \"origins\"\n", "\n", "total_iterations = len(os.listdir(origins_folder))\n", "progress_bar = tqdm(total=total_iterations, position=0)\n", "\n", "weather_data = pd.DataFrame()\n", "\n", "for origin in os.listdir(origins_folder):\n", " progress_bar.update(1)\n", " progress_bar.set_description(f\"Origin: {os.path.splitext(origin)[0]}\")\n", "\n", " database = pd.read_csv(os.path.join(origins_folder, origin))\n", " weather_data = pd.concat([weather_data, database[database['Cancelled'] == True]])\n", " \n", "weather_data['WeatherOrNasDelay'] = (weather_data['WeatherDelay'] > 0) | (weather_data['NASDelay'] > 0)\n", "\n", "weather_data.to_csv(\"weather_nas_delay.csv\", index=False)\n", "\n", "progress_bar.close()" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time 0\n", "Origin 0\n", "Dest 0\n", "Carrier 0\n", "Cancelled 0\n", "CancellationReason 0\n", "Delayed 0\n", "DepDelayMinutes 321246\n", "CarrierDelay 331892\n", "WeatherDelay 331892\n", "NASDelay 331892\n", "SecurityDelay 331892\n", "LateAircraftDelay 331892\n", "Temperature 0\n", "Feels_Like_Temperature 0\n", "Altimeter_Pressure 0\n", "Sea_Level_Pressure 0\n", "Visibility 0\n", "Wind_Speed 0\n", "Wind_Gust 0\n", "Precipitation 0\n", "Ice_Accretion_3hr 0\n", "WeatherOrNasDelay 0\n", "dtype: int64\n" ] } ], "source": [ "database = pd.read_csv('weather_nas_delay.csv')\n", "print(database.isna().sum())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I forgot to convert the altimeter values from inches to hPa :(" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def inches_to_hPa(inches):\n", " hPa = round(inches * 33.86389, 2)\n", " return hPa\n", "\n", "data_folder = '../data'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_9064\\922662527.py:2: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(data_folder, 'airlines', airline))\n" ] } ], "source": [ "for airline in os.listdir(os.path.join(data_folder, 'airlines')):\n", " database = pd.read_csv(os.path.join(data_folder, 'airlines', airline))\n", " \n", " database['Altimeter_Pressure'] = database['Altimeter_Pressure'].apply(inches_to_hPa)\n", " \n", " database.to_csv(os.path.join(data_folder, 'airlines', airline), index=False)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ATL.csv\n", "AUS.csv\n", "BNA.csv\n", "BOS.csv\n", "BWI.csv\n", "CLT.csv\n", "DCA.csv\n", "DEN.csv\n", "DFW.csv\n", "DTW.csv\n", "EWR.csv\n", "FLL.csv\n", "IAD.csv\n", "IAH.csv\n", "JFK.csv\n", "LAS.csv\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\wipar\\AppData\\Local\\Temp\\ipykernel_9064\\1778125799.py:3: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.\n", " database = pd.read_csv(os.path.join(data_folder, 'origins', origin))\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "LAX.csv\n", "LGA.csv\n", "MCO.csv\n", "MDW.csv\n", "MIA.csv\n", "MSP.csv\n", "ORD.csv\n", "PHL.csv\n", "PHX.csv\n", "SAN.csv\n", "SEA.csv\n", "SFO.csv\n", "SLC.csv\n", "TPA.csv\n" ] } ], "source": [ "for origin in os.listdir(os.path.join(data_folder, 'origins')):\n", " print(origin)\n", " database = pd.read_csv(os.path.join(data_folder, 'origins', origin))\n", " \n", " database['Altimeter_Pressure'] = database['Altimeter_Pressure'].apply(inches_to_hPa)\n", " \n", " database.to_csv(os.path.join(data_folder, 'origins', origin), index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 2 }