{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [], "machine_shape": "hm" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "id": "qIFLx0_wimTB" }, "outputs": [], "source": [ "import pandas as pd\n", "pd.set_option('display.max_colwidth', 150)\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from datetime import datetime as dt\n", "from string import punctuation\n", "import re\n", "import os\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\" # allow multiple outputs in a cell\n", "import warnings\n", "pd.options.plotting.backend = \"plotly\"\n", "warnings.filterwarnings(\"ignore\")\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "source": [ "# Download and Extract the Datasets" ], "metadata": { "id": "QqvaLRjVjIj3" } }, { "cell_type": "code", "source": [ "# Download the all-the-news-2-news-articles-dataset\n", "!wget https://www.dropbox.com/s/cn2utnr5ipathhh/all-the-news-2-1.zip?dl=0\n", "\n", "# Download the Annotated Corpus for Named Entity Recognition dataset\n", "!gdown https://drive.google.com/uc?id=13y8JNgL5TQ4x-yufpBOv3QBsEiE051sE\n", "\n", "# Make a data folder to store the data\n", "!mkdir -p data\n", "\n", "!unzip /content/all-the-news-2-1.zip?dl=0 -d ./data/\n", "\n", "!mv /content/ner.csv ./data\n", "\n", "!rm /content/all-the-news-2-1.zip?dl=0\n", "\n" ], "metadata": { "id": "VYvJeKsujCFY" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Load Data" ], "metadata": { "id": "liJiX3Xf2hQh" } }, { "cell_type": "code", "source": [ "# Specify the path to the data file\n", "\n", "filepath = '/content/data/all-the-news-2-1.csv'\n", "# data = pd.read_csv(filepath, encoding = \"ISO-8859-1\")\n", "data = pd.read_csv(filepath, encoding=\"utf-8\")\n" ], "metadata": { "id": "LMwtt2rJnNhB" }, "execution_count": 3, "outputs": [] }, { "cell_type": "code", "source": [ "# Verify that the data loaded correctly\n", "data.head(3)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "g4VoxOSnnOs9", "outputId": "4f0dea96-29e8-4f80-f009-12e9ef6e0c05" }, "execution_count": 4, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " date year month day author \\\n", "0 2016-12-09 18:31:00 2016 12.0 9 Lee Drutman \n", "1 2016-10-07 21:26:46 2016 10.0 7 Scott Davis \n", "2 2018-01-26 00:00:00 2018 1.0 26 NaN \n", "\n", " title \\\n", "0 We should take concerns about the health of liberal democracy seriously \n", "1 Colts GM Ryan Grigson says Andrew Luck's contract makes it difficult to build the team \n", "2 Trump denies report he ordered Mueller fired \n", "\n", " article \\\n", "0 This post is part of Polyarchy, an independent blog produced by the political reform program at New America, a Washington think tank devoted to de... 
\n", "1 The Indianapolis Colts made Andrew Luck the highest-paid player in NFL history this offseason with a five-year, $122-million contract with $89 mi... \n", "2 DAVOS, Switzerland (Reuters) - U.S. President Donald Trump denied a report on Friday that he had ordered Special Counsel Robert Mueller fired last... \n", "\n", " url \\\n", "0 https://www.vox.com/polyarchy/2016/12/9/13898340/democracy-warning-signs \n", "1 https://www.businessinsider.com/colts-gm-ryan-grigson-andrew-luck-contract-2016-10 \n", "2 https://www.reuters.com/article/us-davos-meeting-trump-mueller/trump-denies-report-he-ordered-mueller-fired-idUSKBN1FF12A \n", "\n", " section publication \n", "0 NaN Vox \n", "1 NaN Business Insider \n", "2 Davos Reuters " ] }, "metadata": {}, "execution_count": 4 } ] }, { "cell_type": "code", "source": [ "# In total, the data has 2,688,878 rows and 10 columns\n", "data.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iJZa9dP1vGeN", "outputId": "14160081-8bff-406c-e007-ef19fe9b693a" }, "execution_count": 5, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(2688878, 10)" ] }, "metadata": {}, "execution_count": 5 } ] }, { "cell_type": "code", "source": [ "data.info()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XwYxq7Wqx8QH", "outputId": "46a8a227-6468-434b-cabc-0c804a6cbf04" }, "execution_count": 6, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 2688878 entries, 0 to 2688877\n", "Data columns (total 10 columns):\n", " # Column Dtype \n", "--- ------ ----- \n", " 0 date object \n", " 1 year int64 \n", " 2 month float64\n", " 3 day int64 \n", " 4 author object \n", " 5 title object \n", " 6 article object \n", " 7 url object \n", " 8 section object \n", " 9 publication object \n", "dtypes: float64(1), int64(2), object(7)\n", "memory usage: 205.1+ MB\n" ] } ] }, { "cell_type": "code", "source": [ "data.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "o6obun2r48jC", "outputId": "fe7b1474-5629-42ea-93e0-e0db765a82f3" }, "execution_count": 7, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "date 0\n", "year 0\n", "month 0\n", "day 0\n", "author 1021101\n", "title 37\n", "article 104713\n", "url 12577\n", "section 912273\n", "publication 12577\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 7 } ] }, { "cell_type": "markdown", "source": [ "# Observations about the all-the-news data\n", "\n", "- The data has 2,688,878 rows and 10 columns\n", "- 6 columns contain null values. 
The columns are:\n", "  - author (1,021,101 null values)\n", "  - title (37 null values)\n", "  - article (104,713 null values)\n", "  - url (12,577 null values)\n", "  - section (912,273 null values)\n", "  - publication (12,577 null values)\n", "- Column dtypes: int64 (2), float64 (1), and object (7)\n", "- The `date` column has dtype `object`; it should be converted to a datetime type (e.g., with `pd.to_datetime`)\n", "- The author, title, article, url, section, and publication columns have dtype `object`; they should be converted to the pandas `string` dtype\n" ], "metadata": { "id": "EzYWiTEN5tnh" } }, { "cell_type": "code", "source": [ "# Number of articles per publication (including missing values)\n", "data['publication'].value_counts(dropna=False)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d9yp68G95lYQ", "outputId": "6d22d88b-d907-4b07-f45f-bce11df8c577" }, "execution_count": 8, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Reuters 840094\n", "The New York Times 252259\n", "CNBC 238096\n", "The Hill 208411\n", "People 136488\n", "CNN 127602\n", "Refinery 29 111433\n", "Vice 101137\n", "Mashable 94107\n", "Business Insider 57953\n", "The Verge 52424\n", "TechCrunch 52095\n", "TMZ 49595\n", "Axios 47815\n", "Vox 47272\n", "Politico 46377\n", "Washington Post 40882\n", "Buzzfeed News 32819\n", "Gizmodo 27228\n", "Economist 26227\n", "Wired 20243\n", "Fox News 20144\n", "Vice News 15539\n", "Hyperallergic 13551\n", "NaN 12577\n", "New Republic 11809\n", "New Yorker 4701\n", "Name: publication, dtype: int64" ] }, "metadata": {}, "execution_count": 8 } ] }, { "cell_type": "code", "source": [ "data['section'].value_counts(dropna=False)[:50]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FsJsZvUdAqwf", "outputId": "4b3b68fe-8fed-47de-9c4d-6a2548b8fcf4" }, "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "NaN 912273\n", "Market News 108724\n", "World News 108651\n", "Business News 96395\n", "Wires 
67352\n", "Financials 57845\n", "politics 53496\n", "us 51242\n", "Intel 39805\n", "Bonds News 39672\n", "Politics 33875\n", "Healthcare 30883\n", "world 28530\n", "opinion 27465\n", "Consumer Goods and Retail 26766\n", "Sports News 26324\n", "business 25335\n", "tv 24783\n", "sports 23909\n", "Tech 21605\n", "arts 21230\n", "movies 19683\n", "Commodities 17620\n", "Deals 15847\n", "style 15355\n", "Tech by VICE 15222\n", "Entertainment 13773\n", "health 13629\n", "nyregion 13498\n", "Technology News 12763\n", "Music by VICE 12420\n", "Environment 11639\n", "Company News 11572\n", "Health News 11235\n", "crime 11208\n", "Sports 11149\n", "music 10402\n", "celebrity 10242\n", "Food by VICE 9933\n", "opinions 9815\n", "entertainment 9596\n", "Energy 9435\n", "fashion 9063\n", "U.S. 9017\n", "books 8704\n", "Big Story 10 7995\n", "magazine 7922\n", "Funds News 7708\n", "Noisey 7702\n", "Cyclical Consumer Goods 7625\n", "Name: section, dtype: int64" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "code", "source": [ "def filter_section(section):\n", "    # Map a raw section label to 'technology' or 'health' by case-insensitive prefix;\n", "    # everything else (including NaN, which str() turns into 'nan') becomes 'other'.\n", "    label = str(section).lower()\n", "    if label.startswith('tech'):\n", "        return 'technology'\n", "    elif label.startswith('health'):\n", "        return 'health'\n", "    return 'other'" ], "metadata": { "id": "Dmr1wuZ0BRSl" }, "execution_count": 10, "outputs": [] }, { "cell_type": "code", "source": [ "data['tech_health_tag'] = data['section'].apply(filter_section)" ], "metadata": { "id": "sIH7XCc2CgdN" }, "execution_count": 11, "outputs": [] }, { "cell_type": "code", "source": [ "data['tech_health_tag'].value_counts()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "96DJ1PspKaIp", "outputId": "d73ed7f7-0e3e-4f2c-96c1-f16c35ac040a" }, "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "other 2562768\n", "health 65261\n", "technology 60849\n", "Name: tech_health_tag, dtype: int64" ] }, "metadata": {}, "execution_count": 12 } ] }, { 
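"cell_type": "markdown", "metadata": {}, "source": [ "The `apply(filter_section)` call above tags each row by calling a Python function. As an optional, hedged sketch (not part of the original analysis), the same prefix rules can be expressed as boolean masks with pandas string methods and `numpy.select`. The helper name `tag_sections` and the toy input below are illustrative assumptions; the helper should agree with `filter_section`, including mapping missing sections to `'other'`, and adding another category only means adding one more mask and label." ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "def tag_sections(section: pd.Series) -> pd.Series:\n", "    # Hypothetical helper: same case-insensitive prefix rules as filter_section\n", "    label = section.astype(str).str.lower()  # NaN becomes 'nan', which maps to 'other'\n", "    conditions = [label.str.startswith('tech'), label.str.startswith('health')]\n", "    choices = ['technology', 'health']\n", "    return pd.Series(np.select(conditions, choices, default='other'), index=section.index)\n", "\n", "# Toy input: a few section labels seen above plus a missing value\n", "sample = pd.Series(['Tech by VICE', 'Healthcare', 'World News', np.nan])\n", "print(tag_sections(sample).tolist())  # ['technology', 'health', 'other', 'other']" ] }, { 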
"cell_type": "markdown", "source": [ "# Load the data which focus only on Health and Technology Section" ], "metadata": { "id": "b_ko3gtRM8aY" } }, { "cell_type": "code", "source": [ "data_tech_health = data[(data['tech_health_tag']=='technology') | (data['tech_health_tag']=='health')]" ], "metadata": { "id": "58MJL8bRKevk" }, "execution_count": 13, "outputs": [] }, { "cell_type": "code", "source": [ "data_tech_health = data_tech_health.reset_index(drop=True)" ], "metadata": { "id": "To-J8aVjVANm" }, "execution_count": 14, "outputs": [] }, { "cell_type": "code", "source": [ "data_tech_health.head(3)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "XNYfPQA6N0Sr", "outputId": "b5be3f0b-4eee-471f-8e58-7e837a36ee0b" }, "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " date year month day author \\\n", "0 2018-05-02 17:09:00 2018 5.0 2 Caroline Williams \n", "1 2018-10-05 19:35:00 2018 10.0 5 Caroline Haskins \n", "2 2019-06-20 00:00:00 2019 6.0 20 Gergely Szakacs \n", "\n", " title \\\n", "0 You Can Trick Your Brain Into Being More Focused \n", "1 Trash Geyser Spews Garbage In Yellowstone National Park \n", "2 Hungary has no evidence of Huawei threat, plans rapid 5G rollout: minister \n", "\n", " article \\\n", "0 If only every day could be like this. You can’t put your finger on why: Maybe you had just the right amount of sleep. Maybe the stars are somehow ... \n", "1 Geyser eruptions are known as one of the most beautiful events to occur in nature. Not anymore! On September 15, Yellowstone Park’s Ear Spring ge... \n", "2 BUDAPEST (Reuters) - Hungary has no evidence that equipment from Chinese telecoms giant Huawei poses a security threat, a government minister said... 
\n", "\n", " url \\\n", "0 https://www.vice.com/en_us/article/9kgp4v/how-to-improve-focus-be-more-creative \n", "1 https://www.vice.com/en_us/article/evwq47/ear-spring-geyser-spews-trash-in-yellowstone-national-park \n", "2 https://www.reuters.com/article/us-hungary-telecoms-5g-huawei/hungary-has-no-evidence-of-huawei-threat-plans-rapid-5g-rollout-minister-idUSKCN1TL2AP \n", "\n", " section publication tech_health_tag \n", "0 Health Vice health \n", "1 Tech by VICE Vice technology \n", "2 Technology News Reuters technology " ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
dateyearmonthdayauthortitlearticleurlsectionpublicationtech_health_tag
02018-05-02 17:09:0020185.02Caroline WilliamsYou Can Trick Your Brain Into Being More FocusedIf only every day could be like this. You can’t put your finger on why: Maybe you had just the right amount of sleep. Maybe the stars are somehow ...https://www.vice.com/en_us/article/9kgp4v/how-to-improve-focus-be-more-creativeHealthVicehealth
12018-10-05 19:35:00201810.05Caroline HaskinsTrash Geyser Spews Garbage In Yellowstone National ParkGeyser eruptions are known as one of the most beautiful events to occur in nature. Not anymore! On September 15, Yellowstone Park’s Ear Spring ge...https://www.vice.com/en_us/article/evwq47/ear-spring-geyser-spews-trash-in-yellowstone-national-parkTech by VICEVicetechnology
22019-06-20 00:00:0020196.020Gergely SzakacsHungary has no evidence of Huawei threat, plans rapid 5G rollout: ministerBUDAPEST (Reuters) - Hungary has no evidence that equipment from Chinese telecoms giant Huawei poses a security threat, a government minister said...https://www.reuters.com/article/us-hungary-telecoms-5g-huawei/hungary-has-no-evidence-of-huawei-threat-plans-rapid-5g-rollout-minister-idUSKCN1TL2APTechnology NewsReuterstechnology
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "source": [ "data_tech_health.shape" ], "metadata": { "id": "wtiS6LnUWsoV", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "ed23da4b-2543-47f5-8996-254e0bce33cf" }, "execution_count": 16, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(126110, 11)" ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "code", "source": [ "data_tech_health.info()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "NNcVLQPRN5ha", "outputId": "5ca12925-2a67-4d62-ad91-c703bfe488f4" }, "execution_count": 17, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 126110 entries, 0 to 126109\n", "Data columns (total 11 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 date 126110 non-null object \n", " 1 year 126110 non-null int64 \n", " 2 month 126110 non-null float64\n", " 3 day 126110 non-null int64 \n", " 4 author 63297 non-null object \n", " 5 title 126109 non-null object \n", " 6 article 125948 non-null object \n", " 7 url 126110 non-null object \n", " 8 section 126110 non-null object \n", " 9 publication 126110 non-null object \n", " 10 tech_health_tag 126110 non-null object \n", "dtypes: float64(1), int64(2), object(8)\n", "memory usage: 10.6+ MB\n" ] } ] }, { "cell_type": "code", "source": [ "data_tech_health.isnull().sum()" ], "metadata": { "id": "SQl9EV7GN_8A", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "1b04db26-3dbf-4153-a971-d41d96285ee5" }, "execution_count": 18, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "date 0\n", "year 0\n", "month 0\n", "day 0\n", "author 62813\n", "title 1\n", "article 162\n", "url 0\n", "section 0\n", "publication 0\n", "tech_health_tag 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "source": [ "**Observation the 
data focus on health and technology section**\n", "\n", "- The data has 10 columns and 126110 rows\n", "- 3 columns of the data have null values. The columns name are:\n", " - author(it has 62813 null values)\n", " - title(it has 1 null values)\n", " - article(it has 162 null values)\n", " \n", "- data type of the columns int(2), float(1), and object(7)\n", "- The 'date' column data type is Object. It should be converted into date data type\n", "- author, title, article, url, section, and publication columns have object data types. it should be converted into string\n" ], "metadata": { "id": "cdjbymGQqHUs" } }, { "cell_type": "code", "source": [ "data_tech_health['publication'].unique()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "B6n1RcZKs7c6", "outputId": "cd6553fc-789c-47e1-ca46-18bdab8d4c78" }, "execution_count": 19, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array(['Vice', 'Reuters', 'The Verge', 'People', 'Economist', 'CNN',\n", " 'Gizmodo', 'CNBC', 'Fox News', 'The New York Times'], dtype=object)" ] }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "code", "source": [ "plt.figure(figsize=(10,5))\n", "publication_plot = sns.countplot(\n", " data=data_tech_health,\n", " x='publication',\n", " palette='Set1',\n", " order = data_tech_health['publication'].value_counts().index\n", ")\n", "\n", "plt.xticks(\n", " rotation=45, \n", " horizontalalignment='right',\n", " fontweight='light',\n", " fontsize='x-large' \n", ")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 471 }, "id": "NseIit5Cyiuz", "outputId": "4b567648-56b8-4a75-df67-61893a81bdf2" }, "execution_count": 20, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "
" ] }, "metadata": {}, "execution_count": 20 }, { "output_type": "execute_result", "data": { "text/plain": [ "(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),\n", " )" ] }, "metadata": {}, "execution_count": 20 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnAAAAGjCAYAAACsUSi/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd7hcVfX/8fdKo2lIkBBKgCAEEJAiMYDwA+lBUXoEBCIgQUFFURFQ6U0sCAgI0hGl2ECkiBQRpIUmBkRCk/ClSUJRmiHr98dawz0Z7k3uJZl7Zl8+r+eZ587sc2ayd87MOevsau6OiIiIiJSjX90ZEBEREZGeUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUpgBdWegty288MI+cuTIurMhIiIiMlt33XXXv919WHN6SwM4MxsCnAmsAjiwB/AQcDEwEngcGOfu08zMgBOBTwCvAp9z97vzc8YD38mPPcrdz8v0NYFzgfmAK4H9fDYT240cOZKJEyfOvUKKiIiItIiZPdFZequbUE8Ernb3FYHVgAeBA4Hr3H0UcF2+BtgCGJWPCcBpAGa2EHAosBYwBjjUzIbme04D9qq8b2yLyyMiIiJSu5YFcGa2ILA+cBaAu7/p7i8CWwHn5W7nAVvn862A8z3cBgwxs8WAzYFr3X2qu08DrgXG5rbB7n5b1rqdX/ksERERkT6rlTVwywDPA+eY2T1mdqaZLQAMd/enc59ngOH5fAngycr7p2TarNKndJIuIiIi0qe1MoAbAHwEOM3d1wD+S0dzKQBZc9byxVjNbIKZTTSzic8//3yr/zkRERGRlmplADcFmOLut+frXxEB3bPZ/En+fS63PwUsWXn/iEybVfqITtLfwd3PcPfR7j562LB3DOQQERERKUrLAjh3fwZ40sxWyKSNgQeAy4HxmTYeuCyfXw7sZmFt4KVsar0G2MzMhubghc2Aa3Lby2a2do5g3a3yWSIiIiJ9VqvngfsycKGZDQIeBXYngsZLzGxP4AlgXO57JTGFyGRiGpHdAdx9qpkdCdyZ+x3h7lPz+T50TCNyVT5ERERE+jSbzbRpfc7o0aNd88CJiIhICczsLncf3ZyupbRERERECqMATkRERKQwCuBERERECvOeW8y+MxNHj6k7Cz02euIddWdBREREaqIaOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCtDSAM7PHzex+M7vXzCZm2kJmdq2ZPZx/h2a6mdlJZjbZzP5mZh+pfM743P9hMxtfSV8zP39yvtdaWR4RERGRdtAbNXAbuvvq7j46Xx8IXOfuo4Dr8jXAFsCofEwAToMI+IBDgbWAMcChjaAv99mr8r6xrS+OiIiISL3qaELdCjgvn58HbF1JP9/DbcAQM1sM2By41t2nuvs04FpgbG4b7O63ubsD51c+S0RERKTPanUA58AfzewuM5uQacPd/el8/gwwPJ8vATxZee+UTJtV+pRO0kVERET6tAEt/vz13P0pM1sEuNbM/lHd6O5uZt7iPJDB4wSApZZaqtX/nIiIiEhLtbQGzt2fyr/PAb8l+rA9m82f5N/ncvengCUrbx+RabNKH9FJemf5OMPdR7v76GHDhs1psURERERq1bIAzswWMLP3N54DmwF/By4HGiNJxwOX5fPLgd1yNOrawEvZ1
HoNsJmZDc3BC5sB1+S2l81s7Rx9ulvls0RERET6rFY2oQ4HfpszewwAfuHuV5vZncAlZrYn8AQwLve/EvgEMBl4FdgdwN2nmtmRwJ253xHuPjWf7wOcC8wHXJUPERERkT6tZQGcuz8KrNZJ+gvAxp2kO7BvF591NnB2J+kTgVXmOLMiIiIiBdFKDCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUpiWB3Bm1t/M7jGzK/L1MmZ2u5lNNrOLzWxQps+Tryfn9pGVzzgo0x8ys80r6WMzbbKZHdjqsoiIiIi0g96ogdsPeLDy+nvACe6+HDAN2DPT9wSmZfoJuR9mthKwI7AyMBY4NYPC/sApwBbASsBOua+IiIhIn9bSAM7MRgCfBM7M1wZsBPwqdzkP2Dqfb5Wvye0b5/5bARe5+xvu/hgwGRiTj8nu/qi7vwlclPuKiIiI9GmtroH7MXAAMCNffwB40d2n5+spwBL5fAngSYDc/lLu/3Z603u6ShcRERHp01oWwJnZlsBz7n5Xq/6NHuRlgplNNLOJzz//fN3ZEREREZkjrayBWxf4tJk9TjRvbgScCAwxswG5zwjgqXz+FLAkQG5fEHihmt70nq7S38Hdz3D30e4+etiwYXNeMhEREZEatSyAc/eD3H2Eu48kBiFc7+6fBW4Ats/dxgOX5fPL8zW5/Xp390zfMUepLgOMAu4A7gRG5ajWQflvXN6q8oiIiIi0iwGz32Wu+xZwkZkdBdwDnJXpZwEXmNlkYCoRkOHuk8zsEuABYDqwr7u/BWBmXwKuAfoDZ7v7pF4tiYiIiEgNeiWAc/cbgRvz+aPECNLmfV4Hduji/UcDR3eSfiVw5VzMqoiIiEjb00oMIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoXpVgBnZtd1J01EREREWm/ArDaa2bzA/MDCZjYUsNw0GFiixXkTERERkU7MMoAD9ga+CiwO3EVHAPcy8JMW5ktEREREujDLAM7dTwRONLMvu/vJvZQnEREREZmF2dXAAeDuJ5vZx4CR1fe4+/ktypeIiIiIdKFbAZyZXQAsC9wLvJXJDiiAExEREell3QrggNHASu7urcyMiIiIiMxed+eB+zuwaCszIiIiIiLd090auIWBB8zsDuCNRqK7f7oluRIRERGRLnU3gDuslZkQERERke7rVhOqu/+5s8es3mNm85rZHWZ2n5lNMrPDM30ZM7vdzCab2cVmNijT58nXk3P7yMpnHZTpD5nZ5pX0sZk22cwOfDf/ASIiIiKl6e5SWq+Y2cv5eN3M3jKzl2fztjeAjdx9NWB1YKyZrQ18DzjB3ZcDpgF75v57AtMy/YTcDzNbCdgRWBkYC5xqZv3NrD9wCrAFsBKwU+4rIiIi0qd1twbu/e4+2N0HA/MB2wGnzuY97u7/yZcD8+HARsCvMv08YOt8vlW+JrdvbGaW6Re5+xvu/
hgwGRiTj8nu/qi7vwlclPuKiIiI9GndHYX6tgzMfgdsPrt9s6bsXuA54FrgEeBFd5+eu0yhY03VJYAn89+YDrwEfKCa3vSertJFRERE+rTuTuS7beVlP2JeuNdn9z53fwtY3cyGAL8FVnw3mZxTZjYBmACw1FJL1ZEFERERkbmmu6NQP1V5Ph14nB40V7r7i2Z2A7AOMMTMBmQt2wjgqdztKWBJYIqZDQAWBF6opDdU39NVevO/fwZwBsDo0aM1GbGIiIgUrbtroe7e0w82s2HA/zJ4mw/YlBiYcAOwPdFnbTxwWb7l8nx9a26/3t3dzC4HfmFmPwIWB0YBdwAGjDKzZYjAbUdg557mU0RERKQ03W1CHQGcDKybSX8B9nP3KbN422LAeTlatB9wibtfYWYPABeZ2VHAPcBZuf9ZwAVmNhmYSgRkuPskM7sEeICo/ds3m2Yxsy8B1wD9gbPdfVI3yy0iIiJSrO42oZ4D/ALYIV/vkmmbdvUGd/8bsEYn6Y8SI0ib01+vfH7ztqOBoztJvxK4cvbZFxEREek7ujsKdZi7n+Pu0/NxLjCshfkSERERkS50N4B7wcx2aUyga2a7EAMMRERERKSXdTeA2wMYBzwDPE0MMvhci/IkIiIiIrPQ3T5wRwDj3X0agJktBPyACOxEREREpBd1twZu1UbwBuDuU+lkgIKIiIiItF53A7h+Zja08SJr4LpbeyciIiIic1F3g7AfArea2aX5egc6mdZDRERERFqvuysxnG9mE4GNMmlbd3+gddkSERERka50uxk0AzYFbSIiIiI1624fOBERERFpEwrgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMArgRERERAqjAE5ERESkMC0L4MxsSTO7wcweMLNJZrZfpi9kZtea2cP5d2imm5mdZGaTzexvZvaRymeNz/0fNrPxlfQ1zez+fM9JZmatKo+IiIhIu2hlDdx04OvuvhKwNrCvma0EHAhc5+6jgOvyNcAWwKh8TABOgwj4gEOBtYAxwKGNoC/32avyvrEtLI+IiIhIW2hZAOfuT7v73fn8FeBBYAlgK+C83O08YOt8vhVwvofbgCFmthiwOXCtu09192nAtcDY3DbY3W9zdwfOr3yWiIiISJ/VK33gzGwksAZwOzDc3Z/OTc8Aw/P5EsCTlbdNybRZpU/pJF1ERESkTxvQ6n/AzN4H/Br4qru/XO2m5u5uZt4LeZhANMuy1FJLtfqfaztjv3tx3VnosauP/EzdWRAREWlbLa2BM7OBRPB2obv/JpOfzeZP8u9zmf4UsGTl7SMybVbpIzpJfwd3P8PdR7v76GHDhs1ZoURERERq1spRqAacBTzo7j+qbLocaIwkHQ9cVknfLUejrg28lE2t1wCbmdnQHLywGXBNbnvZzNbOf2u3ymeJiIiI9FmtbEJdF9gVuN/M7s20g4HjgEvMbE/gCWBcbrsS+AQwGXgV2B3A3aea2ZHAnbnfEe4+NZ/vA5wLzAdclQ8RERGRPq1lAZy73wx0NS/bxp3s78C+XXzW2cDZnaRPBFaZg2yKiIiIFKflgxhEWm3nC3asOws98otdL6o7CyIiUjgtpSUiIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiI
oVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoVRACciIiJSGAVwIiIiIoUZUHcGRGTWTtz59Lqz0CP7/WLvurMgItLnqQZOREREpDAtC+DM7Gwze87M/l5JW8jMrjWzh/Pv0Ew3MzvJzCab2d/M7COV94zP/R82s/GV9DXN7P58z0lmZq0qi4iIiEg7aWUN3LnA2Ka0A4Hr3H0UcF2+BtgCGJWPCcBpEAEfcCiwFjAGOLQR9OU+e1Xe1/xviYiIiPRJLQvg3P0mYGpT8lbAefn8PGDrSvr5Hm4DhpjZYsDmwLXuPtXdpwHXAmNz22B3v83dHTi/8lkiIiIifVpv94Eb7u5P5/NngOH5fAngycp+UzJtVulTOkkXERER6fNqG8SQNWfeG/+WmU0ws4lmNvH555/vjX9SREREpGV6O4B7Nps/yb/PZfpTwJKV/UZk2qzSR3SS3il3P8PdR7v76GHDhs1xIURERETq1NsB3OVAYyTpeOCySvpuORp1beClbGq9BtjMzIbm4IXNgGty28tmtnaOPt2t8lkiIiIifVrLJvI1s18CHwcWNrMpxGjS44BLzGxP4AlgXO5+JfAJYDLwKrA7gLtPNbMjgTtzvyPcvTEwYh9ipOt8wFX5EBEREenzWhbAuftOXWzauJN9Hdi3i885Gzi7k/SJwCpzkkcRERGREmklBhEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKYwCOBEREZHCKIATERERKUzLltISEZmdxw9bpu4s9NjIwx7r9r5PP/2pFuakNRZb7Pd1Z0FEukE1cCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUhgFcCIiIiKFUQAnIiIiUpgBdWdARETKdNpn16k7Cz32xQtvrTsLInOFauBERERECqMATkRERKQwCuBERERECqMATkRERKQwCuBERERECqMATkRERKQwmkZERESkE8+eeEPdWeiR4fttWHcWpBepBk5ERESkMMUHcGY21sweMrPJZnZg3fkRERERabWiAzgz6w+cAmwBrATsZGYr1ZsrERERkdYqOoADxgCT3f1Rd38TuAjYquY8iYiIiLRU6YMYlgCerLyeAqxVU15ERESKccYZZ9SdhR6ZMGFCt/edOHpMC3PSGqMn3tGj/c3dW5SV1jOz7YGx7v75fL0rsJa7f6lpvwlA48ivADzUS1lcGPh3L/1bdVD5yqbylasvlw1UvtKpfHPX0u4+rDmx9Bq4p4AlK69HZNpM3P0MoNdvNcxsoruP7u1/t7eofGVT+crVl8sGKl/pVL7eUXofuDuBUWa2jJkNAnYELq85TyIiIiItVXQNnLtPN7MvAdcA/YGz3X1SzdkSERERaamiAzgAd78SuLLufHShrB6iPafylU3lK1dfLhuofKVT+XpB0YMYRERERN6LSu8DJyIiIvKeowBOREREpDAK4ET6GDOzuvMgIiKtpQBO3jPMbAEzG25mi9Sdl1Yws6EA7u4K4sqi4yUiPaUATrrUly4qZrYicBZwMbCtmc1fc5bmKjNbGLjZzA6HvhfENcpiZh8xsw3NbIG68zS3mFk/z9FkZraQmQ01swXrzpfMmT76+/ugmS1ed35awcz6152HnlIA10OVL/K8dedlbqqUaykzWz4DnuK+0J0xsw8DNwKPAz9w95+6+6u1Zmrum5co4+fM7FvQd4I4M7MsyzbElEFrA+9YVqZEGbzNyOcHA78E7gXONbMda83cXNQXvoddqZw7VzCzj5nZJmY20PvIFA+V399WxPdzXF+4wTCzfvl3QQB3f
8vM1jaz1evNWfcpgOuByhd5c+BsM1uq7jzNDU0/0GuAXwO3EmXcqN7czZk8Rr8DfunuB7r7FZnep7777j4F+B5wEbBfXwrisgyfAC4ADgVOcffHofzjWAnejgb2B34GfBEYAvy8L5xjmmoYFzCzQWY2T2NbvbmbM5Vz53bEufMi4BfA38xsfTMbWG8O51zl2nBRPi5195dqztYcc/cZWZt4mZlta2afAv4KDK45a91W9I+nNzX9UC8CngaG15ytuSLLtSlwIXAysDrwDWAXYOk68/ZuVYKWrYDHgOOr2xsXzpI1Xxzc/V/EBJM/B77aV4K4rO3+PPATdz8dmG5mK5nZccAhZrZyvTmcM2a2GrAlsI27/wqYDqwB7OPu/zKzYidcb6ph/Abx3bwD+KmZjS79d5i/rfWB84GjgC2AjwNPApcAY6DsGkgzGwkcA+zv7icAz2dT/zgzWyf3KbV8w4DnievDpcBO7n5TKTcWxZ4Yelv+UMcApwPfcve3Z2LOTvH/LvxktD3wM3c/NX+wB+brcwDMbLC7v1xj/nqk0nyxPvCmuz/dvE8lKJ8XGO7uT/RqJudANgv/3MxuBe4DrnX3ye7+iJkdDxjwFTPr7+7HNIK4Qpt1BgBLAveZ2dLEzcXywHLAs8B6ZrZtSd/PJvMC87r7X/IG8Vzgm+5+RiN4NbOr3X1yrbl8FyrB23HAnsABRMXBV4E/mNmK7j6txiz2iJkt3cl5YgPi93dmJW0zM7sOONXM1ij82uDAq8CDZjYY2A/YDFgBWNjMPuPul9aZwXfL3e8zsyuB7YApwFuZPqN689Guiogy28gawL15Yh1qZjub2TXAA8D3LTqSF8fMBgEfAW7PC8YtwHXAF3L7Z4BNS7kraTIPcYEEZr5TrAQzxwCf7OV8zanDgA8D6xF3j78zswfM7NtEYPNzYtDGXmb2FZipvEVx9/8AlxM3FfcBiwLnuvuymd4f+E99Oey+LmoqhgIDzWw8ccwOdPef5rYVgI2A4ppSK33D1gE+DXw6bwifIgLyQ9x9WinnFTP7NPCYmW3StGkZYNnKfvPk028T39XRvZPDucvMVrHoC/0/4P1EE/8jwGrAr4hyXUN5587m3+HjwL5EH+LvmNlu0BHE9X7uuq+tM1e3yglorJmNJe72P2pm+xNf4HHAP4Ejga8BK9WV156olGswgLu/CdwD7AE8TPQZ+3LW2vQnmnfWo7Aa2/zx3Q+snP0WqZSpsc+8wGLAf+vJ5bu2PfBnYCARaH8NuJq4M76JaAr/FBHY/NjM9qgpnz1S+W6ukH2IPm1mg9z9SGATYJy770CMJgZYhDh289WT4+6r1oCa2Y5mthOAu19NNLmdA3zX3U/JfeYnbi4aA1Tanpl93sxWgpluGIYA0939VjPbnmiqOtDdT88yfr6Qm987iGbRX5nZxpX0q4F5svYUd38j02cArwOv9Wou55CFRYkBC1u5+/8BexP9w44GJrj7idll43Wi5qoYlZaXDc3sm8CN7n4a8GPgH8D+ZrYLvB3EbZ3dHNqPu+sxiwdRPf4SsBPRbHMsEbSdBnyUjvVkbwc2rTu/3ShPI79jgZ8Cn8jXuwFPAHcDQzJtAPGDnQIsX3feu1G2pbIcXwZWyLQVgGnEBfBjTfv3Aw4HHgSWqjv/3SjfKGBrYL7GsSRqpB4ANqjstxrRXHU5MJkIcFaqO//dKF/ju7ktcSJ9APg7Edys3bTvskRw8xLw4brz3o2y9as8XyW/czcBW2bax/JYPkoE5N8Crs3yD2z+jHZ8EDcPM4ibh+Ur6bsAdxG1cC8B+1a2jSGC8TF157+bZRwCnE3cGP2/TPsgcDNwFbBDps1D9Il7gOieUXveZ1Mu6yTtuDx3LpavB1a2zZu/v2cb59oSHpVzzHbAc8CpwJqV7aOJPu73ETX+hxLNqiPrznun5ak7A+38IAYpfBs4uCl9SNPrY4iq5cXrznM3y7Ut0afhoOrFjwhOJxFB3HnA7/NLv
kbdee5GmVbNi9/dRCD6dCNoATbNIOYu4OtErc12RBD+IrB63fnvZhl/mBfIHcggLtMnZpk3AeZpes+iwEJ1570HZdwwL457EQH2mCzzfpV91iWmE5kErFZ3nrtRJqs8P4LoR/sQ8AZwJx03UaOI2qlJRBeGnwADctuAusvRzbJOIG74fkIGccD8+ducAexe2Xc+4A/AZbR/cNov/65F3CROz3PjRpm+GhGQP0I0yd0AvFDCubOpnMuRAVu+vgn4LTB/JW0vIsj5V2nly/xvCLwM7NWU3gjuViYqNx4G/gZ8pO48d1mWujPQjg+iZuNDRNv/k8CXmg9yPv94noz/XcoXOcv1GFEN3tn2HYBTgN8Q/axKqHlbjQhIjwEWypPsP4BtK/usRQR3r+aF5LG8eKxSd/57WNZTsgyf4Z1B3KNEEFfExb6pXI2T5xHACfl8JBGYntrJfltQQK1pUxm/SdwwrEvUIG5IzPl2E7BFZb9hzFxj1/bHsym/ewP/RwRxH8q0bYjA7nriBnJ3OmoYBzR/Rjs+iJu+fwPfJ0ad3kfcbGya25cmWja+D+wDLFd3nntYvg/nufFy4IuV43YLsHVlv49kGUfVnedulGmXynPLx8nAWZk2hKg5/iXRRLxNpi+Yv8OF6y7DLMtXdwba+QGckF/os4GhTduWBr5L9Bdbue68dpH/dZpP/kSH6IeIav/GxfAd1eelPPKk8yJwbFP6zXn8zqCjWWPBvHBuDiwODK47/z0o54DK89PpOoh7KIOb/nXn+V2W8zdEs/1ixM3T6ZXv6S7AEXXnsQdladTaGDHQ4krgpEZa/v0YUWMzsXqRrHxGMb/N6neOaAb+P+KGY2SmrU/0I3s8L5bn0tE83NZBKrAE0XXmgEra8kSfuFfImriSH8RN7mTgj/ldvYrogvJX4LdN+7b18co8rpLfwaWb0o8jbuY3IZrv/5Dnnd8QNXOL1J33bpex7gy0+6MSxO0LvL9p23Bgwbrz2Eme++WXc2pzkEI0cbxcubhUA4O1yOacUh5EAD2DqNXon2kHAW8STcA35vZD6s7ruyjbksRghcFU+p/ktjOJztE7MnMQ9wgxIGX+3szrXChrvwxyfkSMbHucmMamEQD1A04iOvovUP9LzyAAACAASURBVHd+u1Geak396Px7Bx0B3IDK9/ULRE3O1cDmdee9p8ets+f5el+iK8MpwLKV9KWA99ERxJYQDKxINIluVi1vpj9IjKzdpO589rBMjf//IZW0Q/OYLUUEp5cS3Wlm6sZQwiOPz+B8vlqlvFsTtYz/ISYH3zzTlyOaTNu+ZvHtMtadgXZ4VA7smDzpHAR8prL9JxkQfJGmIK5dH8AIYlLCxgmz0UwxMn+gP+7kPT8m+lnN01v5nBvHjhhA8ggxAfHBRDPHWDru7r+fwU4xTW7EHf9zeeL8J3F3+DXgo5V9jiFGgY0D3ldJX6bu/HfnuOXfRYgO0Qvk61WJG49/NS76wAJZ1qeBFevOe3fLls9/RPR160fU2L9C9tujI4CbAFxB9NG8kEJq3ZrK+QXipuJkYM9K+pfyuL3dnNrVZ7Tjo/I9HUY0dx/IOwPVXxMd3Z+ivBundYib4K9W0n4HfC+fj89jOoOotWr7m6dOyrhI/u5+V0n7QPO5hJiOaSJNrW3t/Kg9A+3yIPo3vJAXyj8S/TUurGw/kegIv3/1Ytluj7xQzEv0ATuL6BPWqEEcQHQoPoa4azyZmN9ntUybRps2BzeVsbPRmPfkhXIa2Z+ocvLdjWhaLGKQSeb5Q0R/oUlELeIheYH4FzGy7cdELeufiH5iO5V2cs1jeH9eGC9ofPeIJu7/EjVWdxB3y09TSD/TpmN4PrB+vl6OaJqaDKxJ1DguQHTi35XoGzaDgkYM5/Mj8gJ5PjEo41HgD5Xt+xDN4T8Hlqw77z0pWyWtX35HH6BSC5fbziD6ihXT9FbJ+0eJGraHiSbTVfK7eA7ZeZ+oLd2ZTgLwEh75O9uWu
LH/ZSfbVyeu79MoYFDUTHmvOwPt8CAm6H0a+EK+XoUY7n4KMzcRnEvUigypI589LNOBxBIhHyKCs1fJgQtE0+8RGRC8nj/eB0u5QNL1aMwbiNqbDZh5yPsPiDnT2q65u5OyLU4s/7UJ0cn9YuAvwCeIAHzFPNn8nmgKvzf/L/5OIbXDWc5V8/t5QB7P6/M7uGpuX4kYAX4q0Sl+2Trz+y7KtwsRfP6Vyihg4P8RNRzTiU7wj2S5BxG1IY8AI+rOfw/KuQJxw9uYUmNe4mbiUWLNzMZ++1PGaNPGTd/6xKj8s4iWl3nyGN2Sx+so4qb/5Lx2tH2td7V8+bzRKjMvEcj9DbiNqEn9G3BU3fmd0zJW0uahYxqbasXMGCIAv7lx7inpUXsG2uFAE9XE1+fzZYgajdMq+61bed72c/pkPscQd/pfydc/yIvG3vl6HiKQG0fUBixad557WL6uRmPelRePTfL1oUTtQAlzha1M1CReAZyXaR8jaoXvJgdjVPZfg5hk+bxCyle9eKxLjjbN1xtlIPBPcloXCugbNYuyHkzUnj5PU80M0adxO2Kuty9ULqQnEDVYRUz7Qtwk/pW4cVqkkr5ABj0PkP3/qsef9g/itiX6R/2B6Jf4JlFzOooI4n5GNLU9mb/LUm58G///GxPNhTfkcRpT2ecQ4qZxOnFj+I6BNe38qJRxA2LJvQPo6J7Rn86DuDWaf6OlPGrPQE0HudE36v3595tE35OhdIx8a3Ty/2gGCytWvyDt9KCL0aRZpsmV19/LH+YEmjrFl/Kg+6MxHyCGhv+XykSN7fogapymEiMwm0c8r0UEcRPJfo2lPZh51OVeRO3GyU37bEgEcQ+UclHMfHcakBA1h48TtaWNyVA7qx0YmeeYqRRUC5CBwHSi/9eGTduWJfqdNt90tN35syl/y+U14IuVtDWJm/oriKbUfsSI9qUooFa/qXzbZABzMnHDcB9x07hCZZ+ViCmkXqSwqVAy/5/K68LdxETDj9IxnU0jiPs3lWb+Uh+1Z6CXD+zISiC2HdF0Y8RabjOIJqkfNL3nx8RdWNvdFdMRZDZP3toIUNck+vJVOxUfm1/uL1NIDQfvbjTmpDymbT9JLzEX0XXAKU3p1WkZxhBB3J3EclLV/dr6oljJ57ZEQP1o/taepWlgCXHnfHuWc1C7l42Zu1iMIWoWN66k7ZNluZis5Wbmm5CFiEDvSto4eKPrIHUdou/p5VT67hG1+zPNxVjCg46pNJbNa0NjoMlHiXlBd6s7j3NQtlFEDXejK828RJD2vS72L2aapcxv4ybxp0Sr2iAiGL2eCMo/nNv75/XkCWCJuvM9R2WuOwO9eHAXIGaUfgL4al7cd61s/wFRVd4IFJYiqpmn0saTvRJNvjcC+zV/GYm7xDuAXzWln0Q5ffnmZDRmEaNO82LxCNF5v9MO1Pl3/QwEJpMTTrbjo3IitcrzBYlapvH5WxxHBGp38c55mtYr4dgxc/B2FNE3alKeM34DfDC3fSV/h7/s7IJB1Py3bU1OUzlXJZq7lyNrivP1m0Q/0y8QN8S/z/+LouYjJAK1GcB6+bo/HUHcbcBxdeexB2VpbpFZA7gzny9PBDVnVLZ/nEpXms7ORe34YOaRwgvnOXKdyvYliRvkKY1reR7Xth2M2O2y152BXj7Q6xCdvacTiylDR/+TFYkmucYs/XcTd5Bt3ZRDdCK+jKiFupfon/F21T5RnfwKueZi5X3D6s57N8v3XhiNuQPRDDXLNS+znB8nRsO1badpmkYyE7Uad+ZxXLGSviXRefhuCgjYZlHeA4kmmXXy9QF5Hlm/ss+XiebUYiYiznxX+y0eR9w8vEzccNxENrERNaevZbnPIW4SGzcebRnEdRagEMH0tUQA2qixaczgfzPwzbrz3c2yLU/0/12wkrZtnjc/SNSCn1E5RqsSg/TWqjvv77K8W+d38xZiNOnGTdtHEPNLvkYBo7y7Xe66M9ALB9YqX9IliKCss
cbZOw5knoh2I9bPXKw38zqH5fwQ0SQ8maix+k2WoREAHZ37teXJtJPyvCdGY2ZZ1yKaoT4zi312Av6Sz9t2nj6ihm0S0SzcqLnYlQjSXqJphCURxN1A3DS1/RQTmedxwPGNY0GMKt05X2+fF5DGiPYFmt5XxO+vkzJ/nQhSN80AYFfihmoyHTWNY4ia8LPIyVBp01ocOmpt1iX6ZO5V2fZZonb46izvaCJ4nUohfcKIfsEziC4zjclsBxM1wdOBc5r2Py63lXTNaxzDVYkpwA4lbvDvIVprmrtnLE1UdhQzUe9s/w/qzkAvHuzGSWZJohbjurzQNOaealxs5q07r3NQxv7AQKJG4Kr8AV9IDHN/gTZf161Sjj49GrOT8i6ex+j35LJDnexzLFGz0ZbfT2Zu5v1APl8k/w7IC8rDRDPUwk3v3ZboA9a2tYqVvA4iapceIpb7WpgYaboB0an/FTqCt4F5Ydyq6TOKCeKIG+D3EYNLDmzatl4GOmeQNxV5/N8kmovbetJlojP763mueZMISJfJbTvmd3JGHutiplmqlG8Xomb/eGK+z35Ec/6DxE3Hkvm9/T5xE9y2fTBnUcaPEksHHpqv+xGtUncSlTVLN+1fzG+vW+WvOwO9dJCXyR/iUZW0LYgg7n6yJo4Y1n88cVfdlneOsylntbljINF8eiGxHtwMCmimoo+PxpxFubfNi8kFVPpcEjVZ3yOajdtyIk06grdVyBGXxOSYT9KxDm1/ohbxTqKv1AeaPqOY/ihErfBrZAd9Yib+i4kaxgmV/RbKwKeIZrdZlHcAEXg3ah2r55kTiNrvt8+ZRGDXWEO67Ua7E0HpAOAiYA+if+bKxA3GXXQ0C/fP7/SKFHDzW/n/rw5+Gk8EcT/I8sxDNOc3gtYHialgiprANsv2fiK4ngGc27RthbxO3E8BN4bv+v+g7gz00oFeiKhefQM4rJK+RZ5gXyTWfXuLAkYtzqaszR1XFyBqeJauO2/dyPt7YjRmF2XvR4xGfJOo/r8gL4BXEJ1v2/LuvxK8rU40zRyZr0cTd/kPk3NJ5QVk5zx211FIP8wuyn0JUfvUnxhUM4NYN7LR93QYMY/YLRR019/ZbyjLeFketyFN2/YggrshTd+HdWizG45KgDMkrwmnMPPI2SXz+zoxg7a2nq+uqWwfJJq5P0LTgJg8Rm8R/YX7EwHsIGJS6RG04QwL3T2exE39zRnILdq0fXmiif92Cplxocf/B3VnoFUHtpO0oUTT4oymIO6jwOFEB84+07lxVv8X7fqgj43GfJf/B2OImoE7ib5hh9GmqxBUjseHiKlpvt20fU3gF0Tn/WoQt2NeKK8o7CJZrXnaA3iGDKyJGo4XM2D7I1GrcRcdA1PaPohj5tGmKxDTTjRaJ0ZmeX9D9CWen7g5vB64uKvPabcH0dn9HiLofIl3dnZfkhgc9U8qc6O184O4WXg8r22vNY4JUas/Ivf5JHGDdWzJAVvlefWmfo08n9zRXDZitHSfrYFr3JH0OWa2FtEEd3UlbSjRYfU44LvufnRl2wB3n977OZUGM9uBCF7mdff/mVk/d5/RyX6bECejPYFD3P2xXs5qS3VV7nbSyKOZrUL0HXrD3ZfIbfO4+xv5fDSxjNLHiAWzf2dmA4gL6V3tfuzMbAVgqrs/n6/NG1cTsweAB919u3y9I1FzsxDRdHOOu08v4dzSVK7DiO4XQ4D5iJHtxxE3uxcTc/n9h2jRmI+YKPt/1c9oR3lNuIKo3f4Pcf54GBjv7k9U9luaCFS3b/fvJ4CZfYA4PmuQqwwQNd0j8nEDUeP9IWB3okvGD939xVoy/C40vltmtiExkO1DxLG83d3vMbPViRrwacBYd59aY3Z7T90RZCsexAn0amKkafPCwx8gTkIzgMPrzqseMx2bPjMacw7/H6yz5+3yoKPmbTWi5u1Goh/UpZVt1bVoRxM1cQ/P6ti224MY3TaDaKI5jOhzM6iyfd8s00dn8RltX
/PWlN/DiNGmGxGT8Z6Z/weNKTWGEH2FDyfm02xMw9TWTVTEBX9H4sa9kbYK0T/4WpoGD7V7eSr5bFTCLEKsrnAbcFCmDSKC1OOJ/qh35rGcRlMf1BIeRI3iq8CviO4JTxGB6ady+xrEwMR/0tR/uq8+as/AXD7A1QvfJ4gO1bcSEXl1v0PomG5jWDteJN+LD/rAaMz3yoPo8zajcUEkJnB9MIO4xkWlGsStSdwx30eMamz73xzR3Ls1UQP1YgZr59DRrLhYBjsH5Ou2L9NsyrsIMbfbJ/P1Vsw8Jcp8XbyvbYNUom/pUKLGbQbwk6btjSDuKnKmgtIeld/bosTUSpNoqpwgAu8PE7XhxXUVIvr4TSbX8s60DYi+mX+iYzaJxpyTfbbZdKb/l7ozMJcO7tsXDGbux7EhURN3C5WaOKIK+YsUtlTIe+FBwaMx3yuPvCh+Gzi2krZAN4K41Slw6Rpi5N4wohP47fn9/AUxCOoQYlLUpevO57so16bE3JEXEnOfDc6AdNksW3VKlHmIm6cuaxvb6VH5/jVWi9iA6Cd2C++ci3Bloub/txRS8zaL8g7PIO4+KrMulP7IY/SvDNCq1/gNiIqYz1bS+mTLTGeP4vvAVdrGNyWqi4cRJ55vu/skM/s4MTpnJSJSH0T071jb3SfXlG3pgpn1I/opnkyccG8n1iBchAgAPuXu99SWQQHAzOZz99fyeaM/3PzEJNj7ERMqj8vf5iB3f7PO/M6JSvmM6Lw/nqid+jjx3Zyf+F7+ob5c9oyZ7UUsQXcnsXLLSsQEvMOJIHxv4FvufnruPyK3/9Ldz60jz92Rx4j83n2KqKH5f+5+i5l9jGh6uw7Y192frbxvReAtd3+4jnzPDZVr4XDgYOL7+Vt3P6zWjM0F2X/xJmADd7+tqZ/tzUSz6Z5eekDTQ8UHcABmtjVRY/NTYuTXd4gagS3c/R9mtiZxwt2SqC7/trvfV1d+ZfbMbAxR3b8s0fzxZ+ACd3+k1oxJpyoXj2oQdx8xX1/xJ5nmDvo5IGpp4LtEzf827v5WXfnrCTP7PDGFxmfd/VdmtijxW9udqOVYAzjJ3b+a+w8haunmBzZp13I2BW87EhN7DwT2cfef5j7rEk357wjiSlD9nbn7q7PYPpzoq7g98FN3P6bXM/sudTUYxsyuIppSN3D3Z/J4G1Ex86eSyji3FB/AmdlixIzZ57v7CWa2MHFXeZW779O077xE9es7vvjSfkoYjSkdmoK4XYAjgKvd/XP15qxnZjWaslLGxt/5gNfzef92DW4askXieqJ57ZBK7eKWxA3wp4kL/2bA5cS8hMsTXRhGe4w2bbtymll/YEYeh+2Jefq2IVYAecbd96+UdV2iufReIoh9vr6c95yZbQGs5+7f7uwcWfluLkasvPAzd3+0lsz2UCXvaxNTKr0CXOnuz5rZOsTE0QsT3TX6EU2o+xAtag/Vle+69Ks7A3PBUOLO8FQzW4K46/9jI3gzs+0rd2avK3grSrXGw+rMiMxeJbB5laixOQg4suZszVY2279tVjWGjW2Vsr5Wed5WQU0XniLmqPuYmW1cufgvTkwJ8py7fwb4EVF7NZiY164xVciAdipnnt8HuftblZq3S4DPu/tldCwvSKOs7n4LMSJ1BaJvX2k2JUYJQ9RAzST/H/q5+9PAd0oJ3uDtvG9NtLh8lmi2/7mZbeTutxIB6SSiafw0Yn67jd6LwRvEciLFMLNliIO6KHC3u59NdD59jejX9gOienzf3H8xYrDCDGJeHylI9ULaF5rhStQ0P9iaxJxoXc6NVQlm/kuM2Gxr1RqMvHCsSFwUb3H3m2b13hK/n+7+sJl9DjgVONzMniH6vf0Y+Jy7T8n9juyk2bi/t9F8dlkTtT8xdcaUTB5OBG9n5+tHiKbuau3OMu5+vZmt6NmPszBTiMCaroLpSrDaNsH2rFSOzVBiqa8vEk3gyxLThhya379rga3MbFVihPSr7v5CbRmvWTE1c
Ga2GvAXYhb+McCZZnYw0WfjTeKu6xZ337tykvkyUd16Zw1ZFimWmTUuEG5haWJC0AVn995SghnouNCZ2fFE88z6xGCZG81s1zrz1ioeg7f2JSbkvYQYqf95d7/EzPo3aiSbj2MbBgN/JtajnWJmK2fH9hPd/exKjf1rwPJ58fc8zpPM7H3EzX8RzGxE/gYh+nnPyL6L1X36937O5o48NpsT8ws+R3SBesvd/wmMI8473zGzT+b+f3P3J9/LwRsUEsBltP1XYqDCJ4ipJv5M3H0NAr5BBHFDzGw3M9vCzE4hTlK7ufuT9eRcpDx5Y/SzxgUjL+QDgBeAJ2b13hKZ2Xiiz96O7v4JYg1XKOT8+G54jLbcl1ge6zHiRrgRpLVtAN4ILs1soLu/6tGZfSSxgstpZtZoEm20Lk0hvsJvmdnhRN+pj7v7f5r7jrWrrLx4EJhoZn8BTidqFbc2s03N7P0ZsA6sM59zwXDgS8BY8kYxa8j/QQRx8wNHmdlm9WWxvbT9Ccpi+PqfiM7QB7n79Kzmf5lo6hieTR1rE9XKhxJNqcsTw8c12lSkZ/4J7AAcnBdHiMl3XyMmtMXM+pXeL7GS/9WJ9TxvN7NtgTOICUPPM7PBFstp9TlZE7c3EZQfmwMc2rYGtTIIYSTwdTM70cw2JmtsiJVBfpQ1cf/Ltz0MvG5mJxFrYW/o7nfUkP058QAxIGNHou/XH4jf4wHEShkPEvP3nWExgKhI7n4+Eai9D9jHzN6fx9uyJm43YlDDe7K/W2fafhRq/lgvJQK2I939RjM7EDiaGLDwLNEB9wriLvI+4o7yFQ1YEOmZbGp6y8w+TdREnUtc+D5MBDYrec6/1FeY2XnEiMR/EE2K3/SOaSd2JNZxPcQLWjuyJ8xsOWLexWWAnd397pqz9A4289q7vyKa8x9w95Nz+0CiJeazZOuMu79hZhsRFQD/I0Yqtv0ckpX+YAsAZH/S5n2uJfr+nULM4TeS6EJURHBTKeNCxJRf/0fEI9Ozj+aZRL/Mw9z9P5XjP7ASnL/ntf0gBnd/3Mx2IjrdHpL9Uj4FbEeswbgEceL5Vv59hRjuruBN5F1y98vNbD1iHdCXiLv8/wCbm9kgYjWCgcTN02R3v6a2zHaTdT0tzbPEaFkHvubuZ+b+CwK7Ag/31eANoibOzL5G1Ma1XYtFXuxnmNlKRHB2GnEz35jI9UvAEHc/ysycaA4/wcy+loMVDgcudfcHaitEN1UCm08CXwMWM7OXiDJf5e7/NrMBwLzAAu7+DNEMXoxKGbci5lFcjBiQcKOZHePu5+ZxPIvo63eku7+Sb2+bQTTtoO1r4BrMbBQRxK1HROXfa9o+H7FQfX9373P9dER6i5ntQMzSvxLR5/S3RBNNf+JOeSHihGvE3fNm2cTRthoXjXy+JtFn9sVG/1gzuxoYTUzR8G/iAnkyMQhq7awZ6HJ+uL7E2nOetyFE7ehk4MuN/GVrzGHE8TzB3Q/NtB2BvxGr80wv4bhVApstiVkTfkQsQr8Z8CGi/Ce6+/NmdhywgrtvU+L30szGEs3BRxKDMjYmBhC9CIx396fN7LNEv/ejiRrwosrYK7wN1vPq7oMYUnwtsaTGhpX0tl1MWQ89SnjQcTM3Mn9fe5PrQgKbE1PxXEJcSOZvvIcuFjhvlwcxyecaldffI7pYvEr0m9ot00cBtxIDNZ4nmqduItdy1Tmm9uPYWMx8bOW7Oo7o/zaOaD79F1EzB3EDcjOwaN1570bZ5q08H5zfuyOb9jk2y7995fU/KWzt1jxnzEt0z/hR07adiGDueHK9U6Ivrta+7uJRTA1cQ9bEnUKMSPmOu99Yb45E+gaLmc7HEnOhTSCmmZjh0XzVWFfydOD7npODtvPdv5ltSCyZdA4RuK1ILPT9eaJmbRxxU3iyu5+V79maqGl8Fvhrln2At9H8Z+9FWSt8EbFQ+fRMW4RYmP7u7Eu1FxHYbEn0kXuft/kqC2a2BrHs3IEeI2oXB
CYCR3s0JVbX/LyGCNg2zve97AUuLZiDh24C7nD3r1d/X2Z2GjDG3desNZOFaPtRqM28Y/j7y8DJ2U9HRObcF4g+KWvRcaG07Dv2e2I94b2B/bIfDu0avAG4+w1E89PmxMSgawPHu/t17n4x0fR2P/AVM/tCvud37v5rd785g7d+Ct7awr+IgQjbNBLc/bkM3szdpwJ3ELWoz3iskNHuwdtqRJ6f8+jLhru/RDQHb5iv36hMjXIruXKEu99TSvDWGO1tZu+vJL9CjP7Go3tCoz/+LcD8ZvaB3s1lmYoL4ODtIG5/ogp5ymx2F5FucPfxwEnAMGIY/0Le0deoEcR9klgcu62DGjP7kpk95u5/Ar5N1LZ9lVh6DwB3v5+ombsPmJAd+WfihcwV9h4wheh3uYt1TGgLzHQTsRkx1c2/ejlvPZbB261EbfYBTZt/AqxrZgdBBHGZ/kHgGTMbVJkCp61V+vVtAVxsZuvk8ToUWCtr3KicT9YDniaOo8xGcU2oVRZr4L1Zdz5ESlM5sS4IvEU0zTTmeDubqAE4ETjX3V+0mOXdSwhozGxv4iK4s7tfmmnbElMT3AV8wyvzQ5rZykQg9wKxnFS5J8U+LI/hL4hppY5z90mZvjAx1c1exCLv99eXy9nLaVvuJ4K3QypTZHydmMrmduA7wKeJ5tQ7ieb/ccA67v73mrL+ruRxOx/4IfAnd/9LjmTfmfid3kv07xtIzDCh+Vu7qegATkS6p2kU5kCPhcm3JJabW5ZYM/IGdz8u9zmXGBX2Y+ACd59WT857JkeunQds4+6/b+pfsyMxsu8PwI8bAUBu+yDwuHdMHKoTY5uxWIXh88RF/zGi+XEG0Z9xFWBrb/N53rIMRxFN+se7+7GZfhARhG7n7n+yWMd7C6JbgxFThRzczsFpZ9P0ZJ/1q4nf28lN2wYAKwMHEwMbphJBbdtP99IuFMCJ9HGV2rYRwH+yRm1L4NfEaL03gBFEDcbp7v6VfN+ZwPbAQUSzaVufLMxsD6KW7T53X6OSXl2w/rPEKLcraQrimveV9mRmY4guNMsSfaluAC5sDKxpdxmcHQBsRNRM9cvXu7r71c03EFlbZd7GE2hXahHXAFZ19/MyfVMi4N7c3R+v7tvJZ2iwUA+1/US+IjJnMnj7AHANsL/FeopfBY5x98MBLGZ9v5NYjucxdz/B3T9vZq8B1xYQvO1FTHZ6DLCHmf0GGOex9F5jMMIMd78wuw8dCww1s2+6+2ONz1Hw1v7c/Q4z27nUY+Uxx9n3iNHOXySm7tnMY9Lht+fgq3xn27qbUCV4W41o8v1+ZfPgfPTPfY1ca9disuJp7v7X3Let5h4sQZGDGESkx6YRJ9LF8vUyRM0b8PZyPb8GLiQ6UDeW8fmyx5qZbcvMvkpMb7KNu38H2INo/r0k++7RCOLy+YXEBKIDiXVApTzVGqoiOvRX5ajTo4nm/IeBdTP9rep3tr4cdk9T8HYrcVN4YGWX+4gm7p0hbiYrN4NbEiPAGyNr2/omsR2pBk6kj8sLwgxiUeyR7v6qmT0JLG5m83suO+ex5uBzxKLgbX3X3+QeYKccJQvwR2JNzAuBS81sB3d/q6km7mfAz0DNpiWqXuxLvfC7+7NmdgzRx237bEI8NIO4tv9OVoK3VYG/El0SvlvZvidRq/8N4AfZ5+03RE3b7sSgjPXbuWm43SmAE+ljmk/+lSaZe4B1MvkqojP1XWZ2qXesHbwQMaChPzHvVttz9z9DR1+/vKjMKoizpgCgrS+U0ndVgjiALfOG6pslfCfzt7Qk8CdindaDG9tyUMbXiWmHTieC1GOIOVz/TdxQbtLcB1V6RoMYRPogM1seWI6YGuN+IhjbC5jg7qvnPj8iRqGeTyxJ9AFiOZuPtfNot+7KprXNgJ8DNxK1dOokLW3HzIYT/TJHEV0B/l1zlrrFzEYS07q8TCz/daPFWrTfAD7r7tdU9l0BWIpY4WWyuz/X+znuWxTAifQxZjYvM
RpzC6Im7TliYsx7gF2Add399tz3K8RC0ksCjwKH94XgrSGDuE2JqQyOq9YSiLQTi6XBzN2frTsvloi4ywAACqJJREFUPZHz2p1KtOg9Rszltou7/7Fp+qIx7n5HjVntcxTAifRBOfXA/MD7gJWITtILEUth/Q44wnNCUDObH5hOLCD9ej05bp0cvDAauKvRnCwic0/O93YqsZLCoe5+fGNwSY6CP4aY525x4NlS+y22GwVwIn1UFxNrrkLM9P4nYgHtB2vJXE0015RIa5jZssBPifVaD3P36zP9SGLaog3dfWKNWexzFMCJ9HGViXwbKzCsCVxHLGGzl8fawiIicyRr4k4hav/3BzYgpuxZ193vqjNvfZECOJE+ojtLQDUmCjWztYgh/Wu7+5O9k0MR6esyiDuRGPG+ALF+q4K3FlAAJ9JHmNnwnJZgloFcJYibR3MwicjcliNOjyfWb9VUIS2iAE6kD8i73knAxu7+l26+R4u2i0hLNLps1J2PvkxLaYn0DS8RC7TvYWYLd+cNCt5EpFUUvLWeAjiRAjWG6OfcUeSkmL8H1gY+ktv615ZBERFpKQVwIgXKUaWbAw+Y2aE5ZchZwETg5NznrRIX+hYRkdlTACdSrmHE5LyHAOea2a7A0cAzZvZDUDOpiEhfpUEMIoVoHnSQS2YdRqy28AawBLGW4iRgEWKi3ntryKqIiLSYauBECpHNppuY2fFmNjiXvbobGAGcRcx2fjOwHbGI+/b15VZERFpJAZxIIcxsALAYEaj91sz2cvdLgJeBn7j7M+6+H7Az8HPgwvpyKyIiraQmVJHC5MjTY4HVgBeIZtRTgDPd/dTcZ5C7v1lbJkVEpKUUwIm0qcoapmsCo4H3A3e5+w3Z/+2jwHeJqUNeBB4BdtPSWCIifZ+aUEXaVAZv2wGXAzsSC0NfZ2bfAt5w97+4+2bAQcREvqsSgxlERKSPG1B3BkSkc2Y2GjiDWE/wdDNbGvgkMCSDu37uPsPdTzGzG4BpOaGviIj0cWpCFalZIxCrvB7g7tPN7HPAp9x9OzMbCfwFuMLdv5j7Le3uT9SRZxERqZeaUEVq5u4zzGxpM9s+X0/PTUsAmNmywE3EWqf7ZtpGwPFmtmgNWRYRkZqpCVWkZmY2DzEYYRMzG+juv8xNTxODF24Gfu/ue1fetgWwAOrzJiLynqQATqRm7v6Gmf0MmB84IJtQL3D3s81sG6Lf2+/MbCgwD/A1YHdgA3efVl/ORUSkLuoDJ9LLqn3ezKy/u7+Vz0cD3wSWB05093PN7H3AFcCKRJeHh4DFge3d/Z5aCiAiIrVTACfSixrBm5ktBQwCXnL35yvb1wK+DqwA/NDdz8/0rYj1TZ8AJrn7U72fexERaRf/v727D9WzruM4/v44V8vSFCNBK9cfZjo3iU3T8GGTVWSRyskkJcFwCElqZSEMM2t/9GSoZIoaiTijCEuy1FYuypHL6dqczrYeVuoIEudMK5vu2x/XdfLmcNo5Wzu7zn2f9wsG18Pvvs73DG74nN/TZYCT9rA2vG0CtgOP0Mxxe5BmhekzSd4MXAXMBG6sqps7KlWSNEm5ClXa86YBv6f5/v0cOBZYDGxMsgw4BVhJM1x6QZJzuipUkjQ52QMndaDdGuQeYAPNkOlTwFnAkcAQ8DxwRNt8DXBiVT3fQamSpEnIACd1JMlhwM9ow9vwO0yTHEKzRchHgEOBr1XVY50VKkmadAxwUofaEHcv8Ffgo1X1hxH39+7Z2FeSJMAAJ3WuJ8RtBs6tqj92XJIkaZIzwEmTQBvifgz8G/hAVW3qtiJJ0mTmKlRpD0uSkdeqaiPwQcDhUknSmOyBkyZAklRVJZkDHAVMp9mAd9UYn5teVdv2SJGSpL5lgJMmSJIh4BrgL8C/gPnAoqr6Vpd1SZL6n0Oo0gRIchxwE7Ckqt4FXNreent3VUmSBoUBTtqNeua3HQcsq6obkhwK/BC4vqo+07ab2U2FkqRBYICTdqN6Z
U7CQcDT7aa89wN3A58ASLIAuCzJG7upUpLU7wxw0m7U0wO3BTiT5iX1P6qqC6pqe3vvNOBA4MUOSpQkDQAXMUj/hyTTqurlJG9qL/2jqp5p790NnAycQPNi+lcDnwU+Biyoqke7qFmS1P8McNJOSnIq8FRVrWnPh4AraYZNVwDfq6rbkxwO3Aq8DfgbzZsWZgJnVNXqLmqXJA0GA5w0Tkn2otnT7UHgO8AS4FXAcuBLNFuFvB84BPjG8HYhSc4H9gOeAB4Yfmm9JEm7ygAnjUOSvYbnsCV5L3AzzcrSPwP7VtUV7b1ZwGXAHJpVpzd0VLIkaYC5iEEaw3B4SzI3yQbgPmAxcDrwaeD1w23beW1fBtYC5ye5uIuaJUmDzQAn7UBPeDsa+CVwZ1Vtq6pbgU8CM4Bj2ldmAVBV62iGVJ8EPpRk/y5qlyQNLodQpf+hJ7wdATwEXFVVl48YTj0LuBq4C7i6d2Vp+7mtVbW5i/olSYPLACeNoie8zaYZMt1WVQe390Lz3RkOcecAX6HZrPfrVfVYV3VLkqYGh1ClEUYMm66kGTolyR1Jpldje7sqlapaSrO/27uBK5L4vlNJ0oQywEkjtOFsHs12IV+tqiHgPOAk4LtJpvW06w1xXwBmA1u7qVySNFU4hCqNIslJwFBVXdyeTwMWAktpeuTOrKqX23u9c+L2q6rnOipbkjRFGOCkcWrnvr2HHYS4JCm/VJKkCWaAk3bCiBB3H3B2Vb3UbVWSpKnGACftpDbELQTuBW6rqnM7LkmSNMUY4KRd0C5emA88WVUbOi5HkjTFGOAkSZL6jNuISJIk9RkDnCRJUp8xwEmSJPUZA5wkSVKfMcBJkiT1GQOcJElSnzHASZIk9RkDnCS1knw+yaWjXJ+ZZF17PC/Jtbv4/EuS7NNz/pMk++96xZKmKgOcJO2EqlpVVRft4scvAf4b4Krq1Kp6dvdUJmkqMcBJGlhtz9njSZYmWZ/k+0n2SbIpyRvaNvOS/KLnY0cn+XWSjUkWjfLM+Unuao9fl+TbSR5JsjbJUHv9+iSrkjya5Mr22kXAwcDyJMvba711fCrJuvbfJT31r09yU/usnyZ5zcT9j0nqFwY4SYPucOCbVXUE8Bzw8THazwFOAY4HPpfk4B20vRzYWlWzq2oOcF97fXFVzWufdXKSOVV1LbAZWFBVC3ofkmQucB7wTuA4YFGSd7S3DwOuq6pZwLPA0Lh+a0kDzQAnadA9UVUr2uPbgBPGaH9nVf2zqp4GlgPH7qDtQuC64ZOq2tIefjjJw8BqYBZw5Bg/8wTgB1X1QlU9D9wBnNje+1NV/bY9fgiYOcazJE0Be3ddgCRNsBrl/CVe+QN2xjjaj1uStwKXAsdU1ZYkt4zyM3bGiz3HLwMOoUqyB07SwHtLkuPb47OB+4FNwNz22sghydOSzEhyIDAfeHAHz14GXDh8kuQAYD/gBWBrkoOA9/W0/zuw7yjP+RVwejs/77XAGe01SRqVAU7SoPsdcGGS9cABwPXAlcA1SVbR9Gr1WkszdPoA8MWq2ryDZy8BDmgXHqyhmd+2hmbo9HHgdmBFT/sbgXuGFzEMq6qHgVuA3wArgZuravWu/LKSpoZU7dTogCT1jSQzgbuq6qiOS5Gk3coeOEmSpD5jD5wkSVKfsQdOkiSpzxjgJEmS+owBTpIkqc8Y4CRJkvqMAU6SJKnPGOAkSZL6zH8A95BIpjR2yWwAAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "# observation \n", "- From the above chart, we can see that most of articles contain health and technology articles published by 10 publishers. 
Their names are as follows:\n", " - Reuters\n", " - Vice\n", " - CNBC\n", " - CNN\n", " - The New York Times\n", " - The Verge\n", " - People\n", " - Gizmodo\n", " - Fox News\n", " - Economist\n", " \n", "- Reuters is by far the largest publisher of technology and health articles.\n", "\n", "- Economist publishes the fewest articles in this domain.\n", "\n" ], "metadata": { "id": "kGxYry7F1rtz" } }, { "cell_type": "code", "source": [ "data_tech_health['tech_health_tag'].value_counts()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wpMwPpLLxfof", "outputId": "9a749655-8c0b-492a-b590-7612c664aa85" }, "execution_count": 21, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "health 65261\n", "technology 60849\n", "Name: tech_health_tag, dtype: int64" ] }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "code", "source": [ "def plot_figure(column_name1, column_name2):\n", "    # Count plot of column_name1 (bars ordered by frequency), split by column_name2.\n", "    # Uses the global data_tech_health DataFrame.\n", "    plt.figure(figsize=(12, 8))\n", "    sns.countplot(\n", "        data=data_tech_health,\n", "        x=column_name1,\n", "        hue=column_name2,\n", "        palette='Set1',\n", "        order=data_tech_health[column_name1].value_counts().index\n", "    )\n", "\n", "    plt.xticks(\n", "        rotation=45,\n", "        horizontalalignment='right',\n", "        fontweight='light',\n", "        fontsize='x-large'\n", "    )\n", "    plt.show()\n" ], "metadata": { "id": "ekb6pRbs4IBB" }, "execution_count": 22, "outputs": [] }, { "cell_type": "code", "source": [ "plot_figure('publication','tech_health_tag') " ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 599 }, "id": "VRKHxRt8L1t9", "outputId": "87cebadb-7826-4cf2-efd5-bab61a326172" }, "execution_count": 23, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "# Observation\n", "- From the above chart, most health articles come from Reuters; the other publishers contribute far fewer.\n", "\n", "- For technology articles, three publishers (Reuters, Vice, and CNBC) contribute roughly equal numbers.\n", "\n", "- Economist, Fox News, and Gizmodo contribute far fewer articles than the others.\n", "\n" ], "metadata": { "id": "ilaEiJGGDgJK" } }, { "cell_type": "code", "source": [ "data_tech_health['year'].value_counts().sort_index()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "l2wE9-1Nsg8n", "outputId": "9c371e04-c8d8-42cf-e1ad-6826e4b80964" }, "execution_count": 24, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "2016 24470\n", "2017 28697\n", "2018 24770\n", "2019 22961\n", "2020 25212\n", "Name: year, dtype: int64" ] }, "metadata": {}, "execution_count": 24 } ] }, { "cell_type": "code", "source": [ "data_tech_health['month'].value_counts()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2OWkD_q8tt8T", "outputId": "09795ae8-6c8a-473d-a908-562a3d872829" }, "execution_count": 25, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "3.0 24510\n", "2.0 15218\n", "1.0 10921\n", "10.0 9050\n", "4.0 8875\n", "5.0 8789\n", "11.0 8643\n", "6.0 8577\n", "7.0 8190\n", "8.0 8094\n", "9.0 8055\n", "12.0 7188\n", "Name: month, dtype: int64" ] }, "metadata": {}, "execution_count": 25 } ] }, { "cell_type": "code", "source": [ "data_tech_health[data_tech_health['year']==2020]['month'].unique()\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "b6b8D6ocfHqy", "outputId": "9ab1feb7-79ff-44e6-89ea-771e3ae3f966" }, "execution_count": 51, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([1., 2., 3., 
4.])" ] }, "metadata": {}, "execution_count": 51 } ] }, { "cell_type": "code", "source": [ "plot_figure('year','tech_health_tag') " ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 523 }, "id": "jVfz6YIP4HLH", "outputId": "a5a78caf-71d5-4f1e-c4bd-931978d75835" }, "execution_count": 26, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "code", "source": [ "plot_figure('year','publication') \n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 523 }, "id": "609Chb1HtW0V", "outputId": "f871a1fa-9de7-4e40-d949-b28d6ab56de6" }, "execution_count": 27, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "code", "source": [ "plot_figure('month','tech_health_tag') \n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 519 }, "id": "Hi1VzU_yVVzT", "outputId": "a21ba601-b125-451e-b86e-73db40b67a63" }, "execution_count": 28, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "## Observation\n", "- The highest number of articles was published in 2017.\n", "\n", "- The highest number of health articles was published in 2020, plausibly because of the onset of the COVID-19 pandemic.\n", "\n", "- The 2020 data covers only four months (January 2020 to 1 April 2020).\n", "\n", "- Most articles in both categories were published in January, February, and March; this peak is likely driven by the large volume of early-2020 articles.\n", "\n", "- 2020 is the first year in which health articles outnumber technology articles.\n", "\n", "- Reuters is the leading publisher of both technology and health articles.\n", "\n", " \n" ], "metadata": { "id": "mV8UL9XPmexA" } }, { "cell_type": "markdown", "source": [ "## Data Cleaning" ], "metadata": { "id": "11pAAgaBDtfe" } }, { "cell_type": "code", "source": [ "def processed_text_article(df):\n", "    # Strip all punctuation except '.' and '?' so sentence boundaries survive\n", "    special_char = set(punctuation) - {'.', '?'}\n", "    remove_table = str.maketrans('', '', ''.join(special_char))\n", "\n", "    def deep_clean(text_str):\n", "        text_str = str(text_str).strip()\n", "        # Remove HTML tags such as <p> or <br/>\n", "        text_str = re.sub('<[^>]*>', '', text_str)\n", "        text_str = text_str.translate(remove_table)\n", "        # Collapse runs of whitespace (including newlines) into a single space\n", "        text_str = re.sub(r'\\s+', ' ', text_str).strip()\n", "        return text_str\n", "\n", "    df['article'] = df['article'].apply(deep_clean)\n", "    df['title'] = df['title'].apply(deep_clean)\n", "    return df\n", "\n", "def clean_dataFrame(df):\n", "    # Drop columns with more than 5% missing values\n", "    missing_cols = df.isnull().sum()\n", "    drop_missing_cols = missing_cols[missing_cols > len(df) / 20].sort_values()\n", "    df = df.drop(drop_missing_cols.index, axis=1)\n", "    df['date'] = pd.to_datetime(df['date'])\n", "    # Drop remaining rows with any missing value and reset the index\n", "    df = df.dropna()\n", "    df = df.reset_index(drop=True)\n", "    # Lower-case all column names\n", "    df.columns = df.columns.str.lower()\n", "    df = processed_text_article(df)\n", "    return df" ], "metadata": { "id": "MeGBdMriVbb2" }, 
"execution_count": 29, "outputs": [] }, { "cell_type": "code", "source": [ "data_tech_health = clean_dataFrame(data_tech_health)" ], "metadata": { "id": "ndoFFfu5tkq8" }, "execution_count": 30, "outputs": [] }, { "cell_type": "code", "source": [ "data_tech_health.info()" ], "metadata": { "id": "4Mx1D_Ouu4Au", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "2749820e-c3c4-4eb7-babd-73368f574f1d" }, "execution_count": 31, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 125948 entries, 0 to 125947\n", "Data columns (total 10 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 date 125948 non-null datetime64[ns]\n", " 1 year 125948 non-null int64 \n", " 2 month 125948 non-null float64 \n", " 3 day 125948 non-null int64 \n", " 4 title 125948 non-null object \n", " 5 article 125948 non-null object \n", " 6 url 125948 non-null object \n", " 7 section 125948 non-null object \n", " 8 publication 125948 non-null object \n", " 9 tech_health_tag 125948 non-null object \n", "dtypes: datetime64[ns](1), float64(1), int64(2), object(6)\n", "memory usage: 9.6+ MB\n" ] } ] }, { "cell_type": "code", "source": [ "data_tech_health.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Sm1ZfMwHPA2r", "outputId": "be9234ec-6144-4265-a98e-b99824bc8821" }, "execution_count": 32, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "date 0\n", "year 0\n", "month 0\n", "day 0\n", "title 0\n", "article 0\n", "url 0\n", "section 0\n", "publication 0\n", "tech_health_tag 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 32 } ] }, { "cell_type": "code", "source": [ "data_tech_health['date'].describe()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_AQ7hTDkKpa2", "outputId": "6a78d37b-dcc9-483e-d25a-4669c4f2e0f6" }, "execution_count": 33, "outputs": [ { "output_type": "execute_result", "data": { 
"text/plain": [ "count 125948\n", "unique 29195\n", "top 2020-03-16 00:00:00\n", "freq 1021\n", "first 2016-01-01 00:00:00\n", "last 2020-04-01 05:00:32\n", "Name: date, dtype: object" ] }, "metadata": {}, "execution_count": 33 } ] }, { "cell_type": "markdown", "source": [ "- The statistical summary of `date` shows that the 2020 data covers only four months: the last record is dated 2020-04-01." ], "metadata": { "id": "FqgBc6kHLEt7" } }, { "cell_type": "markdown", "source": [ "## The distribution of word counts in article text" ], "metadata": { "id": "kC6tPSqyMxY9" } }, { "cell_type": "code", "source": [ "data_tech_health['word_count'] = data_tech_health['article'].apply(lambda x: len(x.split()))" ], "metadata": { "id": "Ncq5ZSsYM5Wo" }, "execution_count": 34, "outputs": [] }, { "cell_type": "code", "source": [ "data_tech_health['word_count'].describe([0.1,0.25,0.5,0.75,0.95])" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eABFHtEiNiU1", "outputId": "f3e2b1de-d2c9-4895-ff65-35a288f062af" }, "execution_count": 35, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "count 125948.000000\n", "mean 467.384428\n", "std 502.197501\n", "min 1.000000\n", "10% 42.000000\n", "25% 113.000000\n", "50% 351.000000\n", "75% 629.000000\n", "95% 1340.000000\n", "max 13510.000000\n", "Name: word_count, dtype: float64" ] }, "metadata": {}, "execution_count": 35 } ] }, { "cell_type": "code", "source": [ "data_tech_health[data_tech_health['word_count']<10]['article'].count()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "74Wk77BON7gA", "outputId": "084b25f8-fde4-440e-eca1-4752839553e3" }, "execution_count": 36, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "13" ] }, "metadata": {}, "execution_count": 36 } ] }, { "cell_type": "code", "source": [ "data_tech_health[data_tech_health['word_count']>1000]['article'].count()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "S09nSgudRbLX", 
"outputId": "5cc10859-b0ca-4e63-dcf6-4a2c7ecf887e" }, "execution_count": 38, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "13944" ] }, "metadata": {}, "execution_count": 38 } ] }, { "cell_type": "code", "source": [ "sns.histplot(data_tech_health['word_count'],\n", " bins=100)\n", "\n", "\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 297 }, "id": "9F2gx419NGqZ", "outputId": "a1d075ac-90ed-44bd-eccd-24bf58eba638" }, "execution_count": 39, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 39 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "- Article word counts range from 1 to 13,510.\n", "- 75% of articles have 629 words or fewer.\n", "- 13 articles have fewer than 10 words, and 13,944 articles have more than 1,000 words." ], "metadata": { "id": "xR6u2QloSQ8O" } }, { "cell_type": "markdown", "source": [ "## The distribution of top unigrams before removing stop words\n", "\n" ], "metadata": { "id": "D8Q8WBy1TYpb" } }, { "cell_type": "code", "source": [ "def get_top_n_words(corpus, n=None, language=None):\n", "    # Optionally remove scikit-learn's built-in English stop words\n", "    if language == 'english':\n", "        vec = CountVectorizer(stop_words='english').fit(corpus)\n", "    else:\n", "        vec = CountVectorizer().fit(corpus)\n", "\n", "    bag_of_words = vec.transform(corpus)\n", "    # Total count of each vocabulary term across all documents\n", "    sum_words = bag_of_words.sum(axis=0)\n", "    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n", "    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)\n", "    return words_freq[:n]\n" ], "metadata": { "id": "AyF-kwr-TXyM" }, "execution_count": 40, "outputs": [] }, { "cell_type": "code", "source": [ "common_words = get_top_n_words(data_tech_health['article'], 20)\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 889 }, "id": "DGR3SeHub3Jm", "outputId": "bf5b1850-0a08-44cc-b096-7bc4eb180618" }, "execution_count": 41, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "the 3020552\n", "to 1750126\n", "of 1482657\n", "and 1435060\n", "in 1171664\n", "that 769151\n", "for 621677\n", "is 549682\n", "it 503883\n", "on 496834\n", "with 415156\n", "said 399219\n", "as 350386\n", "by 309276\n", "are 300474\n", "be 288955\n", "at 283582\n", "was 281389\n", 
"have 280537\n", "from 274870\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## The distribution of top unigrams for all articles after removing stop words\n" ], "metadata": { "id": "pV2AVfGTc7dy" } }, { "cell_type": "code", "source": [ "common_words = get_top_n_words(data_tech_health['article'], 20, 'english')\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 889 }, "id": "MYRCv0MiXKEC", "outputId": "e9fc179d-ef2e-404d-b2b9-f34b601ac0e7" }, "execution_count": 42, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "said 399219\n", "people 160781\n", "company 149141\n", "new 147966\n", "like 123928\n", "health 103557\n", "year 94077\n", "million 83302\n", "time 81934\n", "just 80520\n", "reuters 71136\n", "percent 68678\n", "says 68267\n", "years 68061\n", "according 65309\n", "data 64223\n", "use 61889\n", "companies 61447\n", "study 58246\n", "told 57231\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## The distribution of top unigrams for technology articles after removing stop words\n" ], "metadata": { "id": "EUj94qt7fwQ8" } }, { "cell_type": "code", "source": [ "data_tech = data_tech_health[data_tech_health['tech_health_tag']=='technology']\n", "common_words = get_top_n_words(data_tech['article'], 25, 'english')\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 976 }, "id": "DhEHXucvXaf7", "outputId": "259286a2-83ec-453b-b5c8-8c9744f8ee0b" }, "execution_count": 43, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "said 196930\n", "company 103870\n", "new 88415\n", "like 75975\n", "people 67400\n", "year 54580\n", "facebook 52901\n", "companies 50118\n", "data 46365\n", "time 44805\n", "just 44216\n", "apple 43290\n", "million 42103\n", "google 39642\n", "technology 39362\n", "years 36886\n", "billion 36323\n", "use 34937\n", "amazon 34818\n", "according 34442\n", "make 34291\n", "percent 34253\n", "told 33992\n", "users 33558\n", "business 29491\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## The distribution of top unigrams for health articles after removing stop words" ], "metadata": { "id": "-8fg2LC5gh87" } }, { "cell_type": "code", "source": [ "data_health = data_tech_health[data_tech_health['tech_health_tag']=='health']\n", "common_words = get_top_n_words(data_health['article'], 25, 'english')\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 976 }, "id": "9R9SORKxgA5r", "outputId": "dabede22-abd4-40a0-8435-5e5b0bedb33b" }, "execution_count": 44, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "said 202289\n", "people 93381\n", "health 92199\n", "new 59551\n", "study 49691\n", "like 47953\n", "reuters 47870\n", "says 46654\n", "company 45271\n", "patients 43098\n", "million 41199\n", "year 39497\n", "time 37129\n", "just 36304\n", "percent 34425\n", "drug 32606\n", "disease 32016\n", "years 31175\n", "according 30867\n", "women 30163\n", "medical 29956\n", "coverage 29808\n", "source 28869\n", "cases 28597\n", "care 28080\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## The distribution of top bigrams for technology articles after removing stop words\n", "\n", "\n" ], "metadata": { "id": "FCniAY7Qhp3O" } }, { "cell_type": "code", "source": [ "def get_top_n_bigram(corpus, n=None):\n", " vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)\n", " bag_of_words = vec.transform(corpus)\n", " sum_words = bag_of_words.sum(axis=0) \n", " words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n", " words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)\n", " return words_freq[:n]\n" ], "metadata": { "id": "QFwx-V00gzfh" }, "execution_count": 45, "outputs": [] }, { "cell_type": "code", "source": [ "common_words = get_top_n_bigram(data_tech['article'], 25)\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 976 }, "id": "oMMy6ktQiUSO", "outputId": "2d191759-1918-427f-9c0a-f93903b583ba" }, "execution_count": 46, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "new york 13277\n", "united states 12400\n", "social media 9574\n", "chief executive 7421\n", "company said 7087\n", "said statement 7086\n", "san francisco 6745\n", "silicon valley 6236\n", "tech companies 5544\n", "artificial intelligence 4896\n", "years ago 4655\n", "told motherboard 4561\n", "told cnbc 4364\n", "declined comment 4233\n", "law enforcement 3988\n", "climate change 3842\n", "wall street 3691\n", "vice president 3613\n", "said company 3549\n", "earlier year 3447\n", "york times 3374\n", "told reuters 3276\n", "companies like 3206\n", "blog post 3043\n", "stories day 3024\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## The distribution of top bigrams for health articles after removing stop words" ], "metadata": { "id": "EdeAdgQtkWmB" } }, { "cell_type": "code", "source": [ "common_words = get_top_n_bigram(data_health['article'], 25)\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['ArticleWord' , 'count'])\n", "df1.groupby('ArticleWord').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 976 }, "id": "cYMgCa3Bit7I", "outputId": "0a13e5ac-84e9-49d7-b6d6-4dfdee5b6862" }, "execution_count": 47, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "company coverage 23200\n", "source text 22588\n", "text eikon 15082\n", "eikon company 14450\n", "united states 13418\n", "new york 10856\n", "public health 9662\n", "health care 7765\n", "said dr 7505\n", "said statement 6354\n", "mental health 6159\n", "disease control 5751\n", "centers disease 5444\n", "control prevention 5030\n", "food drug 4841\n", "new study 4757\n", "reuters health 4676\n", "drug administration 4648\n", "year ago 4597\n", "gdynia newsroom 4488\n", "health officials 4377\n", "world health 4028\n", "health organization 3744\n", "said email 3612\n", "health insurance 3487\n" ] }, { "output_type": "display_data", "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## Conclusions:\n", "\n", "- The exploratory data analysis above indicates that we have enough technology and health articles (125,948 after cleaning) to apply a topic modeling algorithm.\n", "\n", "- Even though 2020 covers only four months of data, it has the highest number of health articles, most likely because of the COVID-19 pandemic.\n", "\n", "- The EDA shows that some articles have very few words (13 have fewer than 10). These need to be cleaned, since they may not contain even a complete sentence.\n", "\n", "- Reuters is the leading publisher of both health and technology articles." ], "metadata": { "id": "y3rvWHwgujZj" } }, { "cell_type": "markdown", "source": [ "# Data Lineage\n", "\n", "Our data was obtained from two public data sets.\n", "\n", "The first data set contains more than 2.5 million news articles and essays from 27 publications, published between January 2016 and April 2020. This data set is located here: [news articles data](https://components.one/datasets/all-the-news-2-news-articles-dataset/).\n", "\n", "The second data set is a corpus for named entity recognition, obtained from Kaggle. The data is located here: [NER corpus](https://www.kaggle.com/datasets/abhinavwalia95/entity-annotated-corpus)\n" ], "metadata": { "id": "N0ZYCjPHpjt_" } }, { "cell_type": "markdown", "source": [ "## Questions\n", "\n", "- We need guidance on how to connect the NER application with the topic modeling application, because the two data sets we were given are completely different.\n", "\n" ], "metadata": { "id": "Jg_kt-TOqZTO" } }, { "cell_type": "markdown", "source": [ "## Acknowledgements\n", "\n", "- The `get_top_n_bigram` code is adapted from [towardsdatascience](https://towardsdatascience.com/a-complete-exploratory-data-analysis-and-visualization-for-text-data-29fb1b96fb6a)." ], "metadata": { "id": "lw1wqV8C95xi" } } ] }