{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "collapsed_sections": [], "machine_shape": "hm" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# **Understanding Named Entity Recognition Data**\n" ], "metadata": { "id": "nqrTKIyfYRRa" } }, { "cell_type": "markdown", "source": [ "# **Objective**\n", "\n", "The objective of this notebook is to better understand the NER dataset and extract meaningful information from it. To achieve this, we follow an Exploratory Data Analysis (EDA) procedure.\n", "\n", "The main sections of this notebook are organized as follows:\n", "\n", "- Load the NER data from Kaggle.\n", "- Observations about the whole dataset.\n", "- Select the relevant columns.\n", "- Identify the unique entity tags in the dataset.\n", "- Data cleansing.\n", "- The distribution of top unigrams after removing stop words.\n", "- The distribution of top bigrams after removing stop words.\n", "- Conclusion\n" ], "metadata": { "id": "KJbZYeNbYfkt" } }, { "cell_type": "markdown", "source": [ "# Imports and Setup" ], "metadata": { "id": "GX1Gm0sVTU4O" } }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "qIFLx0_wimTB" }, "outputs": [], "source": [ "import pandas as pd\n", "pd.set_option('max_colwidth',150)\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from datetime import datetime as dt\n", "from string import punctuation\n", "import re\n", "import os\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from IPython.core.interactiveshell import InteractiveShell\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "source": [ "# Download the Dataset" ], "metadata": { "id": "QqvaLRjVjIj3" } }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "T8aMPXC7t_VX" }, "outputs": [], "source": [ "pathdir = \"/content/data\"" ] }, { "cell_type": "code", "execution_count": 
3, "metadata": { "id": "x3CI3PtUp2lW" }, "outputs": [], "source": [ "def download_dataset():\n", "    # Skip the download if the CSV is already in the data folder\n", "    if not os.path.isfile(os.path.join(pathdir, 'ner.csv')):\n", "\n", "        # Download the Annotated Corpus for Named Entity Recognition dataset\n", "        !gdown https://drive.google.com/uc?id=13y8JNgL5TQ4x-yufpBOv3QBsEiE051sE\n", "\n", "        if not os.path.exists(pathdir):\n", "            # Make a data folder to store the data\n", "            !mkdir data\n", "\n", "        !mv /content/ner.csv ./data\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "zS6WbHz8wHzu" }, "outputs": [], "source": [ "download_dataset()" ] }, { "cell_type": "markdown", "source": [ "# Load Data" ], "metadata": { "id": "liJiX3Xf2hQh" } }, { "cell_type": "code", "source": [ "# Specify the path to the data location\n", "\n", "filepath = '/content/data/ner.csv'\n", "data = pd.read_csv(filepath, encoding = \"latin1\", on_bad_lines='skip')\n" ], "metadata": { "id": "LMwtt2rJnNhB" }, "execution_count": 5, "outputs": [] }, { "cell_type": "code", "source": [ "# Verify that the data is loaded correctly\n", "data.head().T" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 834 }, "id": "g4VoxOSnnOs9", "outputId": "1c39d739-e530-48c5-e995-301fa5859baf" }, "execution_count": 6, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " 0 1 2 3 \\\n", "Unnamed: 0 0 1 2 3 \n", "lemma thousand of demonstr have \n", "next-lemma of demonstr have march \n", "next-next-lemma demonstr have march through \n", "next-next-pos NNS VBP VBN IN \n", "next-next-shape lowercase lowercase lowercase lowercase \n", "next-next-word demonstrators have marched through \n", "next-pos IN NNS VBP VBN \n", "next-shape lowercase lowercase lowercase lowercase \n", "next-word of demonstrators have marched \n", "pos NNS IN NNS VBP \n", "prev-iob __START1__ O O O \n", "prev-lemma __start1__ thousand of demonstr \n", "prev-pos __START1__ NNS IN NNS \n", "prev-prev-iob __START2__ __START1__ O O \n", "prev-prev-lemma __start2__ 
__start1__ thousand of \n", "prev-prev-pos __START2__ __START1__ NNS IN \n", "prev-prev-shape wildcard wildcard capitalized lowercase \n", "prev-prev-word __START2__ __START1__ Thousands of \n", "prev-shape wildcard capitalized lowercase lowercase \n", "prev-word __START1__ Thousands of demonstrators \n", "sentence_idx 1.0 1.0 1.0 1.0 \n", "shape capitalized lowercase lowercase lowercase \n", "word Thousands of demonstrators have \n", "tag O O O O \n", "\n", " 4 \n", "Unnamed: 0 4 \n", "lemma march \n", "next-lemma through \n", "next-next-lemma london \n", "next-next-pos NNP \n", "next-next-shape capitalized \n", "next-next-word London \n", "next-pos IN \n", "next-shape lowercase \n", "next-word through \n", "pos VBN \n", "prev-iob O \n", "prev-lemma have \n", "prev-pos VBP \n", "prev-prev-iob O \n", "prev-prev-lemma demonstr \n", "prev-prev-pos NNS \n", "prev-prev-shape lowercase \n", "prev-prev-word demonstrators \n", "prev-shape lowercase \n", "prev-word have \n", "sentence_idx 1.0 \n", "shape lowercase \n", "word marched \n", "tag O " ], "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "code", "source": [ "# In total, the data has 1050795 rows and 25 columns\n", "data.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iJZa9dP1vGeN", "outputId": "0f3773db-e348-4886-9393-cf550ac30d62" }, "execution_count": 7, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(1050795, 25)" ] }, "metadata": {}, "execution_count": 7 } ] }, { "cell_type": "code", "source": [ "data.info()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XwYxq7Wqx8QH", "outputId": "49f95da9-57cb-44b8-bff6-7b2c54388815" }, "execution_count": 8, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "RangeIndex: 1050795 entries, 0 to 1050794\n", "Data columns (total 25 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Unnamed: 0 1050795 non-null int64 \n", " 1 lemma 1050795 non-null object \n", " 2 next-lemma 1050795 non-null object \n", " 3 next-next-lemma 1050795 non-null object \n", " 4 next-next-pos 1050795 non-null object \n", " 5 next-next-shape 1050795 non-null object \n", " 6 next-next-word 1050795 non-null object \n", " 7 next-pos 1050795 non-null object \n", " 8 next-shape 1050794 non-null object \n", " 9 next-word 1050794 non-null object \n", " 10 pos 1050794 non-null object \n", " 11 prev-iob 1050794 non-null object \n", " 12 prev-lemma 1050794 non-null object \n", " 13 prev-pos 1050794 non-null object \n", " 14 prev-prev-iob 1050794 non-null object \n", " 15 prev-prev-lemma 1050794 non-null object \n", " 16 prev-prev-pos 1050794 non-null object \n", " 17 prev-prev-shape 1050794 non-null object \n", " 18 prev-prev-word 1050794 non-null object \n", " 19 prev-shape 1050794 non-null object \n", " 20 prev-word 1050794 non-null object \n", " 21 sentence_idx 1050794 non-null float64\n", " 22 shape 1050794 non-null object \n", " 23 word 1050794 non-null object \n", " 24 
tag 1050794 non-null object \n", "dtypes: float64(1), int64(1), object(23)\n", "memory usage: 200.4+ MB\n" ] } ] }, { "cell_type": "code", "source": [ "data.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "o6obun2r48jC", "outputId": "f72d6bc6-ac49-4eff-e37b-996716cfcf73" }, "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Unnamed: 0 0\n", "lemma 0\n", "next-lemma 0\n", "next-next-lemma 0\n", "next-next-pos 0\n", "next-next-shape 0\n", "next-next-word 0\n", "next-pos 0\n", "next-shape 1\n", "next-word 1\n", "pos 1\n", "prev-iob 1\n", "prev-lemma 1\n", "prev-pos 1\n", "prev-prev-iob 1\n", "prev-prev-lemma 1\n", "prev-prev-pos 1\n", "prev-prev-shape 1\n", "prev-prev-word 1\n", "prev-shape 1\n", "prev-word 1\n", "sentence_idx 1\n", "shape 1\n", "word 1\n", "tag 1\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "markdown", "source": [ "# Observations about the whole dataset\n", "\n", "- The data has 1050795 rows and 25 columns.\n", "- 17 columns contain null values (one missing entry each).\n", "- Column data types: int64 (1), float64 (1), and object (23).\n", "\n" ], "metadata": { "id": "EzYWiTEN5tnh" } }, { "cell_type": "markdown", "source": [ "# Select only the sentence_idx, word and tag columns" ], "metadata": { "id": "B9QsrxPE0SPS" } }, { "cell_type": "code", "source": [ "ner_data = data[['sentence_idx', 'word', 'tag']]" ], "metadata": { "id": "dWK0fXlR0jek" }, "execution_count": 10, "outputs": [] }, { "cell_type": "code", "source": [ "ner_data.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jatQyuv654PV", "outputId": "7f497c40-a0fa-41b7-e8b8-a698ca828544" }, "execution_count": 11, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(1050795, 3)" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "code", "source": [ "ner_data.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "FerXPCTA59DG", "outputId": "0ba0a7e8-8eec-4d32-a439-0d4d475519ac" }, "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " sentence_idx word tag\n", "0 1.0 Thousands O\n", "1 1.0 of O\n", "2 1.0 demonstrators O\n", "3 1.0 have O\n", "4 1.0 marched O" ], "text/html": [ "\n", "
\n", " " ] }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "code", "source": [ "ner_data.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "R1R7mjz91LgG", "outputId": "c46556ce-eed8-4f71-de8b-24726ec00480" }, "execution_count": 13, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "sentence_idx 1\n", "word 1\n", "tag 1\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 13 } ] }, { "cell_type": "code", "source": [ "# Drop rows that contain null values\n", "ner_data = ner_data.dropna()" ], "metadata": { "id": "2MQUCtH71R3Y" }, "execution_count": 14, "outputs": [] }, { "cell_type": "code", "source": [ "# Total number of unique sentences\n", "len(ner_data['sentence_idx'].unique())" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d9yp68G95lYQ", "outputId": "b918a22c-b93c-4f24-a563-2d691ca4a642" }, "execution_count": 15, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "35177" ] }, "metadata": {}, "execution_count": 15 } ] }, { "cell_type": "code", "source": [ "# Total number of unique words\n", "len(ner_data['word'].unique())" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_RDZ0EwW2Kzo", "outputId": "e8965f1a-7cc3-4355-87e5-afdf1966b0ef" }, "execution_count": 16, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "30172" ] }, "metadata": {}, "execution_count": 16 } ] }, { "cell_type": "code", "source": [ "# Total number of unique tags\n", "len(ner_data['tag'].unique())" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CYF3NaEo2ZCl", "outputId": "d98ccd59-7eba-442f-93f5-b3ba18ff4441" }, "execution_count": 17, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "17" ] }, "metadata": {}, "execution_count": 17 } ] }, { "cell_type": "code", "source": [ "# Tag frequencies, skipping the most frequent tag ('O')\n", "ner_data['tag'].value_counts(dropna=False)[1:]" ], "metadata": { "colab": { "base_uri": 
"https://localhost:8080/" }, "id": "FsJsZvUdAqwf", "outputId": "ae3412f0-f4b8-4edf-a296-5928456f41f8" }, "execution_count": 18, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "B-geo 37525\n", "B-tim 20193\n", "B-org 20184\n", "I-per 17382\n", "B-per 17011\n", "I-org 16537\n", "B-gpe 16392\n", "I-geo 7409\n", "I-tim 6298\n", "B-art 434\n", "B-eve 348\n", "I-eve 297\n", "I-art 280\n", "I-gpe 229\n", "B-nat 226\n", "I-nat 76\n", "Name: tag, dtype: int64" ] }, "metadata": {}, "execution_count": 18 } ] }, { "cell_type": "markdown", "source": [ "## Meaning of BIO Tags\n", "- The IOB format (short for inside, outside, beginning), also commonly referred to as the BIO format, is a common format for tagging tokens in chunking tasks in computational linguistics (e.g. named-entity recognition).\n", "\n", " - B marks the beginning of an entity\n", " - I marks a token inside an entity\n", " - O marks a token outside any entity\n", "\n", "## Entity types in the dataset\n", "\n", "- geo = Geographical Entity\n", "- org = Organization\n", "- per = Person\n", "- gpe = Geopolitical Entity\n", "- tim = Time indicator\n", "- art = Artifact\n", "- eve = Event\n", "- nat = Natural Phenomenon\n" ], "metadata": { "id": "Buh6FDMCMeLN" } }, { "cell_type": "markdown", "source": [ "## Observations about the data\n", "\n", "- The data contains 35177 sentences in total.\n", "- The data contains 30172 unique words.\n", "- The data contains 17 unique tags. The tag names and their counts are:\n", "  - O: 889973\n", "  - B-geo: 37525\n", "  - B-tim: 20193\n", "  - B-org: 20184\n", "  - I-per: 17382\n", "  - B-per: 17011\n", "  - I-org: 16537\n", "  - B-gpe: 16392\n", "  - I-geo: 7409\n", "  - I-tim: 6298\n", "  - B-art: 434\n", "  - B-eve: 348\n", "  - I-eve: 297\n", "  - I-art: 280\n", "  - I-gpe: 229\n", "  - B-nat: 226\n", "  - I-nat: 76\n" ], "metadata": { "id": "cdjbymGQqHUs" } }, { "cell_type": "code", "source": [ "# Bar chart of tag frequencies, excluding the dominant 'O' tag\n", "plt.figure(figsize=(12,6))\n", "publication_plot = sns.countplot(\n", " data=ner_data,\n", " x='tag',\n", " palette='Set1',\n", " order = ner_data['tag'].value_counts()[1:].index\n", ")\n", "\n", "plt.xticks(\n", " rotation=45, \n", " horizontalalignment='right',\n", " fontweight='light',\n", " fontsize='x-large' \n", ")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 453 }, "id": "NseIit5Cyiuz", "outputId": "aa05bafb-0cc9-4516-9932-b8aa3dfd5e40" }, "execution_count": 19, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]),\n", " )" ] }, "metadata": {}, "execution_count": 19 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAuAAAAGRCAYAAAAkSAbwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3debhdVXn48e9LQhhkSIAYAwkEIUoRBTUCilVEhUCl4FgGhZ8TVsE6VcGpIIjiVAoOWFQmq0UKVcCimCKOlSFRZEYioIQiREDBWqnA+v2x1ubunNx7c5Pcs/a54ft5nvPcc9be5553z+9ee+21I6WEJEmSpDrW6joASZIk6bHEBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqmhy1wHUttlmm6U5c+Z0HYYkSZLWYIsWLfptSmn6cMMecwn4nDlzWLhwYddhSJIkaQ0WEb8aaZhNUCRJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkiiZ3HUDXFs7buesQmLfwiq5DkCRJUiXWgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkV9S0Bj4h1I+KKiPh5RFwXER8q5WdExK0RcVV57VTKIyJOjojFEXF1RDyj9b8OjYiby+vQVvkzI+Ka8p2TIyL6NT2SJEnSeJjcx//9ILBHSukPEbE28KOI+FYZ9u6U0rk94+8NzC2vXYBTgF0iYhPgaGAekIBFEXFBSum+Ms4bgcuBi4D5wLeQJEmSBlTfasBT9ofyce3ySqN8ZT/grPK9y4CpETET2AtYkFK6tyTdC4D5ZdhGKaXLUkoJOAvYv1/TI0mSJI2HvrYBj4hJEXEVcDc5ib68DDq+NDM5MSLWKWVbALe3vr6klI1WvmSY8uHiOCwiFkbEwqVLl672dEmSJEmrqq8JeErp4ZTSTsAsYOeI2AF4L7Ad8CxgE+DIfsZQ4jg1pTQvpTRv+vTp/f45SZIkaURVekFJKf0OuBSYn1K6szQzeRA4Hdi5jHYHMLv1tVmlbLTyWcOUS5IkSQOrn72gTI+IqeX9esCLgRtL221KjyX7A9eWr1wAHFJ6Q9kV+H1K6U7gYmDPiJgWEdOAPYGLy7D7I2LX8r8OAc7v1/RIkiRJ46GfvaDMBM6MiEnkRP+clNI3I+K7ETEdCOAq4G/L+BcB+wCLgT8CrwVIKd0bEccBV5bxjk0p3VvevwU4A1iP3PuJPaBIkiRpoPUtAU8pXQ08fZjyPUYYPwGHjzDsNOC0YcoXAjusXqSSJElSPT4JU5IkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqqG8JeESsGxFXRMTPI+K6iPhQKd86Ii6PiMUR8bWImFLK1ymfF5fhc1r/672l/KaI2KtVPr+ULY6Io/o1LZIkSdJ46WcN+IPAHimlHYGdgPkRsSvwMeDElNK2wH3A68v4rwfuK+UnlvGIiO2BA4CnAPOBz0XEpIiYBHwW2BvYHjiwjCtJkiQNr
L4l4Cn7Q/m4dnklYA/g3FJ+JrB/eb9f+UwZ/sKIiFJ+dkrpwZTSrcBiYOfyWpxSuiWl9H/A2WVcSZIkaWD1tQ14qam+CrgbWAD8EvhdSumhMsoSYIvyfgvgdoAy/PfApu3ynu+MVC5JkiQNrL4m4Cmlh1NKOwGzyDXW2/Xz90YSEYdFxMKIWLh06dIuQpAkSZKASr2gpJR+B1wKPBuYGhGTy6BZwB3l/R3AbIAyfGPgnnZ5z3dGKh/u909NKc1LKc2bPn36uEyTJEmStCr62QvK9IiYWt6vB7wYuIGciL+ijHYocH55f0H5TBn+3ZRSKuUHlF5StgbmAlcAVwJzS68qU8g3al7Qr+mRJEmSxsPkFY+yymYCZ5beStYCzkkpfTMirgfOjogPAz8DvlTG/xLw5YhYDNxLTqhJKV0XEecA1wMPAYenlB4GiIgjgIuBScBpKaXr+jg9kiRJ0mrrWwKeUroaePow5beQ24P3lv8JeOUI/+t44Phhyi8CLlrtYCVJkqRKfBKmJEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklSRCbgkSZJUkQm4JEmSVJEJuCRJklRR3xLwiJgdEZdGxPURcV1EvK2UHxMRd0TEVeW1T+s7742IxRFxU0Ts1SqfX8oWR8RRrfKtI+LyUv61iJjSr+mRJEmSxkM/a8AfAt6VUtoe2BU4PCK2L8NOTCntVF4XAZRhBwBPAeYDn4uISRExCfgssDewPXBg6/98rPyvbYH7gNf3cXokSZKk1da3BDyldGdK6afl/QPADcAWo3xlP+DslNKDKaVbgcXAzuW1OKV0S0rp/4Czgf0iIoA9gHPL988E9u/P1EiSJEnjo0ob8IiYAzwduLwUHRERV0fEaRExrZRtAdze+tqSUjZS+abA71JKD/WUS5IkSQOr7wl4RGwAnAe8PaV0P3AKsA2wE3An8KkKMRwWEQsjYuHSpUv7/XOSJEnSiPqagEfE2uTk+ysppX8HSCndlVJ6OKX0CPAFchMTgDuA2a2vzyplI5XfA0yNiMk95ctJKZ2aUpqXUpo3ffr08Zk4SZIkaRX0sxeUAL4E3JBS+sdW+czWaC8Fri3vLwAOiIh1ImJrYC5wBXAlMLf0eDKFfKPmBSmlBFwKvKJ8/1Dg/H5NjyRJkjQeJq94lFW2G/Aa4JqIuKqUvY/ci8lOQAJuA94EkFK6LiLOAa4n96ByeErpYYCIOAK4GJgEnJZSuq78vyOBsyPiw8DPyAm/JEmSNLD6loCnlH4ExDCDLhrlO8cDxw9TftFw30sp3cJQExZJkiRp4PkkTEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqaIxJeARcclYyiRJkiSNbvJoAyNiXWB9YLOImAZEGbQRsEWfY5MkSZLWOKMm4MCbgLcDmwOLGErA7wc+08e4JEmSpDXSqAl4Sukk4KSIeGtK6dOVYpIkSZLWWCuqAQcgpfTpiHgOMKf9nZTSWX2KS5IkSVojjSkBj4gvA9sAVwEPl+IEmIBLkiRJK2Gs3RDOA3ZLKb0lpfTW8vq70b4QE
bMj4tKIuD4irouIt5XyTSJiQUTcXP5OK+URESdHxOKIuDointH6X4eW8W+OiENb5c+MiGvKd06OiFg+EkmSJGlwjDUBvxZ4wkr+74eAd6WUtgd2BQ6PiO2Bo4BLUkpzgUvKZ4C9gbnldRhwCuSEHTga2AXYGTi6SdrLOG9sfW/+SsYoSZIkVTWmJijAZsD1EXEF8GBTmFL665G+kFK6E7izvH8gIm4gd124H7B7Ge1M4HvAkaX8rJRSAi6LiKkRMbOMuyCldC9ARCwA5kfE94CNUkqXlfKzgP2Bb41xmiaU+R/8Wtch8O3j/maF4xz05QMqRDK6r77m7K5DkCRJGtFYE/BjVudHImIO8HTgcmBGSc4BfgPMKO+3AG5vfW1JKRutfMkw5ZIkSdLAGmsvKN9f1R+IiA2A84C3p5TubzfTTimliEir+r9XIobDyM1a2HLLLfv9c5IkSdKIxvoo+gci4v7y+lNEPBwR94/he2uTk++vpJT+vRTfVZqWUP7eXcrvAGa3vj6rlI1WPmuY8uWklE5NKc1LKc2bPn36isKWJEmS+mZMCXhKacOU0kYppY2A9YCXA58b7TulR5IvATeklP6xNegCoOnJ5FDg/Fb5IaU3lF2B35emKhcDe0bEtHLz5Z7AxWXY/RGxa/mtQ1r/S5IkSRpIY+0F5VEp+waw1wpG3Q14DbBHRFxVXvsAJwAvjoibgReVzwAXAbcAi4EvAG8pv3cvcBxwZXkd29yQWcb5YvnOL1lDb8CUJEnSmmOsD+J5WevjWuR+wf802ndSSj8CRuqX+4XDjJ+Aw0f4X6cBpw1TvhDYYbQ4JEmSpEEy1l5Q9m29fwi4jdxtoCRJkqSVMNZeUF7b70AkSZKkx4Kx9oIyKyK+HhF3l9d5ETFrxd+UJEmS1DbWmzBPJ/dSsnl5XVjKJEmSJK2EsSbg01NKp6eUHiqvMwA71JYkSZJW0lgT8Hsi4tURMam8Xg3c08/AJEmSpDXRWBPw1wGvAn4D3Am8Avh/fYpJkiRJWmONtRvCY4FDU0r3AUTEJsAnyYm5JEmSpDEaaw3405rkGx59OuXT+xOSJEmStOYaawK+VkRMaz6UGvCx1p5LkiRJKsaaRH8K+ElE/Fv5/Erg+P6EJEmSJK25xvokzLMiYiGwRyl6WUrp+v6FJUmSJK2ZxtyMpCTcJt2SJEnSahhrG3BJkiRJ48AEXJIkSarIBFySJEmqyK4E9Zhz0kH/3HUIALztq2/qOgRJktQBa8AlSZKkikzAJUmSpIpsgiINqNuO2brrEACYc8ytXYcgSdIaxRpwSZIkqSITcEmSJKkiE3BJkiSpItuAS1otd965b9chADBz5oVdhyBJ0phYAy5JkiRVZA24pMeEUw5+dtch8Oav/KTrECRJA8AacEmSJKkiE3BJkiSpIhNwSZIkqSITcEmSJKkiE3BJkiSpIhNwSZIkqaK+JeARcVpE3B0R17bKjomIOyLiqvLapzXsvRGxOCJuioi9WuXzS9niiDiqVb51RFxeyr8WEVP6NS2SJEnSeOlnDfgZwPxhyk9MKe1UXhcBRMT2wAHAU8p3PhcRkyJiEvBZYG9ge+DAMi7Ax8r/2ha4D3h9H6dFkiRJGhd9S8BTSj8A7h3j6PsBZ6eUHkwp3QosBnYur8UppVtSSv8HnA3sFxEB7AGcW75/JrD/uE6AJEmS1AddtAE/IiKuLk1UppWyLYDbW+MsKWUjlW8K/C6l9FBPuSRJkjTQaifgpwDbADsBdwKfqvGjEXFYRCyMiIVLly6t8ZOSJEnSsKom4Cmlu1JKD6eUHgG+QG5iAnAHMLs16qxSNlL5PcDUiJjcUz7S756aUpqXUpo3ffr08ZkYSZIkaRVUTcAjYmbr40uBpoeUC4ADImKdiNgamAtcAVwJzC09nkwh36h5QUopAZcCryjfPxQ4v8Y0SJIkSatj8opHWTUR8a/A7sBmEbEEOBrYPSJ2AhJwG/AmgJTSdRFxDnA98BBwe
Erp4fJ/jgAuBiYBp6WUris/cSRwdkR8GPgZ8KV+TYskSZI0XvqWgKeUDhymeMQkOaV0PHD8MOUXARcNU34LQ01YJEmSpAnBJ2FKkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkV9S0Bj4jTIuLuiLi2VbZJRCyIiJvL32mlPCLi5IhYHBFXR8QzWt85tIx/c0Qc2ip/ZkRcU75zckREv6ZFkiRJGi/9rAE/A5jfU3YUcElKaS5wSfkMsDcwt7wOA06BnLADRwO7ADsDRzdJexnnja3v9f6WJEmSNHD6loCnlH4A3NtTvB9wZnl/JrB/q/yslF0GTI2ImcBewIKU0r0ppfuABcD8MmyjlNJlKaUEnNX6X5IkSdLAqt0GfEZK6c7y/jfAjPJ+C+D21nhLStlo5UuGKR9WRBwWEQsjYuHSpUtXbwokSZKk1dDZTZil5jpV+q1TU0rzUkrzpk+fXuMnJUmSpGHVTsDvKs1HKH/vLuV3ALNb480qZaOVzxqmXJIkSRpotRPwC4CmJ5NDgfNb5YeU3lB2BX5fmqpcDOwZEdPKzZd7AheXYfdHxK6l95NDWv9LkiRJGliT+/WPI+Jfgd2BzSJiCbk3kxOAcyLi9cCvgFeV0S8C9gEWA38EXguQUro3Io4DrizjHZtSam7sfAu5p5X1gG+VlyRJkjTQ+paAp5QOHGHQC4cZNwGHj/B/TgNOG6Z8IbDD6sQoSZIk1eaTMCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKTMAlSZKkikzAJUmSpIpMwCVJkqSKJncdgCRpyF0nXdp1CMx42wu6DkGS1mjWgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFZmAS5IkSRWZgEuSJEkVmYBLkiRJFfkgHknSSjv11FO7DoHDDjus6xAkaZVYAy5JkiRVZAIuSZIkVWQCLkmSJFVkAi5JkiRVZAIuSZIkVWQCLkmSJFXUSQIeEbdFxDURcVVELCxlm0TEgoi4ufydVsojIk6OiMURcXVEPKP1fw4t498cEYd2MS2SJEnSyuiyBvwFKaWdUkrzyuejgEtSSnOBS8pngL2BueV1GHAK5IQdOBrYBdgZOLpJ2iVJkqRBNUhNUPYDzizvzwT2b5WflbLLgKkRMRPYC1iQUro3pXQfsACYXztoSZIkaWV0lYAn4DsRsSgimkeZzUgp3Vne/waYUd5vAdze+u6SUjZSuSRJkjSwunoU/XNTSndExOOBBRFxY3tgSilFRBqvHytJ/mEAW2655Xj9W0mSJGmldVIDnlK6o/y9G/g6uQ33XaVpCeXv3WX0O4DZra/PKmUjlQ/3e6emlOallOZNnz59PCdFkiRJWinVE/CIeFxEbNi8B/YErgUuAJqeTA4Fzi/vLwAOKb2h7Ar8vjRVuRjYMyKmlZsv9yxlkiRJ0sDqognKDODrEdH8/ldTSt+OiCuBcyLi9cCvgFeV8S8C9gEWA38EXguQUro3Io4DrizjHZtSurfeZEiSJEkrr3oCnlK6BdhxmPJ7gBcOU56Aw0f4X6cBp413jJIkSVK/DFI3hJIkSdIazwRckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEX
JIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqsgEXJIkSarIBFySJEmqyARckiRJqmjCJ+ARMT8iboqIxRFxVNfxSJIkSaOZ3HUAqyMiJgGfBV4MLAGujIgLUkrXdxuZJGkQLJy3c9chMG/hFaMOn//Br1WKZHTfPu5vug5BesyY0Ak4sDOwOKV0C0BEnA3sB5iAS5I0jg768gFdhwDAV19z9qjDTzronytFMrq3ffVNow6/7ZitK0UysjnH3LrCce68c98KkYxu5swLuw5h3E30JihbALe3Pi8pZZIkSdJAipRS1zGssoh4BTA/pfSG8vk1wC4ppSN6xjsMOKx8fDJw0ziHshnw23H+n+NtIsQIxjnejHN8Gef4mQgxgnGON+McX8Y5fvoR41YppenDDZjoTVDuAGa3Ps8qZctIKZ0KnNqvICJiYUppXr/+/3iYCDGCcY434xxfxjl+JkKMYJzjzTjHl3GOn9oxTvQmKFcCcyNi64iYAhwAXNBxTJIkSdKIJnQNeErpoYg4ArgYmAScllK6ruOwJEmSpBFN6AQcIKV0EXBRx2H0rXnLOJoIMYJxjjfjHF/GOX4mQoxgnOPNOMeXcY6fqjFO6JswJUmSpIlmorcBlyRJkiYUE3BJkiSpIhNwSZIkqSITcEmSJKkiE3CpiIgJsT30xjlR4pYGSURE673b0GNAREz4nt+0ciJi7fL3cV3H0sudzhpiUA8ggxrXcFJKjwBExIubjXbQRES04nx5+/MgmQgnCU0C1k7EBt1EihUGM96I2DAi1k4ppYjYOyJ2GsRtSOMnIjYpy/yhiHhhRDyj65iGM4jby3AiYvOI2Lm8PzAijuk4pOVExJyI2C6l9OeIeDlwbERs0HVcbQN3UOzSaEnCIG0YrcRhu4h4QURsOagHkFayuGdEPKXreIbTXu4R8VHgQmD2IC1zeDT5TuX9B4HPAk/tNqrl9ZwkHBwRG6eUHhmkJDwi1kpDfbBOHuATrmXmWWv5D9y6Wf4+LSL2iYinRsSUkuQO0nKfCVwLPC8iDgD+A3hit1EtrzU/t4yIJ0bEDl3HNJxWnLMjYseI2CYippWygVjuZZmfC7y5LPMFwBO6jWp5PfskImL9LuMZSalJvhB4X0S8F/gKcHu3US2rxHgicHFEvB34N+BnKaU/dBvZsuwHvCgrf5M0vAr4C+BB4NqU0jdLeaQBmWER8VLgDOB3wEzg3cBXU0pLu4yr0TM/dwYuBU4DTkopLe40uBFExBOBw4ELUkrf7zqekUTEU4F/AD4zaHH2LPenARcA1wOvSin9oT18QGI8HHgBMA24BXhnSumBLuNr9MR5ELA9sC7w7ZTSf3Ya3DAi4mXAP5eP9wDfBt6fUvqfQVju8GjCeBrwMmAD4A0ppdO7jWpZzXEmIvYDPlKKtwTOA05NKf1Xd9ENacX5UnKcU4A/AncC70kpXdVpgEVEbAJ8BtgFmA28KaV0ekRMSik93G10Wc+2/g5yrPOALwPfTyl9r8PwllOO6f8ObA4cl1I6upQPUo70HPKDdbYDPpBSOiEiJqeUHuo4tEcNxBnqIGit/J8g1yzuCrwB+GxEfKGMk7queYpsBvB+4D3AC8k7v08Ah5dhneqpAf17YA/gz+T5+a6I2LbL+IYTEQcClwMvAe7oOJwRlYTxM+Qd302lbCBqQ3uW+9uAdwAPA/OBcyJio0GoCW/F+DHydrQIOB14PXB2RGzcYXiPasX5SeBTwLOB5wDfiYiPRMRmXcbXKPukzYC3A+8i7zvPA54HfD4iNijLvet9Z1PDeD6wIfB/wH/HgLULLseZvYCvAp8GngW8ETgEmNNhaMsocb4IO
Av4HPAk4BTgRcDuHYb2qJJw3UvevjcF7gI2KOUPd70varS29RPIlWkLycf1twNHRcTmHYa3jDLPbgMCWAo8qSS7zTrR6Txt7Wd+Xf7eChwUEduXJkiTOgpteSklX+UFvIp89r5r+bwJ8BZgCbnmtsvYmqsVa5Frbj4KPK41/Ehyjf0xwOO7npclpmOAe4F9gb2ADwJ/IO+st+k6vp5YX0a+NPm/wNNL2aSu4xomzpcAvwH+BPxV7/oxCC/gaOA+4KXkhPGjwA3Ad4ANyzhrdRzjXwI3A7uVz/OBB4DDesbrdL4CLycf5J7VzDPgdcBDwJFdxUg++Db7pEnkmvmzm30PsDbw98DPyLV4jxuQ+bkP8DXg1eRk8YGy7U8eZtzO1lHylYSPlffbAL8g1343w9fueD6uVV5fAv6xlM0kJ2afaY23cZdxlhgOAa4py/5L5BPudzf79673Ra04nwf8kqH8Yx75JPGQQYqzFe8s4Lkl5nOB53QdU098GwJbAc8HLgGuA55ShjXLflqnMXY9kwbpBXwI+H57RSdfmv4gcBWwZUdxNQe6fcrB7DLgCmCrnvHeQ05wPwFM7zDOADYDrgbe2jPOESV5+BzwpI7m57A7MuDF5JqHm4HtRhu34zj3IJ/dXwjs3Dv/O17u08tyf1Nr+Nrkqx9LSswbDMC8PQi4orx/KTkRe1P5vBFwUFex9cR5JPDdZn615vU7yrb+5I7j24fcjvrbwGU9w6aQk/ArgG/QqjCoHGMzz55GrgE9qDXsjLLs96Mk4WUftWOH83R9csL4auBxZbv5fGs6Xgv8dZfLvRXrfwL7kyur7iCfODRx7lu2+ykdLvOtyScF7yqfNyXX2C8iX61pTmr/H6XipcN5uR/wvfL+VWW9fHP5vEHZ1jo5oWnNz6nArPK+SWLnk5Pwcxiq0DgO+GBHMT4emAE8oTVsb3ISfg2wfSl7D/DxLtbPR+PqcoUblFdrRTqanIA1tXTNAp0HPAI8r8MY9yDXep5XdnqPAO+j5wyOXOt8F7BZ5fjaJy0zyMnCb4GDS9mU1vAzyTXj/0jlk5qeOF9ETr4OAdZpzedLySdcTRJevSa8J87nkWtCX8lQ8jof+FXZ6T2rw/WyHecsYOOyk3v3MKaf6EkAAB1lSURBVMPPKevtf/RuY7XjLQe4C8lJzqPJdxm2C7m2fqeu5msrlneX7Xnj8rlJEnco5bt3GNtzyU2MTiVXCvyRXMPY3tankO9X+D6weYex7khOBk9sz8fy/nTyycx7yU0+HgZ26CDGZ7S271OAfyXf3HYKQ8eoSeQE8p/oKHEgn8g08Xwb+C75/onPUmrmgXXKfP14s2/tIM5nkivVTiEfj5rYppGPQVeSTxhOKPulTiqEWvG+mny/zMHA7ynJdxm2B/kk9ikdxNU+ofoh+YTwO2V7Wq8M2wu4kdyM81vkXGXnjmL8QVkfvw/8XWucvUvcvyMfix7ueh/f2Q93OtEj1yzuWTbEt/SUb0eu1esk0SHXKn6oHRf5sv6DwDtZPgnftHJ80Xr/afLdxpATrR8xlGxNKX9PLBvyfcARoy2TPsb5UXLbsJ+QE5mFwJ5l2D7ks+VFdHMgbsd5Arn5xvUl1rubA0VZX28jH6h36zjOz5QDxFZl2f4HQyc1zc7xA8DXyzrxyUrLfKRt/VnA/5Ttvb2TXh/4Jvmu+Wo19KPE2TSV+SCtk2qGmiU8v4sYye19/w54e2u+fQj4KfnqVjvBXRvYpPb62ax75GRwaVnW3xhhek4iX6K+nA4OyuR7On7e2h++lZyE/RfLNus5npyUd3X1cBNycvW+8vkF5NrPm9vzldx++dcdxrlp2QfdC1zYXhfL36nk+youJV+hqXbFY5RtfRvyScH/NfO3lK9Driw4r+Y+qSe2l5ArKo4mV1B8k5xwfxBYv4zzPPIJ1xcotcyVY9yXXAnw9+QTghPLNn9ka5xnlf3UGV3EuFzMXQfQwUJq73T3IZ/FvQOYUco+U
DaA95JvJtqOfJb/kxorP3AAMLu8D3I3c/8LLAZe0zPux0qsb2sf4KhYq8iySdguwPcoVwrINbaLyM1mmuR7Evnu6Z3JNeC/ofKlafKNLXcDzyyfDy4b6vzWOHuRT7rO7HBdfTs5cWjaBB5e4ty/Nc588onYcZVjay/3meQTmL8sn3cqO+tTyQe6SeTk4evAm8g1ZT8HNqoY48HkS47vp9x/APxNmZ+fLuvqvuSrS1czVNNc9SShxPD6sk/aqpR9jHwCdhK5lvTpwEUV90lvBJ7Y+rwdOVFYAry2Vb4h+eD2c+BkOm6n3DMNW5BvWr6TnCg8ek9Na5zN+71OjhLfFPI9KP/RKvsUuZLg++Qa2/PL/qDrphInk0+2NybXJr+n7E9/Qq6dP29A4twLuJh8deMlrfJm216b3MSnWrOOnvXtYHKe8R7gqaXszeQTmm+QK1gOJucf1zJ08lA1CSd307kIeFv5vAH55OpGciXA+xmqCV+bbq4Yzy0x/m35PJ18hfjKso9/X8/4y93z0ck62nUAnU14Pqj9glzD8ANyEvNC8tnmO8iXKX5bVrIf9XvlJ9ca7ADcT0nAW8NOLSvRxymXKFvDPlKGvaX2htkTxwHk/kD/haHL/FPISeTPy8Zwdnl/U5neN5YdS9VLqeRuyN7Wivt3LNvWrqm53bWjnUlTa/c1ylUPcvvA+4E3ls8btXZ6u3QRZ/nt95J7azi9xLxWK94/kGtEF5Qd4S/KsAPLOtC3KzUsm3x/iny1ZVH53T+Sb2QM8gHuenK3eT8mX5pstvWqO+myT/o1uebuF2Xf8+oy7ITWweSann1S35Z9WbcuAbZulc0gn0QtBf61Z/wNyE1Ofg18sqN1ctgKCHITqTvLcn5Gq7yTg3FrW2mW49PJNbaHtMZ5LfnE65vk5oWdNpMoMe1LvnrUnHBPJV+p+Qr5atyxwNwBWeZ7kJPwnwB7tco72a+33n+cXElxaVnm15O77KTsm/6TXLn24zJPO9knld98Irkp3DRyZcvN5KtcU8p8vbXsu9btcJ3crMS0Oflk+0byfRNPIFf8PAJ8qKv4Roy76wA6WlhvJp+xzyufX8nyNYvbkxOw3Vo7yr6v/JQaGHLN9zat8s+TTxJex/JJ+DHAX3Q4P9clJw0PAJf3DJtMru3+BLkW5+Otncmp5Jq89SvGuj65fdgh5DasDzB01rwWOaHs7Qmji531ZPLJ4T7AXw0T51vJl9omdxUnufboRPIl6Z+0ypvaxTlleX+efKLY1Dyd3s/lzrIHuqdQ2nOXZT+FXOP9J+DlZZwnkJvObNaKvXby/Sby1aCmB55mn/Sy1jgbl3X2qZX3SZuUv8+k3PRZ5tUnyAe6j/aMvyFwFK1a84rzsVl+u5FP/k8kt1lupmF2mc8/pvsa2hk9n6eTa4//ufb6t4I4p9GT4JJPVL8PTB2A+Jpl/hxyk4gPk5870Azfq+xv/ovSzLDjeGeRmzo9q3xel9ym/6eUJl2l/InAel3tk1pxrE05ASefeJ/N0D0pJ5Nvvr2YyvedDRNn09T1U+QmhFPL54+QryrcXbaxwekxrOsAOlhIQU4ImpvEXk6uWTysfJ7KMIkBfa5dpvRwUP5uVg6+X2XZS79fJCcOb6CjHgWaeThM2SbkxGop+ebQEXcWZR7/I/nMv283lYy0zMoG+iNyDe0bWuUbkS/31W7OsVycZaf8XXKt530se0POdPIJz/tqxTjKcp9DPgF8hNLTQClfrvkBOfn5JLm2+akV4n1NmYcLyCcLk1rDTie3pV3u8nMXO2hyLedx5f0BtG7ConSnNZb1Zpxjap/cPYHczOi7DN2D8PiyHf+c5ZPwTrpGLH9fVvYt3ykxLyHX4DW9N8wuy/46OurthHxCuIRcCdHuYvJAci9Rz+hqPvbEuSP5itGJwN6t8v3ItbZN87j2jbfVm0CWZX4fubbzB2WdPLE13nzyQ8FuAF7Y4fx8b9l3f6Ns1038zY2hCxlqrtmuS
KgyT1vxPL7sMzfqGX4huQKjGe8k8s2jM2rE1xPjpvT09kbOny4BvtAqOwn4295pGYRX5wFUWFjNjq3dj/YCcnvafWjVLJbhbyffoNf32kR6LkGW90377/3JNd5fZNma8C+STxiOoIMknOXbTG7CUPv5TciXIReRbw4dbvrmkG8kWkQfb3bqiXMrWpdEyXdD30Nur97U6M0uO8YrqFjT0BPntuTL+zPL52eSaxcuJ1/aX4+cCDXtf7uKczqt3mvINbNNTwJHtMontba7meQu9a6lT0lPa31bi9wc5tPky6O/bI2zbvm7C7lGpNqd+sPNy/J5CvnA+07y1aL21Y4gNy97JxV6k+iNrZTNLX/fQE5uLmxtN00Svgg4ufa8HCbWvyTXcL+ufF6fnND+ilw7unkp34rcHGlO5fia7WEH4DDy5fyfkWsQdyQnYmfR6ju9o/n4aK0r+ca7C8mJ+Fcp7anLunD2ACzz55b9ZFOJ9jTyCdjvgS+1xvtrcs19tWXes98M8v1a/0O+eb45bjY9ymxPx72tlTj2K+vk9eTmL01LgfXIPZz8pEzHyeSTnurdM5cYryM31VtAziuadfZE8n087y4x3s2APXfk0enoOoCKC6yp/Vib3FXWtWXlafcsMrXsaE6oGNccSo8Q5LP4/6bUepPb2j3C8kn42eS2jFX7BGXZM/J/ICeGvyore9Pd4LSyk15IPplZ7kQGeDKVHhZETvZvKzvkG8jJzCRyTdPNZdg1Jd7LqNCudoQ4P0K+0XYJ+bL+oeSD3yvJB76ryzr7Q3KteLU4h1nuPy07tWvIScTG5NqSE8hdOx0+3P8gn1z0fbk3B4SyPR9Tln1v93g7kQ/a1RPwVgyPbx003lMOJn9i2RsbNyBflanWnpp8Q9Np5f1LyScx25bPryMfgHuT8FPLuln9+QOtuKeQ7985vnx+Ytm+P0NuH/og+YaxrcrwmiewzXLeoOx/mvs3NiVfqbmE3BvTueTmMQuBLTqYh02c69Nq01u28ReRrxzeQL4S8hnyyWKX3fOuRT6x/3z5PIfcxPBM8g3B/8uyNeFd9UXf9GSzDrld/4PtuMqwHUvs8zqcn88kV/AdWfbnF5f9UtOxwizy/v/n5K56u+gxaB75JOZoco5xFflkumnSswu5InAxuUKt8+5kR5yWrgPo40Jqn3nuTb6hcpfyeWvyTUI3kc86Nyxl3yInN01b1b5f9iG3411MrtV8kKGbrpqz4nYS3m6OMrPDeXssuQb5YHJ7+s+RE6/3lOGbkW/G/DXwNx0u99eU5X4g+WlYZ5CT2+YA/Uzy5bP3kRONZp7XaFfbezf83eQTsEPIZ/APA39fhj+R3Kb2feSeO6rF2RPzMWV+vpZ89eirZRs6lnzA3pTc/vIRSvvqDpZ5s60/u3zeiHwSdh25XeB25Ee6f7PsnKudaA2zzP+boZ54diMf2H7K0MMsti37hUW1ljX5ROnlZRl+v/w9qGec1zOUhDfNUaZT8TL0KPHvTG4jvwH5CtcXSvna5AR3KblmbDL1L+vvXdbBK8py3bdnvEPIl8sfKa/ZNeIbIc6Lyrr4Q3Jt47QybCq5rfW55CZ891Iqtzpc5jPIN7CuR66Vb04eZ5FPsh9hKEGvtcx7t/U7GWpWtDa5ScSfyc1hX0zu3eg/yrbe1Q3125AT7w+0ynYjN5e5BXhBKduQfIyv/lAg8pWj3YCjWmVTy/7ol6396XqlvPN7FEadnq4D6NNCaq/8L2nt1K5vHdx2Jl+q/AVDd8b/hA5qQMntUR8h14I0N2FOZuiS+r7k5Pwchm6G6Oqph1uVHfMrWmVBvkT+CEN9aW9GTti62pn8NbkW+XU95ceTTwz2GeF7tWu+dyc/KKK37/m3lfn5VyN8r3acW5BrvA/uKW967nhR+TybfENhrYRxtG39OaV8Y/KT2R4gX/U6m1x719RCVmtuVt7/NfkA/Aj5xrAdS/lLyUnvfeRa56uo1NvJMPE2fehe1ipbp
/X+9eRk5weU2vFBepFPrn/OUBvlueRehU7uIl5yEvtHcg38a8o6+AitG2pb4+5AR5fMy7r5h7K9PJfcS8dd5KSn90bM3ek4+S5xNCcOu5KvEjZd+s0h39T6d7R68akQz2jb+k6lfHLZTzbPIvinsn4+2l1v5Xm4Lfm4fjut5LYM243cBeZNzX6+o+U8s+wbH2bofpmmsrRJwm8q60FnPcKt1DR1HUCfF9inygL5MPkS9C3khPu5Zfh0co3iW8hnoVVrFslnwmuRk4GvMPQQi6a5TDsJfxm51rmzmu8Sx3bky3rtflXXIt80+B1yMjml5zu1dyY7kNt9PsJQLXI7efgh8M0BWD+fT75E/luGuhdsbsSdTL5p6F/KetJprwhlB30P5cEvPfPzMuCcYb5T8xL/SNt601XaRuSk4qfkWqfmQFe16yxyrzC3k69ofJ7cDOoXwNPK8G3IvTb8Lblb1OpXO8o6+O6yL7oP+LfWsHazhDeSKw2q1tSOEHNzIG5fOfwfcp/fGzLUjrn6UxnJTXR+zFDXp5uX7f7zw4zbZVeym5CTxOZK5jRyreJnu46RcrLcU9Ys62Zbfjb5qtLryj70OPJV7a4eAjXctn4jQ0n4FPLVxP8BPtz6Xif955OvZP6G3LxoZs+w55BPxn5Gq2eWyvE9rizbm4Hv9M4vchJ+A/kqQmddIq7UNHUdQB8X1gvJZ+7PbpXtTW7TdBNDl6h7z+q76HKu2ZF8gNzW9xRa7f8oT2Okw5tyWrFsSm77fWxvPOQH7PzLAMS4Afly7i+BBa3yZkM9kdbT0TqO9f3kxPYHDLVdbmp0zgDO7zrGJiZyUvu5VlnTX/rpXS73Ubb1b5dtvakFbW4WvYp842DVZKwcxJbS6oWBoYdc3Ehulz5cTzNd7JPWYuheiQdoJeFleHOpd8PKcU1h+W5Ym+R7Ttm2m8e5N893uJbcVOIZNWNtxTe77Iu2Ku+XAP/cGv7/GICrCOQaxl+QK6aeQG6+0Y7zZZSbWCvHNYvcPGPPVlmTdG9Z9j+zyVfpvkFOen9R9quddDW5gm39epZNwpvmKFV732rF1e4d6r1lX3RS77Imt63uurnRhuRmow/QugGYoWP7xlS+sXq1pqfrAPq4oF5SDspze8pfSj7jvLF1YO6sqyd6znbJCdlV5DPmueSb3n5FPrsbiP4ryTX2S8rBuelVYj3y5fOPVowj2n97ytob6rnkhGIKOam4jA6fcFnia3fxdhQ5Ufw88IRSti65+cEXKsa03PwsnyeV+ff3Jc73t79DrjXr5KErJYaxbOvt5ijHlm2q2s3W5bf3I1/tmN4zv3ckJ4g/ZKg5St+7PS1/R6xZbH1uHlZ1LrlW9DhyglPlRupWHNuR7zu4nHxyOrc1D+eQaz9P79m2jiDXmnWW4Ja4f0Y+KbyVfMNqU+kyg9zjyWu63r+Ta8CvIj9j4NayP2oSm83ISfAhHcT1xLLMv0dph9wqv4Pc21GzPm/D0FXtznq+GMO2/iOGrnqtR76a9Ait9tcVY53c8/kfyM0NT6aDK+4j7JN6Y3w1pTlhq2xgnrw75mntOoA+LsT55INyk2S3z/J+UnaIC4Ht+hxH+6mQo9XcfJqhNqlHkmvCf00+m++sp4ZWrL1J2bkltvPJtYk/JNc01bxUvlyPC635vX75+2py8nAN+RL0V8iXqZoDS1dt6XvnZ3Oi9Uty10/nlJirxTnC/GwOHOuQk9cTy3JfQD4R+yG5RqezJjIrsa0/pZRtRL7aVPVBMcCTyEniq3vKNyHXjN1D68mw/V7mrLhm8Qxyc4n1yDdmLiUnZndSuacGcuJyD7lJ1ofKNn16a3kuJD/htllfO7n3ZJT4v0tOsM7oKT+BfINw9a7ceuIIcqXFOeQnMJ7fM/z4sp1v1VF825KbOP6IoZsB7yKfcA1bcdDx/BzLtn5da3tbj9wcpW8P1GPlcpCjyc31TqNUClWabyvaJ51Z9knrk4/tS4GLul7eqzy9X
QfQ54X5X+QbcTZvlc0gJ49vKQe7v+3j7zfJ4KrU3OxC7mliq8rzbGrvBteajleybL+q7yoHxIvLxrtMO8w+x/k8co81zc1/wVCt0n7k3gY2bm2ovyAnD+2eZGr0drKi+fnFVtzvIdeafI/WzaOV4lzR/DyXnITPICdjF5NvcGov9y6T8DFt6/T5YD3a/ydf2v92mXcvapVvTK4FfQ65Ru/jlebZWGoWH50e8iX+l1G/d46nkW9gPL5V9mbyQ1emk5sf7NPP5bqKcbf353PJV95uIPeK8VryUwV/T8WHAbGCKx7kRPdG8on1UeSa+VPJJzyddudW5mGThD+PfFztss38eGzrn6gU66rkIB8n37tQ8yE7K7NPWp/8bILb6KDLznGZ3q4D6MMCbDdH2JZck/wL8oN3DiLX3H2nDF9EqUXpQxxNkrWyNTdd7lD+hlxTcwe5JqSdrL6CfDn/iGG+137QTq0bWJ9EbuPZu6G+iuWfcLkBw1+y6uuJwqrMT3JN+BXkNnib9q7TgzA/R/h+9eR7ULb1nphGvIpQ3j+ZXCP/Q/IDv15Jvrnpv8i1YJcAX6w4D1dYs9jli5xc3w2c21P+BfIVozvKOvsPHcY4Ws3i1uQrhNMZ6ur2RnJt8tep8ETYVkyj1S5uRX4A0Hrkfpa/TL7n42clzh26XhdKnO0kfPdWeRc3BU6IbZ3VuHpEB4+XX5l9UpmPA/eEyzFPa9cBrMZCGq1m8VXk9muPI7ddO498uedm8llpc5llAeWR9OMcWxPHhKm5IXeJ9AD5aXGHkZOufyrDnlwOGssl3z3/o+o09G6o5ET7T8PFWdaFapesVnZ+9uz0ji477tqX/8Y0P6l8kjjI23pPTKNdRdgfOK+8fzq5+c6t5JOGixm6ofVbwLHN9yvN34GqWeyJbU6Zp5dQEi5yzewfyF11vpTcp/uvGKHLzj7HN5aaxTN6vrMF5epc5VhXVLv4mdb6OoWhp+9W7zlmBdPRXl+f31EME2JbZxVzkK63/0HeJ43rdHYdwCounLHULL615zszaJ3Nkbsr+w09N26NY4wDX3PTiukN5HZ/+7fKjiXX3DwR+AuGHiIwUBtBz4b6l80yHm6HRqVLVqs6P3veV7/8t7Lzs1I8A7+tt35nRVcR3the1mV93KRV9jFy++ou+qkemJrFYWJrTgy/S+5i8i6WrcWdQU4y3lc5rrHWLDYni53vOxlj7eIgLPcVTMdc8sOCrqF0K1z59wd+W2cC5SCjLOOB3CeN2zR2HcAqLJSVrVnsfcjB9uTLa/9NH7soYsBrblpx7k6+OejYnvIryDcEPlgOJMd2Ed8Yp6G9obZ3hsMl4X29ZLW685Nlk/Dql/9Wdn72OY4Jsa33/OZYryK0k51dyScXt9eKcwzL/fldxTFKbAvIzyA4spmH5D7yNyZf5h+1mdQ4xzNhrm6uYDlP2NrFEvd5dNTt3KBv60yQHGQF0zCw+6Rxmb6uA1jJhbHaNbXkhyIcSJ9rw8pvDWTNTU+Mc8vB6xKGumo7j9yWdi/yww2+Qb6E9pddrwMrmI7vkGsldp/I83MQDoZdz8+Jtq2PMO/GfBWBfIIxCH1Bd1qzuILYtiEn4T8A9uhZL35FpUSMCV6z2FrOE752kZ6HvnU8HwduW58IOcgY5/FA7pNWe9q6DmAlFsLuTMCaWgas5maEGJuN9PvkZg+L2gczcm3jw3TQB+wqTEdzk1MnD91wfo7L707Ibb0n1jFdRWAATraGib3TmsWVmK87k5/Y+b9UvHLAGlCzOMy8fH7X8UzU16Bv6xMhBxnDNAzsPmm1pqvrAFZiAUzYmloGpOZmDPP3P8ndYh1Yypq2i9uSbwrcp+s4xzAd25H7qu60H2Dn52rPuwm5rQ8zHZ1flVnF2DutWRzDfL2I3KvRg5SnclaOYcLXLPbMyzWudrGD+Tiw2/pEyEHGMA0Du09a1Vdzt/aEEBHbAp8j90c8mfy0wJenlG4rw59M7t7ptSmls
7qKczgRMZfc7+v6wDuB55NX/ueklH7WZWyNiNgGOIU8Xz+UUrokIgK4gHym/IKU0sNdxrgyImJSl/E6P1frtybstt5WpuPT5C7oDkop/bTjkNYIZfl/nJzgXtdRDHPJ6+hzgWNSSh8r2/dk8n7+m+Qn7n6xi/jGKiK2Iz9o513N9qWVN+jb+kTIQR5r1uo6gJWRUlpM7uP3z8AO5Mdf3xYRzXQ8TO7q57cdhTiilNLN5NjvJ3dF9GFyjcPArPgppV+SY/wT8A8R8QLyg0yeBLwwpfRwREzqMsaV0XVy6/xcrd+asNt6W5mOd5Cb8vy843DWGCmlm4BXdJV8lxhuJj/c6UfAX0XEHin7M/khZVuSr4INtJTSjeSrdLd1HctENujb+kTIQR5rJlQNeGMi1ywOQs3NirTO5PciX/Z/akrpzxExOaX0ULfRTTzOz1U3kbf14XR9VUbjz5pFDWdQt/WJkIM8VkzIBByW2emtAxwDHEGuKduhJDcDufIDRMTapZZkYJWN9HDgnSmlh0wWV4/zc9VN5G1djw1lHT2JfH/C+uTke1G3UUnDmwg5yGPBhE3AwZrFWpyf48v5ufLc1jXorFmUtDImdAIO1ixKjxVu6xp01ixKGqsJn4C3eUCWHhvc1iVJE9kalYBLkiRJg25CdUMoSZIkTXQm4JIkSVJFJuCSJElSRSbgkiRJUkUm4JIkImJqRLyl6zgk6bHABFySBDAVMAGXpAomdx2AJGkgnABsExFXAZcCTwOmAWsDH0gpnQ8QER8EXg0sBW4HFqWUPtlNyJI0MZmAS5IAjgJ2SCntFBGTgfVTSvdHxGbAZRFxATAPeDmwIzkx/ymwqLOIJWmCMgGXJPUK4CMR8TzgEWALYAawG3B+SulPwJ8i4sIOY5SkCcsEXJLU62BgOvDMlNKfI+I2YN1uQ5KkNYc3YUqSAB4ANizvNwbuLsn3C4CtSvmPgX0jYt2I2AB4SQdxStKEZw24JImU0j0R8eOIuBa4EtguIq4BFgI3lnGuLG3BrwbuAq4Bft9VzJI0UUVKqesYJEkTRERskFL6Q0SsD/wAOCyl9NOu45KkicQacEnSyjg1IrYntwk/0+RbklaeNeCSJElSRd6EKUmSJFVkAi5JkiRVZAIuSZIkVWQCLkmSJFVkAi5JkiRVZAIuSZIkVfT/AWVkrd0JMpQaAAAAAElFTkSuQmCC\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "# observation \n", "- As we can see from the above chart,the classes are unbalanced. Geographical entity, time indicator, organizations and persons are heavily represented." 
], "metadata": { "id": "kGxYry7F1rtz" } }, { "cell_type": "code", "source": [ "# Rebuild the first five sentences from their word tokens\n", "b = []\n", "for i in range(5):\n", "    a = data[data['sentence_idx'] == i + 1]['word']\n", "    b.append(' '.join(a))\n", "b[0].split('.')\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "D5ERpStz-R_1", "outputId": "afc5fa91-dd9e-45cd-c81e-c6d6ae367fbe" }, "execution_count": 20, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country ',\n", " ' Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country ',\n", " '']" ] }, "metadata": {}, "execution_count": 20 } ] }, { "cell_type": "code", "source": [ "# Concatenate the words of each sentence (grouped by sentence_idx) into one string\n", "def concat_words(df):\n", "    sentences = []\n", "    for i in df['sentence_idx'].unique():\n", "        sent = df[df['sentence_idx'] == i]['word']\n", "        sentences.append(' '.join(sent))\n", "    return sentences\n", "\n", "sentences = concat_words(ner_data)\n" ], "metadata": { "id": "Nls2VWIB-TBl" }, "execution_count": 21, "outputs": [] }, { "cell_type": "code", "source": [ "len(sentences)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dg4QOS6lGGZy", "outputId": "08fffd9f-0b0f-48f5-80cc-a420c6515ea1" }, "execution_count": 22, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "35177" ] }, "metadata": {}, "execution_count": 22 } ] }, { "cell_type": "code", "source": [ "sentences[0:3]" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "lW2ahsK1Ghw2", "outputId": "7b6a207f-908a-44d1-a29d-d14917e9d930" }, "execution_count": 23, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "['Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops 
from that country . Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country .',\n", " 'Families of soldiers killed in the conflict joined the protesters who carried banners with such slogans as \" Bush Number One Terrorist \" and \" Stop the Bombings . \" Families of soldiers killed in the conflict joined the protesters who carried banners with such slogans as \" Bush Number One Terrorist \" and \" Stop the Bombings . \"',\n", " 'They marched from the Houses of Parliament to a rally in Hyde Park . They marched from the Houses of Parliament to a rally in Hyde Park .']" ] }, "metadata": {}, "execution_count": 23 } ] }, { "cell_type": "code", "source": [ "# Convert the list of sentences into a single-column DataFrame\n", "df_sentences = pd.DataFrame({'sentences': sentences})\n", "df_sentences.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "Ml80ma0PG7YY", "outputId": "7bd1201a-56d4-4111-b7dd-03b55abfd2e9" }, "execution_count": 24, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " sentences\n", "0 Thousands of demonstrators have marched through London to protest the war in Iraq and demand the withdrawal of British troops from that country . ...\n", "1 Families of soldiers killed in the conflict joined the protesters who carried banners with such slogans as \" Bush Number One Terrorist \" and \" Sto...\n", "2 They marched from the Houses of Parliament to a rally in Hyde Park . They marched from the Houses of Parliament to a rally in Hyde Park .\n", "3 Police put the number of marchers at 10,000 while organizers claimed it was 1,00,000 . Police put the number of marchers at 10,000 while organizer...\n", "4 The protest comes on the eve of the annual conference of Britain 's ruling Labor Party in the southern English seaside resort of Brighton . The pr..." ] }, "metadata": {}, "execution_count": 24 } ] }, { "cell_type": "markdown", "source": [ "## Data cleansing" ], "metadata": { "id": "11pAAgaBDtfe" } }, { "cell_type": "code", "source": [ "def remove_special_char(df):\n", "    # Punctuation to strip; '.' and '?' are kept because they mark sentence boundaries\n", "    special_char = ''.join(c for c in punctuation if c not in '.?')\n", "    punct_re = re.compile('[' + re.escape(special_char) + ']')\n", "    # Frequent but uninformative words, removed as whole words in any letter case\n", "    noise_words_re = re.compile(r'\\b(?:said|says|say|mr)\\b', flags=re.IGNORECASE)\n", "\n", "    def deep_clean(sentence):\n", "        sentence = str(sentence).strip()\n", "        # Drop HTML tags before punctuation removal strips the angle brackets\n", "        sentence = re.sub(r'<[^>]*>', '', sentence)\n", "        sentence = punct_re.sub('', sentence)\n", "        sentence = noise_words_re.sub('', sentence)\n", "        # Collapse the runs of whitespace left behind by the removals\n", "        sentence = re.sub(r'\\s+', ' ', sentence).strip()\n", "        return sentence\n", "\n", "    df['sentences'] = df['sentences'].apply(deep_clean)\n", "    return df" ], "metadata": { "id": "MeGBdMriVbb2" }, "execution_count": 25, "outputs": [] }, { "cell_type": "code", "source": [ "df_sentences = remove_special_char(df_sentences)" ], "metadata": { "id": "ndoFFfu5tkq8" }, "execution_count": 26, "outputs": [] }, { "cell_type": "markdown", "source": [ "## The distribution of word count in the sentences" ], "metadata": { "id": "kC6tPSqyMxY9" } }, { "cell_type": "code", "source": [ "df_sentences['word_count'] = df_sentences['sentences'].apply(lambda x: len(x.split()))" ], "metadata": { "id": "Ncq5ZSsYM5Wo" }, "execution_count": 27, "outputs": [] }, { "cell_type": "code", "source": [ "df_sentences['word_count'].describe(percentiles=[0.1, 0.25, 0.5, 0.75, 0.95])" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eABFHtEiNiU1", "outputId": "9e3a31fd-7390-43b0-d039-3bff5fc65738" }, "execution_count": 28, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "count 35177.000000\n", "mean 28.302499\n", "std 14.566342\n", "min 1.000000\n", "10% 13.000000\n", "25% 18.000000\n", "50% 25.000000\n", "75% 36.000000\n", "95% 58.000000\n", "max 130.000000\n", "Name: 
word_count, dtype: float64" ] }, "metadata": {}, "execution_count": 28 } ] }, { "cell_type": "code", "source": [ "df_sentences[df_sentences['word_count']<6]['sentences'].count()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "74Wk77BON7gA", "outputId": "908f5896-9eb3-4db6-fd42-5bf83ee86c1b" }, "execution_count": 29, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "101" ] }, "metadata": {}, "execution_count": 29 } ] }, { "cell_type": "code", "source": [ "df_sentences[df_sentences['word_count']<6]['sentences'].head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uf81OifmNyrt", "outputId": "a24add60-2ea7-4910-b29e-64f91803ff45" }, "execution_count": 30, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "1594 John Garang John Garang\n", "2491 IRAQPOVERTY Washington IRAQPOVERTY Washington \n", "4809 Janice Karpinski Janice Karpinski\n", "8411 The The\n", "12943 The assassination occurred Tuesday .\n", "Name: sentences, dtype: object" ] }, "metadata": {}, "execution_count": 30 } ] }, { "cell_type": "code", "source": [ "df_sentences[df_sentences['word_count']>100]['sentences'].count()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "S09nSgudRbLX", "outputId": "83ded79d-2d86-4c77-ce83-0b5fa82367de" }, "execution_count": 31, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "12" ] }, "metadata": {}, "execution_count": 31 } ] }, { "cell_type": "code", "source": [ "sns.histplot(df_sentences['word_count'],\n", " bins=10)\n", "\n", "\n" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 297 }, "id": "9F2gx419NGqZ", "outputId": "d051663f-d64f-47b8-a5ac-c24bdf964d4f" }, "execution_count": 32, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 32 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEHCAYAAABvHnsJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAafklEQVR4nO3de7RedX3n8fdHUvBWTZCUwSRO0iG1KzJVMSpi7Sh0QaCOobOowDglWmqcFlvtRQt1TWm1rFWmTmlpKzUDKdDFcCnFkrYozSBeZpRgEOUq5RSqScrl1AC2OkKD3/lj/1KfhnOSwybP85xD3q+1nnX2/u7ffvZvb3LOh31PVSFJUh/PGncHJElzlyEiSerNEJEk9WaISJJ6M0QkSb3NG3cHRu2ggw6qpUuXjrsbkjSn3Hzzzf9QVQt3re9zIbJ06VI2b9487m5I0pyS5KtT1T2cJUnqzRCRJPVmiEiSejNEJEm9DS1EkqxP8lCS23ep/1ySryS5I8l/H6ifmWQiyd1Jjh2or2q1iSRnDNSXJdnU6lck2X9Y6yJJmtow90QuAlYNFpK8CVgNvLyqXgZ8uNVXACcDL2vzfCTJfkn2A/4QOA5YAZzS2gKcA5xbVYcCDwOnDXFdJElTGFqIVNVngO27lH8G+K2qeqy1eajVVwOXV9VjVXUfMAG8pn0mqureqnocuBxYnSTAUcBVbf6LgROGtS6SpKmN+pzIDwBvaIehPp3k1a2+CNgy0G5rq01XfxHwSFXt2KU+pSRrk2xOsnlycnIvrYokadQhMg84EDgCeB9wZdurGKqqWldVK6tq5cKFT7rhUpLU06hDZCtwdXVuAr4DHARsA5YMtFvcatPVvw7MTzJvl/oz0qIlLyHJWD6Llrxk3KsvaRYb9WNP/hx4E3BDkh8A9gf+AdgA/K8kvwO8GFgO3AQEWJ5kGV1InAz856qqJDcAJ9KdJ1kDXDPidRmZv9+6hZM++rmxLPuKdx05luVKmhuGFiJJLgPeCByUZCtwFrAeWN8u+30cWFPd+3nvSHIlcCewAzi9qp5o3/Nu4DpgP2B9Vd3RFvErwOVJfhO4BbhwWOsiSZra0EKkqk6ZZtJ/mab92cDZU9SvBa6don4v3dVbkqQx8Y51SVJvhogkqTdDRJLUmyEiSerNEJEk9WaISJJ6M0QkSb0ZIpKk3gwRSVJvhogkqTdDRJLUmyEiSerNEJEk9WaISJJ6M0QkSb0ZIpKk3gwRSVJvQwuRJOuTPNRehbvrtF9KUkkOauNJcl6SiSS3Jjl8oO2aJPe0z5qB+quS3NbmOS9JhrUukqSpDXNP5CJg1a7FJEuAY4CvDZSPA5a3z1rg/Nb2QLp3s7+W7lW4ZyVZ0OY5H3jnwHxPWpYkabiGFiJV9Rlg+xSTzgXeD9RAbTVwSXVuBOYnOQQ4FthYVdur6mFgI7CqTXtBVd1YVQVcApwwrHWRJE1tpOdEkqwGtlXVl3eZtAjYMjC+tdV2V986RX265a5NsjnJ5snJyaexBpKkQSMLkSTPBX4V+LVRLXOnqlpXVSurauXChQtHvXhJesYa5Z7IvwOWAV9O8nfAYuCLSf4NsA1YMtB2cavtrr54irokaYRGFiJVdVtVfV9VLa2qpXSHoA6vqgeADcCp7SqtI4BHq+p+4DrgmCQL2gn1Y4Dr2rRvJDmiXZV1KnDNqNZFktQZ5iW+lwGfB16aZGuS03bT/FrgXmAC+J/AzwJU1XbgQ8AX2ueDrUZrc0Gb52+Bjw9jPSRJ05s3rC+uqlP2MH3pwHABp0/Tbj2wfor6ZuCwp9dLSdLT4R3rkqTeDBFJUm+GiCSpN0NEktSbISJJ6s0QkST1ZohIknozRCRJvRkikqTeDBFJUm+GiCSpN0NEktSbISJJ6s0QkST1ZohIknozRCRJvRkikqTehvl63PVJHkpy+0Dtt5N8JcmtST6WZP7AtDOTTCS5O
8mxA/VVrTaR5IyB+rIkm1r9iiT7D2tdJElTG+aeyEXAql1qG4HDquqHgL8BzgRIsgI4GXhZm+cjSfZLsh/wh8BxwArglNYW4Bzg3Ko6FHgY2N073CVJQzC0EKmqzwDbd6n9dVXtaKM3Aovb8Grg8qp6rKruAyaA17TPRFXdW1WPA5cDq5MEOAq4qs1/MXDCsNZFkjS1cZ4T+Sng4214EbBlYNrWVpuu/iLgkYFA2lmfUpK1STYn2Tw5ObmXui9JGkuIJPkAsAO4dBTLq6p1VbWyqlYuXLhwFIuUpH3CvFEvMMnbgTcDR1dVtfI2YMlAs8WtxjT1rwPzk8xreyOD7SVJIzLSPZEkq4D3A2+pqm8NTNoAnJzkgCTLgOXATcAXgOXtSqz96U6+b2jhcwNwYpt/DXDNqNZDktQZ5iW+lwGfB16aZGuS04A/AL4X2JjkS0n+CKCq7gCuBO4EPgGcXlVPtL2MdwPXAXcBV7a2AL8C/GKSCbpzJBcOa10kSVMb2uGsqjplivK0f+ir6mzg7Cnq1wLXTlG/l+7qLUnSmHjHuiSpN0NEktSbISJJ6s0QkST1ZohIknozRCRJvRkikqTeDBFJUm+GiCSpN0NEktSbISJJ6s0QkST1ZohIknozRCRJvRkikqTeDBFJUm+GiCSpt2G+Hnd9koeS3D5QOzDJxiT3tJ8LWj1JzksykeTWJIcPzLOmtb8nyZqB+quS3NbmOS9JhrUukqSpDXNP5CJg1S61M4Drq2o5cH0bBzgOWN4+a4HzoQsd4CzgtXSvwj1rZ/C0Nu8cmG/XZUmShmxoIVJVnwG271JeDVzchi8GThioX1KdG4H5SQ4BjgU2VtX2qnoY2AisatNeUFU3VlUBlwx8lyRpREZ9TuTgqrq/DT8AHNyGFwFbBtptbbXd1bdOUZckjdDYTqy3PYgaxbKSrE2yOcnmycnJUSxSkvYJow6RB9uhKNrPh1p9G7BkoN3iVttdffEU9SlV1bqqWllVKxcuXPi0V0KS1Bl1iGwAdl5htQa4ZqB+artK6wjg0XbY6zrgmCQL2gn1Y4Dr2rRvJDmiXZV16sB3SZJGZN6wvjjJZcAbgYOSbKW7yuq3gCuTnAZ8FXhra34tcDwwAXwLeAdAVW1P8iHgC63dB6tq58n6n6W7Auw5wMfbR5I0QkMLkao6ZZpJR0/RtoDTp/me9cD6KeqbgcOeTh8lSU+Pd6xLknozRCRJvRkikqTeDBFJUm8zCpEkr59JTZK0b5npnsjvz7AmSdqH7PYS3ySvA44EFib5xYFJLwD2G2bHJEmz357uE9kfeH5r970D9W8AJw6rU5KkuWG3IVJVnwY+neSiqvrqiPokSZojZnrH+gFJ1gFLB+epqqOG0SlJ0tww0xD5U+CPgAuAJ4bXHUnSXDLTENlRVecPtSeSpDlnppf4/kWSn01ySJIDd36G2jNJ0qw30z2Rne8Aed9ArYDv37vdkSTNJTMKkapaNuyOSJLmnhmFSJJTp6pX1SV7tzuSpLlkpoezXj0w/Gy6F0t9ETBEJGkfNqMT61X1cwOfdwKH093J3kuSX0hyR5Lbk1yW5NlJliXZlGQiyRVJ9m9tD2jjE2360oHvObPV705ybN/+SJL66fso+G8Cvc6TJFkE/DywsqoOo3sG18nAOcC5VXUo8DBwWpvlNODhVj+3tSPJijbfy4BVwEeS+DwvSRqhmT4K/i+SbGifvwLuBj72NJY7D3hOknnAc4H7gaOAq9r0i4ET2vDqNk6bfnSStPrlVfVYVd0HTACveRp9kiQ9RTM9J/LhgeEdwFeramufBVbVtiQfBr4G/D/gr4GbgUeqakdrthVY1IYXAVvavDuSPAq8qNVvHPjqwXkkSSMw03Minwa+Qvck3wXA430XmGQB3V7EMuDFwPPoDkcNTZK1STYn2Tw5OTnMRT3zPGseSUb+WbTkJeNec0kzMNNLfN8K/DbwKSDA7yd5X1VdtdsZp/ajwH1VNdm++2rg9
cD8JPPa3shiYFtrvw1YAmxth79eCHx9oL7T4Dz/SlWtA9YBrFy5snr0ed/1nR2c9NHPjXyxV7zryJEvU9JTN9MT6x8AXl1Va6rqVLpzD/+t5zK/BhyR5Lnt3MbRwJ3ADXz3HSVrgGva8Aa+e8f8icAnq6pa/eR29dYyYDlwU88+SZJ6mOk5kWdV1UMD41+n55VdVbUpyVV095nsAG6h20v4K+DyJL/Zahe2WS4E/iTJBLCd7oosquqOJFfSBdAO4PSq8gnDkjRCMw2RTyS5DrisjZ8EXNt3oVV1FnDWLuV7meLqqqr6NvAT03zP2cDZffshSXp69vSO9UOBg6vqfUn+E/DDbdLngUuH3TlJ0uy2pz2R3wXOBKiqq4GrAZL8+zbtPw61d5KkWW1P5zUOrqrbdi222tKh9EiSNGfsKUTm72bac/ZmRyRJc8+eQmRzknfuWkzy03R3mUuS9mF7OifyXuBjSd7Gd0NjJbA/8OPD7JgkafbbbYhU1YPAkUneBBzWyn9VVZ8ces8kSbPeTF+PewPdHeWSJP2Lvu8TkSTJEJEk9WeISJJ6M0QkSb0ZIpKk3gwRSVJvhogkqTdDRJLUmyEiSeptLCGSZH6Sq5J8JcldSV6X5MAkG5Pc034uaG2T5LwkE0luTXL4wPesae3vSbJm+iVKkoZhXHsivwd8oqp+EHg5cBdwBnB9VS0Hrm/jAMcBy9tnLXA+QJID6V6x+1q61+qetTN4JEmjMfIQSfJC4EeACwGq6vGqegRYDVzcml0MnNCGVwOXVOdGYH6SQ4BjgY1Vtb2qHgY2AqtGuCqStM8bx57IMmAS+OMktyS5IMnz6N6ieH9r8wBwcBteBGwZmH9rq01XlySNyDhCZB5wOHB+Vb0S+CbfPXQFQFUVUHtrgUnWJtmcZPPk5OTe+lpJ2ueNI0S2AluralMbv4ouVB5sh6loPx9q07cBSwbmX9xq09WfpKrWVdXKqlq5cOHCvbYikrSvG3mIVNUDwJYkL22lo4E7gQ3Azius1gDXtOENwKntKq0jgEfbYa/rgGOSLGgn1I9pNUnSiMzopVRD8HPApUn2B+4F3kEXaFcmOQ34KvDW1vZa4HhgAvhWa0tVbU/yIeALrd0Hq2r76FZBkjSWEKmqL9G9q31XR0/RtoDTp/me9cD6vds7SdJMece6JKk3Q0SS1JshIknqzRCRJPVmiEiSejNEJEm9GSKSpN4MEUlSb4aIJKk3Q0SS1JshIknqzRCRJPVmiEiSehvXo+Cl3XvWPJKMZdEvXryEbVu+NpZlS3ONIaLZ6Ts7OOmjnxvLoq9415FjWa40F3k4S5LUmyEiSeptbCGSZL8ktyT5yza+LMmmJBNJrmivziXJAW18ok1fOvAdZ7b63UmOHc+aSNK+a5x7Iu8B7hoYPwc4t6oOBR4GTmv104CHW/3c1o4kK4CTgZcBq4CPJNlvRH2XJDGmEEmyGPgx4II2HuAo4KrW5GLghDa8uo3Tph/d2q8GLq+qx6rqPmACeM1o1kCSBOPbE/ld4P3Ad9r4i4BHqmpHG98KLGrDi4AtAG36o639v9SnmEeSNAIjD5EkbwYeqqqbR7jMtUk2J9k8OTnZ+3sWLXkJSUb+kaTZahz3ibweeEuS44FnAy8Afg+Yn2Re29tYDGxr7bcBS4CtSeYBLwS+PlDfaXCef6Wq1gHrAFauXFl9O/73W7eM5d4F71uQNFuNfE+kqs6sqsVVtZTuxPgnq+ptwA3Aia3ZGuCaNryhjdOmf7KqqtVPbldvLQOWAzeNaDUkScyuO9Z/Bbg8yW8CtwAXtvqFwJ8kmQC20wUPVXVHkiuBO4EdwOlV9cTouy1J+66xhkhVfQr4VBu+lymurqqqbwM/Mc38ZwNnD6+HkqTd8Y51SVJvhogkqTdDRJLUmyEiSerNEJEk9WaISJJ6M0QkSb0ZIpKk3gwRSVJvhogkqTdDRJLUmyEiSerNEJEk9WaISJJ6M0QkSb0ZIpKk3gwRS
VJvhogkqbeRh0iSJUluSHJnkjuSvKfVD0yyMck97eeCVk+S85JMJLk1yeED37Wmtb8nyZpRr4sk7evGsSeyA/ilqloBHAGcnmQFcAZwfVUtB65v4wDHAcvbZy1wPnShA5wFvJbu3exn7QweSdJojDxEqur+qvpiG/5H4C5gEbAauLg1uxg4oQ2vBi6pzo3A/CSHAMcCG6tqe1U9DGwEVo1wVSRpnzfWcyJJlgKvBDYBB1fV/W3SA8DBbXgRsGVgtq2tNl19quWsTbI5yebJycm91n89Qz1rHklG/lm05CXjXnPpKZs3rgUneT7wZ8B7q+obSf5lWlVVktpby6qqdcA6gJUrV+6179Uz1Hd2cNJHPzfyxV7xriNHvkzp6RrLnkiS76ELkEur6upWfrAdpqL9fKjVtwFLBmZf3GrT1SVJIzKOq7MCXAjcVVW/MzBpA7DzCqs1wDUD9VPbVVpHAI+2w17XAcckWdBOqB/TapKkERnH4azXAz8J3JbkS632q8BvAVcmOQ34KvDWNu1a4HhgAvgW8A6Aqtqe5EPAF1q7D1bV9tGsgiQJxhAiVfV/gEwz+egp2hdw+jTftR5Yv/d6J0l6KrxjXZLUmyEiSerNEJEk9WaISJJ6M0QkSb0ZIpKk3gwRSVJvhogkqTdDRJLUmyEiSerNEJEk9Ta294lI2kV7GdY4vHjxErZt+dpYlq25zRCRZosxvQwLfCGW+vNwliSpN0NEktSbISJJ6s0QkST1NudDJMmqJHcnmUhyxrj7I0n7kjkdIkn2A/4QOA5YAZySZMV4eyVJ+445HSLAa4CJqrq3qh4HLgdWj7lP0tzT7lEZ9WfRkpeMe831NKWqxt2H3pKcCKyqqp9u4z8JvLaq3r1Lu7XA2jb6UuDup7iog4B/eJrdHRf7Ph72fTzmct9hdvf/31bVwl2L+8TNhlW1DljXd/4km6tq5V7s0sjY9/Gw7+Mxl/sOc7P/c/1w1jZgycD44laTJI3AXA+RLwDLkyxLsj9wMrBhzH2SpH3GnD6cVVU7krwbuA7YD1hfVXcMYVG9D4XNAvZ9POz7eMzlvsMc7P+cPrEuSRqvuX44S5I0RoaIJKk3Q2Q35tIjVZIsSXJDkjuT3JHkPa1+YJKNSe5pPxeMu6/TSbJfkluS/GUbX5ZkU9v+V7SLJ2alJPOTXJXkK0nuSvK6ubLtk/xC+zdze5LLkjx7tm77JOuTPJTk9oHalNs5nfPaOtya5PDx9Xzavv92+zdza5KPJZk/MO3M1ve7kxw7nl7vmSEyjTn4SJUdwC9V1QrgCOD01t8zgOurajlwfRufrd4D3DUwfg5wblUdCjwMnDaWXs3M7wGfqKofBF5Otx6zftsnWQT8PLCyqg6ju0DlZGbvtr8IWLVLbbrtfBywvH3WAuePqI/TuYgn930jcFhV/RDwN8CZAO1392TgZW2ej7S/SbOOITK9OfVIlaq6v6q+2Ib/ke6P2CK6Pl/cml0MnDCeHu5eksXAjwEXtPEARwFXtSazue8vBH4EuBCgqh6vqkeYI9ue7irN5ySZBzwXuJ9Zuu2r6jPA9l3K023n1cAl1bkRmJ/kkNH09Mmm6ntV/XVV7WijN9Ld6wZd3y+vqseq6j5ggu5v0qxjiExvEbBlYHxrq816SZYCrwQ2AQdX1f1t0gPAwWPq1p78LvB+4Dtt/EXAIwO/YLN5+y8DJoE/bofjLkjyPObAtq+qbcCHga/RhcejwM3MnW0P02/nufY7/FPAx9vwnOm7IfIMk+T5wJ8B762qbwxOq+567ll3TXeSNwMPVdXN4+5LT/OAw4Hzq+qVwDfZ5dDVLN72C+j+r3cZ8GLgeTz5kMucMVu3854k+QDdIelLx92Xp8oQmd6ce6RKku+hC5BLq+rqVn5w5y58+/nQuPq3G68H3pLk7+gOGx5Fd45hfjvEArN7+28FtlbVpjZ+FV2ozIVt/6PAfVU1WVX/DFxN999jrmx7mH47z4nf4SRvB94Mv
K2+e+PenOg7GCK7M6ceqdLOIVwI3FVVvzMwaQOwpg2vAa4Zdd/2pKrOrKrFVbWUbjt/sqreBtwAnNiazcq+A1TVA8CWJC9tpaOBO5kD257uMNYRSZ7b/g3t7Puc2PbNdNt5A3Bqu0rrCODRgcNes0KSVXSHcd9SVd8amLQBODnJAUmW0V0ccNM4+rhHVeVnmg9wPN0VE38LfGDc/dlDX3+Ybjf+VuBL7XM83bmF64F7gP8NHDjuvu5hPd4I/GUb/n66X5wJ4E+BA8bdv930+xXA5rb9/xxYMFe2PfAbwFeA24E/AQ6YrdseuIzu3M0/0+0BnjbddgZCd4Xl3wK30V2BNtv6PkF37mPn7+wfDbT/QOv73cBx497203187IkkqTcPZ0mSejNEJEm9GSKSpN4MEUlSb4aIJKk3Q0SS1JshIo1Qkrcn+YMxLv8VSY4f1/L1zGOISEM0Cx/f/Qq6m1ClvcIQkaaR5H1Jfr4Nn5vkk234qCSXJjklyW3tZU7nDMz3T0n+R5IvA69L8o4kf5PkJrrnUu1umQe3lxN9uX2ObPVfbMu5Pcl7W23pLi84+uUkv96GP5XknCQ3tWW/oT2+54PASUm+lOSkvbrBtE8yRKTpfRZ4QxteCTy/PeTyDXSPwzmH7mGRrwBenWTneyyeB2yqqpfTPbbiN+jC44fpXnC2O+cBn27zHg7ckeRVwDuA19K9cOydSV45g/7Pq6rXAO8FzqruvTi/BlxRVa+oqitm8B3Sbhki0vRuBl6V5AXAY8Dn6cLkDcAjwKeqe/rtzkd4/0ib7wm6pylD94d/Z7vHgT394T6K9ga+qnqiqh6lC5+PVdU3q+qf6J60+4bdfMdOO5/kfDOwdAbtpafMEJGmUd2j0e8D3g58jm7P5E3AocDf7WbWb1fVE8PuH937JwZ/h5+9y/TH2s8n6N55Iu11hoi0e58Ffhn4TBv+r8AtdE+4/Q9JDmonz08BPj3F/Jtauxe1Q2E/sYflXQ/8DHQn5durdz8LnNAe1/484Mdb7UHg+9p3H0D3Too9+Ufge2fQTpoRQ0Tavc8ChwCfr6oHgW8Dn63uvRRn0L1348vAzVX1pHdutHa/Tnco7P8Cd+1hee8B3pTkNrrDUCuq6ovARXTBtQm4oKpuaXtKH2z1jXSPc9+TG4AVnljX3uKj4CVJvbknIknqzZNt0hgk+QBPPj/yp1V19jj6I/Xl4SxJUm8ezpIk9WaISJJ6M0QkSb0ZIpKk3v4/xKX9f713AJ4AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "- The word count of the sentence is between 1 and 130. \n", "- 75% of the data word count is 36. \n", "- There are 100 sentences their word count is less than 6 and 13 sentences\n", "their word count is above 100." 
], "metadata": { "id": "xR6u2QloSQ8O" } }, { "cell_type": "markdown", "source": [ "## The distribution of top unigrams after removing stop words\n" ], "metadata": { "id": "pV2AVfGTc7dy" } }, { "cell_type": "code", "source": [ "def get_top_n_words(corpus, n=None, language=None):\n", "    # Use scikit-learn's built-in English stop-word list when requested\n", "    stop_words = 'english' if language == 'english' else None\n", "    vec = CountVectorizer(stop_words=stop_words)\n", "    bag_of_words = vec.fit_transform(corpus)  # sparse document-term matrix\n", "    sum_words = bag_of_words.sum(axis=0)      # total count of each term over the corpus\n", "    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n", "    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)\n", "    return words_freq[:n]\n" ], "metadata": { "id": "AyF-kwr-TXyM" }, "execution_count": 33, "outputs": [] }, { "cell_type": "code", "source": [ "common_words = get_top_n_words(df_sentences['sentences'], 20, 'english')\n", "for word, freq in common_words:\n", "    print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns=['Word', 'count'])\n", "# Words are unique and already sorted by frequency, so they can be plotted directly\n", "df1.set_index('Word')['count'].plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 695 }, "id": "MYRCv0MiXKEC", "outputId": "ac18bb44-8436-41dd-9cf1-2dc1f6d63e2b" }, "execution_count": 34, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "officials 3396\n", "president 3335\n", "mr 3106\n", "government 3015\n", "killed 2892\n", "people 2821\n", "new 2123\n", "united 2091\n", "military 2026\n", "country 1962\n", "police 1930\n", "minister 1836\n", "iraq 1820\n", "security 1683\n", "states 1546\n", "year 1494\n", "tuesday 1384\n", "group 1382\n", "forces 1337\n", "world 1333\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 34 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAE5CAYAAAB8sPArAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deZhcVZ3/8fcnYVPZgrSIQAhiFBEFISwKMyAMEFDEBZWoEKNOXEBRxlFw9BdFGdERGVdGliAgiHElIg5EZJE9CyGskQgoyYBEQRZBNPj9/XFO0Ted6q57qyrV3bmf1/PUU1W37jl1qrrre88921VEYGZm9TBmuAtgZma946BvZlYjDvpmZjXioG9mViMO+mZmNeKgb2ZWI2u12kHSesBVwLp5/x9GxAxJ3wH2Bh7Ju74rIhZKEvBV4GDgibx9Qc5rKvCpvP/nI+Lsod570003jQkTJlT+UGZmdTZ//vw/RkRfs9daBn3gKWDfiHhc0trA1ZJ+kV/794j44YD9DwIm5tvuwKnA7pI2AWYAk4AA5kuaHREPD/bGEyZMYN68eSWKaGZmDZJ+N9hrLZt3Ink8P10734aa0XUocE5Odz2wsaTNgQOBORHxUA70c4DJZT+EmZl1rlSbvqSxkhYCD5IC9w35pRMlLZJ0iqR187YtgPsKyZfmbYNtNzOzHikV9CPi6YjYCdgS2E3SDsDxwHbArsAmwCe6USBJ0yXNkzRv+fLl3cjSzMyySqN3IuLPwOXA5Ii4PzfhPAWcBeyWd1sGbFVItmXeNtj2ge9xWkRMiohJfX1N+yHMzKxNLYO+pD5JG+fHzwL2B+7M7fTk0TpvAG7NSWYDRyrZA3gkIu4HLgEOkDRO0jjggLzNzMx6pMzonc2BsyWNJR0kZkXERZJ+JakPELAQeH/e/2LScM0lpCGb0wAi4iFJnwPm5v1OiIiHuvdRzMysFY3kpZUnTZoUHrJpZlaNpPkRManZa56Ra2ZWI2Wad0aUCcf9fMjX7z3ptT0qiZnZ6OOavplZjTjom5nViIO+mVmNjLo2/U616hMA9wuY2ZrLNX0zsxpx0DczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxpx0DczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxqp3Sqb3eCrd5nZaOWavplZjTjom5nVSMugL2k9STdKulnSbZI+m7dvI+kGSUskfV/SOnn7uvn5kvz6hEJex+ftiyUduLo+lJmZNVempv8UsG9E7AjsBEyWtAfwReCUiHgR8DDwnrz/e4CH8/ZT8n5I2h44HHgZMBn4lqSx3fwwZmY2tJYduRERwOP56dr5FsC+wNvz9rOBzwCnAofmxwA/BL4hSXn7BRHxFHCPpCXAbsB13fggo407g81sOJRq05c0VtJC4EFgDvBb4M8RsSLvshTYIj/eArgPIL/+CPDc4vYmaYrvNV3SPEnzli9fXv0TmZnZoEoF/Yh4OiJ2ArYk1c63W10FiojTImJSREzq6+tbXW9jZlZLlUbvRMSfgcuBVwEbS2o0D20JLMuPlwFbAeTXNwL+VNzeJI2ZmfVAmdE7fZI2zo+fBewP3EEK/ofl3aYCF+bHs/Nz8uu/yv0Cs4HD8+iebYCJwI3d+iBmZtZamRm5mwNn55E2Y4BZEXGRpNuBCyR9HrgJODPvfyZwbu6ofYg0YoeIuE3SLOB2YAVwVEQ83d2PY2ZmQykzemcR8Mom2+8mte8P3P5X4C2D5HUicGL1YpqZWTd4Rq6ZWY14wbVRzGP9zawq1/TNzGrEQd/MrEYc9M3MasRt+jXWqk8A3C9gtqZxTd/MrEYc9M3MasRB38ysRhz0zcxqxEHfzKxGHPTNzGrEQd/MrEYc9M3MasRB38ysRhz0zcxqxEHfzKxGHPTNzGrEQd/MrEYc9M3MasRB38ysRloGfUlbSbpc0u2SbpN0T
N7+GUnLJC3Mt4MLaY6XtETSYkkHFrZPztuWSDpu9XwkMzMbTJmLqKwA/i0iFkjaAJgvaU5+7ZSI+HJxZ0nbA4cDLwNeAPxS0ovzy98E9geWAnMlzY6I27vxQczMrLWWQT8i7gfuz48fk3QHsMUQSQ4FLoiIp4B7JC0BdsuvLYmIuwEkXZD3ddA3M+uRSm36kiYArwRuyJuOlrRI0kxJ4/K2LYD7CsmW5m2DbTczsx4pHfQlrQ/8CPhIRDwKnApsC+xEOhM4uRsFkjRd0jxJ85YvX96NLM3MLCsV9CWtTQr450XEjwEi4g8R8XRE/AM4nf4mnGXAVoXkW+Ztg21fSUScFhGTImJSX19f1c9jZmZDKDN6R8CZwB0R8ZXC9s0Lu70RuDU/ng0cLmldSdsAE4EbgbnAREnbSFqH1Nk7uzsfw8zMyigzemdP4AjgFkkL87ZPAlMk7QQEcC/wPoCIuE3SLFIH7QrgqIh4GkDS0cAlwFhgZkTc1sXPYmZmLZQZvXM1oCYvXTxEmhOBE5tsv3iodGZmtnp5Rq6ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdWIg76ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdWIg76ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdVImWvkmg1qwnE/H/L1e096bY9KYmZluKZvZlYjrunbsPPZglnvuKZvZlYjLYO+pK0kXS7pdkm3STomb99E0hxJd+X7cXm7JH1N0hJJiyTtXMhrat7/LklTV9/HMjOzZsrU9FcA/xYR2wN7AEdJ2h44DrgsIiYCl+XnAAcBE/NtOnAqpIMEMAPYHdgNmNE4UJiZWW+0bNOPiPuB+/PjxyTdAWwBHArsk3c7G7gC+ETefk5EBHC9pI0lbZ73nRMRDwFImgNMBr7Xxc9jNeV+AbNyKrXpS5oAvBK4AdgsHxAAHgA2y4+3AO4rJFuatw22feB7TJc0T9K85cuXVymemZm1UDroS1of+BHwkYh4tPhartVHNwoUEadFxKSImNTX19eNLM3MLCs1ZFPS2qSAf15E/Dhv/oOkzSPi/tx882DevgzYqpB8y7xtGf3NQY3tV7RfdLPuadU8BG4isjVDmdE7As4E7oiIrxRemg00RuBMBS4sbD8yj+LZA3gkNwNdAhwgaVzuwD0gbzMzsx4pU9PfEzgCuEXSwrztk8BJwCxJ7wF+B7w1v3YxcDCwBHgCmAYQEQ9J+hwwN+93QqNT12xN4M5kGw3KjN65GtAgL+/XZP8Ajhokr5nAzCoFNDOz7vGMXDOzGnHQNzOrEQd9M7MacdA3M6sRL61sNoJ4BJCtbg76ZmsYHzhsKG7eMTOrEQd9M7MacdA3M6sRB30zsxpx0DczqxEHfTOzGnHQNzOrEY/TN7OV+IIyazbX9M3MasRB38ysRhz0zcxqxEHfzKxGHPTNzGrEQd/MrEYc9M3MaqRl0Jc0U9KDkm4tbPuMpGWSFubbwYXXjpe0RNJiSQcWtk/O25ZIOq77H8XMzFopMznrO8A3gHMGbD8lIr5c3CBpe+Bw4GXAC4BfSnpxfvmbwP7AUmCupNkRcXsHZTezEcoXchm5Wgb9iLhK0oSS+R0KXBARTwH3SFoC7JZfWxIRdwNIuiDv66BvZtZDnbTpHy1pUW7+GZe3bQHcV9hnad422HYzM+uhdtfeORX4HBD5/mTg3d0okKTpwHSA8ePHdyNLMxuF3ES0erRV04+IP0TE0xHxD+B0+ptwlgFbFXbdMm8bbHuzvE+LiEkRMamvr6+d4pmZ2SDaCvqSNi88fSPQGNkzGzhc0rqStgEmAjcCc4GJkraRtA6ps3d2+8U2M7N2tGzekfQ9YB9gU0lLgRnAPpJ2IjXv3Au8DyAibpM0i9RBuwI4KiKezvkcDVwCjAVmRsRtXf80ZmY2pDKjd6Y02XzmEPufCJzYZPvFwMWVSmdmZl3lGblmZjXioG9mViO+X
KKZrZG6cdnHNfHSka7pm5nViIO+mVmNuHnHzGw16sbM4m7OTnZN38ysRhz0zcxqxEHfzKxGHPTNzGrEQd/MrEYc9M3MasRB38ysRhz0zcxqxEHfzKxGHPTNzGrEQd/MrEYc9M3MasRB38ysRhz0zcxqxEHfzKxGHPTNzGqkZdCXNFPSg5JuLWzbRNIcSXfl+3F5uyR9TdISSYsk7VxIMzXvf5ekqavn45iZ2VDK1PS/A0wesO044LKImAhclp8DHARMzLfpwKmQDhLADGB3YDdgRuNAYWZmvdMy6EfEVcBDAzYfCpydH58NvKGw/ZxIrgc2lrQ5cCAwJyIeioiHgTmseiAxM7PVrN02/c0i4v78+AFgs/x4C+C+wn5L87bBtq9C0nRJ8yTNW758eZvFMzOzZjruyI2IAKILZWnkd1pETIqISX19fd3K1szMaD/o/yE325DvH8zblwFbFfbbMm8bbLuZmfVQu0F/NtAYgTMVuLCw/cg8imcP4JHcDHQJcICkcbkD94C8zczMemitVjtI+h6wD7CppKWkUTgnAbMkvQf4HfDWvPvFwMHAEuAJYBpARDwk6XPA3LzfCRExsHPYzMxWs5ZBPyKmDPLSfk32DeCoQfKZCcysVDozM+sqz8g1M6sRB30zsxpx0DczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxpx0DczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxpx0DczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxpx0Dczq5GOgr6keyXdImmhpHl52yaS5ki6K9+Py9sl6WuSlkhaJGnnbnwAMzMrrxs1/ddExE4RMSk/Pw64LCImApfl5wAHARPzbTpwahfe28zMKlgdzTuHAmfnx2cDbyhsPyeS64GNJW2+Gt7fzMwG0WnQD+BSSfMlTc/bNouI+/PjB4DN8uMtgPsKaZfmbWZm1iNrdZh+r4hYJul5wBxJdxZfjIiQFFUyzAeP6QDjx4/vsHhmZlbUUU0/Ipbl+weBnwC7AX9oNNvk+wfz7suArQrJt8zbBuZ5WkRMiohJfX19nRTPzMwGaDvoS3qOpA0aj4EDgFuB2cDUvNtU4ML8eDZwZB7FswfwSKEZyMzMeqCT5p3NgJ9IauRzfkT8r6S5wCxJ7wF+B7w1738xcDCwBHgCmNbBe5uZWRvaDvoRcTewY5PtfwL2a7I9gKPafT8zM+ucZ+SamdWIg76ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdWIg76ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdWIg76ZWY046JuZ1YiDvplZjTjom5nViIO+mVmNOOibmdWIg76ZWY046JuZ1YiDvplZjfQ86EuaLGmxpCWSjuv1+5uZ1VlPg76kscA3gYOA7YEpkrbvZRnMzOqs1zX93YAlEXF3RPwNuAA4tMdlMDOrLUVE795MOgyYHBHvzc+PAHaPiKML+0wHpuenLwEWt8h2U+CPHRat0zxGQhlGSh4joQzdyGMklGGk5DESyjBS8hgJZSiTx9YR0dfshbU6fOOui4jTgNPK7i9pXkRM6uQ9O81jJJRhpOQxEsrQjTxGQhlGSh4joQwjJY+RUIZO8+h1884yYKvC8y3zNjMz64FeB/25wERJ20haBzgcmN3jMpiZ1VZPm3ciYoWko4FLgLHAzIi4rcNsSzcFrcY8RkIZRkoeI6EM3chjJJRhpOQxEsowUvIYCWXoKI+eduSamdnw8oxcM7MacdA3M6sRB30zsxoZ1UFf0jhJrxiG9x0r6bwu5HNMmW29IOnZHaTdtptlWRO0+30q2ar1nms2SWMkvXq4y7EmGnUduZKuAF5PGnk0H3gQuCYijq2Qx2URsV+rbS3yuBrYNy8n0RZJCyJi5wHbboqIV7abZxtleDVwBrB+RIyXtCPwvoj4YIU8riTNuZgL/Bq4KiJuKZl2k6Fej4iHK
pRjPjATOD8iHi6brpD+x8CZwC8i4h9V0+c8uvF93hIRL2/n/Qt5fG2o1yPiwy3Sn0yHo+skfQn4PPAk8L/AK4CPRsR3S6bv+LfQjc+R83k+aRmZAOZGxAMV0h4DnAU8RvrfeCVwXERcWiLt1/N7NtXq79jMaKzpbxQRjwJvAs6JiN2BfymTUNJ6Ochsms8SNsm3CcAWFctxN3CNpE9LOrZxK1mOKZJ+BmwjaXbhdjlQKshJekzSo4PdKnyOU4ADgT8BRMTNwD9XSE9E7A28FPg6sDHwc0llg/V8YF6+Xw78BrgrP55fpRzA24AXAHMlXSDpQEmqkP5bwNuBuySdJOklFd8fuvB9Agsk7drGexetB+xM+i7vAnYC1iF9p2W+1zuA0yTdIOn9kjZqowwH5N/q64B7gRcB/14h/WWS3lzxbzhQx59D0nuBG0kx5zDgeknvrpDFu/P3cAAwDjgCOKlk2sZvY7C/Z3URMapuwC3A5sClwK5526KSaY8B7gGeIgXte/LtZuDoiuWYkW//r3grmXZrYB/gOmDvwm1nYK2K5fgc8EFgA2BD4APACRXS35Dvbypsu7liGfYCjgcuBq4lBc8pFfM4HTi48Pwg4Ntt/o+MIZ0NLgN+D3wW2KRC+o2A9wP35c8zDVi7h9/nncAK4LfAovw/X+p/vJDH9cX/JWBt4Po2vsuXkALU74DzgddUSHtrvj+DtOZWpe+CVDP+B/A34NH8/NE2/yc6+RyLgecWnj8XWFwh/aJ8/1XgjQP/P3r594yIkbf2TgknkCZ3XR0RcyW9kHTkaykivgp8VdKHIuLrHZbjYuCTwAT6J7lFLl+rcvyO9M/3qg7LAPD6iNix8PxUSTeTDkJl3JebJELS2qQD4x0Vy3AFqTbyBeDiaK/Ja4+I+NfGk4j4RW4eqCT38UwDDgZ+BJxHOij9ilQ7apX+ucA7SbWxmwrpp5IO1K104/s8sOL+zYwjVQIaZ1zr522l5aXQt8u3P5IqR8dKel9EHF4ii4sk3Ulq3vmApD7gr2XfPyI2qFLewXThc/yJdMBpeCxvK2u+pEuBbYDjJW1AOphV0fHf8xntHCnWhBvwatKp/JGNW8X0i4FD8h9y68atYh5vIh2wHqHNmgypJvoO0gznMfnxtRXSb0oKbH8g9Y98l0KtpmQeGwOvBb5ICq6/BD5XMY9LgE+RDqITgP8ALqmYx3zgsvx3XXfAaz8ukf4nwO2ks5bNB7w2r1ffZ85nL2BaftwHbFMx/TRSxeI7wNmkM9qpFdKfkv83vw3sNvB/v0I+mwBj8+PnAM+v+DnGkdrS/7lxq5i+488BnEOqAHyGdHa/IH+vxwLHlkg/hnQWv3F+/lzgFb38exZvo6Yjt5sdGpLOBbYFFgJP92dRKY+rI2KvsvsPkscS4JCIqFoTLOYxgXTauCfp+7kG+EhE3NtJ2doox0tJTVT/RDqg/j5SW3/Z9JuQflD/TPocV5Gaqcr2cYwhdY79Z9WyF9J/MiI+3076bpI0A5gEvCQiXizpBcAPImLPivk8H9g9P70hqnU+TgNmRcRfmry2UUQ8UiKPZ5MC4/iImC5pIukzXVSyDO8lnSltSfqt7gFcFxH79vhzzBjq9Yj4bIv0IlXGXhgRJ0gaTzr43djqvQfk0/bfc6V8RlHQnzrU6xFxdoW87gC2jw4+vKT9gCmkmuVThXL8uEIe11T9IXdLlw+id5Paoa8mBesbo81RTZKe0+wHWjJtp8vmdmO0yNnAMRHx5/x8HHByRJTu+JO0kDTCY0GjPJIWRUSl4cn5vSeSOgEBiIirSqbtxgi375POvo6MiB3yQeDaiGjZzJbT3wLsSmq73knSdsB/RsSbypYh59P299ANkk4lNefsGxEvzeW5NCJadtZL2nmo1yNiQdXyjJo2/SpBvYRbgecD93eQxzRSG+Ha9LfPBVA66APz8g/jp7R/4HgxcCqwWf5hvYLUzt+qxjqvQjlbeVG0OcSxoTjUEWhrq
CPwS0kfA74PPHPgKHu2QB4tQmoKardC8IpGwM/v/bCkqgeSv0VESApIB8KqhRislgwMWUuWtB7wbPIIN6AxcmZDqo9w2zYi3iZpCkBEPFFxJM5fI+KvkpC0bkTcWXVEVbvfw4A8LqdJBanCGcfuEbGzpJtyuoeVVhku4+QhXgsqfI6GURP0G3Jn0CdI19gtHrmrfPhNgdsl3cjKwfb1FfLYNSLaGdJXtCHwBGko1zPFoNqB43TSMLhvA0TEIknnk8ZHD2rgQVTShmlzPDZIkqG8KNdmqh54ihpDHWfn8t0sqepQx7fl+6MK2wJ4Ycn07yM1R6yQ9FdSwIuI2LBCGcZIGhd5nkButqr6O5sl6dvAxpL+FXg36YBYxTH015Jf06gll0j3PuAjpKGvxVrko8A3Kpbhb5KeRQ6YSpP4nho6yUqWStqYVCmaI+lhUrt2Fe1+D0UfKzxeD3gzaXRVWX/PncmN76GPkh25ucxjgFdFxDUV3nPITEfVjTRU8z2kERF7kybjfLFiHns3u1XM4yxSE9Fwfx9z831xiODCCuknkYYE3kv6Qd0M7FKxDFeSOtuKZbi1Yh4dD3UcCTfSoIA7SUNpP58fH9FGPvsD/wV8Gdi/g/+LheRObeC2Cuk/1IXv4oD8v7Gc1Ll9LxWGSg7Ia2/SMNx1evk9DJHvjRX2fQepMrMUOJE0COQtFd+v0hDPoW6jrqZPGglxpqRjIuJK4EpJc6tkEBFXStoamBgRv8xtjWMrlmMPYKGke0i1l0atsHS7awdNM0V/zDWoRi3iMKo1W80EPhgRv87p9yId0Kq0Hz87Im4ccOZepSYE3RnqiKQdWPUs8JySaTtux46IcyTNo/+0+00RcXvZ9Pk9vxgRnwDmNNlWVlu1ZEn7RsSvgGWSVmk7jwpNjxFxqdIs6T1Iv49jIqLStWHz/+PEiDgr15C3II1cKavjswWtPGt8DLALaS5HKRFxXv4e9iN9D2+I6oM3utH0+EyBRtWNPCGBNMTvtaQOr99WzONfSUsG/DY/nwhcVjGPrZvdKubRjRryC0lDJJ8gTUa6GphQIf0qNQhSB2KVMvyCNBpqQX5+GGkpgyp5NBvqWHpCVc5jBnB5zuMs4AHghyXSrUcaWngzaYjgJvk2Abiz5HtvmO83aXar+DlW+f6pODlrQNrStWTgs/n+rCa3mRXfd5XfVJXfWf57/gz4TX7+AtKSK6v9exiQ7h76J3PeRWpt2KtC+j2ADYr/K6R2/iplaExU+zudTlRr9wscrhtpSvdGwA75Bz6fVDuuksdC0hTmYrC9ZRg+S0dNMwPyek7xH6tCuv8m9Qfsk38U3wK+QhpXvHPJPJodeLauWI49y2xrkcctpJrYzfn5ZsCcEuk6nqkNXJTvGwHi7kJed5fM4wP5M/yFNBO3cbsH+G6F72EsJQ9Wg6QfA7y1g/QdH0RzPgtJNePi76Ps7PumB1/aOAh3eiON8deA77dSxaqbt1HXvBP9Y3wfAV7TZjZPRcTfGs0RktZiiOGLq1HbTTMaZJ2fxmeKiK+ULENjNu+MAdtfSfnRActINcHLST+qR0kzWFvOTi74OulA02rbUJ6MiH9IWpE7ph8EWq5YGV2YqR0Rr8v327STPjufdNb0BeC4wvbHosLCcxHxtKTFksZHxO+rFiJ/hx8HZlVNmxU7g+fTPwKoamdwJ6OY5pP+fwWMBx7OjzcmLc1R+u+Umxs/QP8aSleQlgj5e9ksIkd7eOb7rRx7Jb2+WIYoOd9hoFET9CV9PCK+NNj48qi22tyVkj4JPEvS/qS1a37WpaJWcRTpWpfbSVpGqtG9s2TarkxRj4h2D5xFFwJ/Jo32+L8qCSW9ijSZq2/AgWxDqvezzMvtt6eTfvSPk4bnlRIRX8/9ChMo/DaiRJ9AN8ZTR5oo9IikTwEPRMRTkvYBXiHpnCgMBS1hHHBbHqFWHL5adoRa28Nfu3EQzZqNYjq9TMLGwVfS6cBPIuLi/Pwg4A0Vy
3EqaWj2t/LzI/K295ZMf7ekD+c0kOLN3VUKIOkk0iikxpLux0jaMyKOr5IPjK7JWYdExM80yCStqDY5awxpBNABpKP/JcAZMUxfRq7BjIn2hkt2+t4b0T8TFlI/wwlRYqZiIY9bI2KHNt9/b1LT0vuB/ym89Bjws4gota5Sk3wnkNrZF1VI0/ZM7TyWezAR1WaRLiSNqppAWuPpQuBlEXFwhTz2HqQgV5ZM36yzNCKi7PDXRj6ddKx/mHTmuxv5dxoRc4ZOtUoeqyxT3WxbizxujpXXt2q6bYj0zwO+RjprDtKEzo9ExIMVyrAI2CnyfJg8BPSmqDhhD0ZR0F8T5VrpkaxasywTZDpaL72Qz49Ik9UaB80jgB2jwqxHSacBX4+Sa+gPksfWkRaia1uno2/UhZna3aB8nYXcxPJkPgPp6XUWukFp+YJ9SEH/YtLKqVdHxGEl038eOJx0BjmTFPQr/W0kXUK6xkNjDf93kNbvKb2onaQFpCGWv83PX0gaINCy6TEH53Mi4h1Vyt0kn0XAPo0zrTyi6Ip2gv6oad5pkDSH9AcoTnO/oMwfUWla96D/NO18gR26mLRk6i1UX3Wv6lrzg9k2It5ceP7ZXNOsYi/gXe0MX5X03xHxEeAbjbbbojLNEereLNKOZ2p3of0X0mSeKaQKwSF529ol3//qiNhL0mOs/L9eaaKZpCObbS9bS88OI/UZ3RQR0yRtRn/wbSkiPiXp06Qz8mmk/5FZwJmNAFzCFNKZ7E/y8yvztio+BlyutNyISCP1ppVJmPtXtpa0TnRwwSXShLIFSheREun/67ghUwxi1AV9oC9Wneb+vJJpX5fvGzM2z83372R4OnLXiwpX/Cqq0pzVwpOS9oqIqwEk7UlaCreKgzp4/8bf4Msd5NGtWaTdmKndafsvpIDyfuDEiLhH0jb0f09DirwIYHS+LHFxXZj1SGPMF5BWnCyrrY71otyR+wBp+O0KUl/FDyXNiYiPl0j/EGl0VqPW/ZxIFzQpJafZkTSsuzEDf3FEVJlZ3Ljg0mxW7h8pO9gCUuyaSeqQvhf4RKzpC6415EkOb2yMSlCaZPWTMqdahTxWOVVWk0sXrm6SPkrqbLyIlYNMy86yRg1Z6QpcbdWQc0lrtxYAAA3USURBVD47kZp2GpNNHgbeFemKT6NKpx2HnbaD5zw6av8dqXJT5AURMblCmm+RrjlxOPBvpP/1hRFRqpasdJnBI0lr4J8B/DQi/p775O6KiJbXZlZakuT9pD6auaSzv69GxH9V+Bw3RsRuZfdvkn7gyDig9eqcA/J4DWkF238i9TvdRLos6Vcrl2cUBv3JpBEvV5JOc/4JmB4Rl1TIYyFwVOS1LPKIjW9FydX/ukXSUaRp2X+mP3CX6iyTtEtEzM8jLAbOSN4gKg7nyjUxqtSCuimfYXyGdOq8Fv3NEWW+i30j4ldqMoMUqs0i7VSH7b+zIuKtgzVDDkPz4zNys9Wt0eZ6U212rH+WNCFslb4eSS+NErNaJS2MtELnO0jDf48D5lf5LiWdQjp7GziSqdIKl5LWz+ker5KukH4s6QzsNaQD2ZMRsV3lfEZb0AeQtClplhukGbpVp3bvQjpV2ogUXB4mXcey8jKlnchthLtVLf+APBaQlq69NT+fQhoZsPvQKZ9JvxmpvfAFEXGQpO1Jizud2W6Z2qF0haWPkvoqGiNniIiWVyiS9NmImCHprCYvR5Rc1nhAO/g6pB/6X8q2g+c89iPNWWgMyZtAuhjKUKN7Gmk3j4j789nrKjrt6K5iwBnkGFJn7KyIKN2O3GnHejdIuo10xbTzgW9EWoKl0pmX+kdmNb6PRoWk1IisPILpXNIcFkhnLkdGhYu1S7qMNAHzOlLH9NVRYfRP0ahp05e0XaSlVRs1psZ48PFKk1BKB+yImA/sqHyR5KgwPLHLlpBmsXbiMFIb59tJZz1HsvKqna18hxSk/iM//w2pRtPToA88EhG/aCdhRMzI96WaD
YbI55l2cEkCDqW/clHWNaQZzvuRzuAuoeRcgYi4P9/3LLgPodjHsgL4XUQsLZOwix3r3fBtUhv4zcBV+YBa9Wz2IvonepEfPyppp4goM+jhNNIVti4HUJp7cTppfkpZi0hr/uxAmpj6Z0nXRUTV/rfRU9OXdFqkq+80qzGVOupKemdEfFeDzGat2LHSMUk/AV5GmslabNOvMtEMpYXbfkqaafjGKv8IkuZGxK7Ffo7GKXGVMnRKafLJWNKy0sXvouXBfLC/ZyGPtv+uzfp/Wuw/ixRUGpNo3k66TN5bKuTxJtKlJ59HCjTtLPHckdx5fH9E/DU/fxZpYcB7S6Q9hv6O9WXk8pPmXpwWEd9cXeUuQ9JaEVF6QcDcLzCJtFKmSJ2qi0hncT+IiCGv5dzNfh6l6+u+izSi6PkRsW7VPEZNTZ/+FQffExGVZrMVNKZxd2U2axf8NN8qa9LuuwkpaN4gqUr771+ULgbemOq+B6km0WuN5qhd8n0jUJQ5he7WBbSLfQJjSD/00hfyznaIiO0Lzy+XVGmVTeBLdHgZzS74ASvXRJ/O21pe7Sn6Z+T+P+C/I+LRPPRyZyrMkO6GwZovqXYmuyVpHarHc54zgJ+Thk3OJ/29hnJ3/vzF0YJVZ+QeTTqT34V05jKT1MxT2WgK+seT/ul+SLX1WJ4REY0LjZTuNV+dorNhl69rvUspx5JqMC+UdA3pItylJs902RVNtpU6De3i3/OQwuMVpB/XoRXzWCBpj4i4HkDS7lS/StkfhjngA6wVhXHlkdaqKnu1p4bDIl0Tdi/SwfvLpOGrpfqbuuQ7dN58+TxWvvjL30lnPU9KGnTopqRzI+IIUnCeQP/Fka4iLSlRxXqkhRDnVzlLaWY0Bf2HJF1KCk6zB75YdogigKQvkS5w8STwv6S14z8aEaUnjnRDJyNWutjueztp4soTpNPvn5J+GL1WHNGwHumgVirwqUvrMnXaJ5DtAlwrqbHQ2XhgcePMrOQZWMeX0eyC5ZJeHxGzASQdSuqArKLRIf9a4PSI+LnSLNte2jQiZkk6HiAiVkh6ulWiAc4jnUFfmJ8fApyvtHzKUGdxuyhd1H4qacRN4+wV+vsHSomITuaxrGQ0temvQ6rhn0uTiS5RbSx1YxjXG0nB5VjSmNeejqXuZMRKF8vQcRv06iBpXdK0+31K7Ftcl6lZ0C+71suWpJU9Gxer/zXpwh+lOjBzHk1H3hTKUuZCJh2NQuoGpdVfzyN1vAbpqk9HRsSSCnlcRGrT35/0232SdMWpnv3OlGawvpm0xPbOufnyixHRdE7GEPlMov//4pqIaHn2prR20AdIS48vK75EG+sYdU0M05rOVW/Aufn+413I69Z8fwYwOT/u+aX5yJcIHObv9fYy24ahXOOAJRXT7Eo6a7mJtLTFLVS4+Aip32ga6axrLVKHWcv1+NfkG+lC9eu3mfbZwJtIV74C2Bw4oMfl35k0ouqRfP8b0sXre1mGU4f771i8jaaa/u3Av5DWG9+HAadHUWG98TxS5A2kmsdupDW2L4qSY9u7pZMRK10sw3dJ45eLbdBHRUTTtVdWYzmKHdNjSX0LJ0RE6WUUJC0mXSR+pbWMomRTWLNRS70cydStZqoulWVEzN/oBqW1619CihmLo9o6SGuc0RT0u3qqpLRK3SORFkRqXHWqrbUs2tXJ8NMuluEO0g9ipTZoUkdmRI9mgQ5oFllB6sys1GGlvNhYB2W4jNTp9728aQppYlVPJhN1q5mqS2X5BbkDNCJ2zIHzpqiwJPFIoO4sHLdGGTVBv0HSqaR11xurGF4VFdeJUboQ+rHA+Ehj/ycCL4k2r0TTDqUp1R+OiFN69Z6DlKPjNuiRQmk27BTSeuWVO0Dzd/F10pC+AK4FPhQR93W/tEOWY1fSmjUT6B9s0bMDcC7DiJi/0al81tTwzMJxUXJ55zXRaBq903AnaXnWH5Nq+edKOj2qLbR1FqnztDEOeRlpOGjPgn4+w
5gCDGvQH01BvYRpwHak5RMazTtB/1C5Vk4ApkbEw/DM2eCXqT68rlPfpUkzVY+NlPkbHYmIDxWfKy8cN0zFGRFGY01/Ealt8S/5+XOA66rUgiTNi4hJA2oxPV8JUV1ayMkSSYujzQXBcvpmq6/2/OIlnTZTdakMO5POenYgXWegjzTuvvSCaSOR0sJxt0XEi4e7LMNlNNb0RWF4Y35cacwr8Lc8rbxRi9mWlSdf9ErjVLl4AfGys1BtVddK2j4iqs6AbRgjadyAmv5w/EZmSDqDNpupumRb0nUStiINedydURgvBszpeWbhuGEqzogw6v6IpKaZG5TWrYE0CqfqiIIZpElZW0k6jzT+9l1dK2FJ0Z2Lklu/PYCFauMKXtnJwHWSfpCfv4W09HWvddpM1Q2fjogfKC2Y9hqGZzZtNzyf1FQGaYDA74Gjh684w2/UNe/AM6eejdPfX0fETRXSjiEtM3AZKUiINpZn7oY1aVjcSDBYp3SVfov8N2icaf2qg7OGtnXaTNWlMtwUEa+U9AXglog4fziaujqlJhdHkrSol53iI82oDPqdarTpj4ByrBHD4qy78ozc/xqOA06hDMM+m7YTkj4AfJA0xLt4Pd0NSDNq3zksBRsB6hr0TyKtIzKwA7X0BK8ulWONGBZn3ZXnTmwLtNtM1Y0yPBuYTKrl3yVpc+DlEXFpr8rQCaVrZYwDvsDKFxB/rNe/85GmrkH/HppPfunpWhjdWhfE1izdaKYyG0xdg/6zSKd+e5GC/6+B/4k2rkLTYTl2Ab7GGjYszsxGrroG/WYrS24UEW8dhrJ4XRAz65m6Bv3bY+WrGzXd1oNyLCLNDvx+RPy21f5mZp0aM9wFGCYLcvs50PbVjbrhENLY4VmS5kr6mKTxw1AOM6uJutb0R8TKkgPKNBH4NPCOiBjb6/c3s3oYjTNyu2HycBegIY/UeFu+PQ18fHhLZGZrslrW9EcKSTeQptr/gNSuf/cwF8nM1nAO+sNI0ksiYvFwl8PM6qOuHbkjxQOSviJpXr6dnGcSmpmtFg76w2sm8Bjw1nx7lLQWj5nZauHmnWE03BfiNrP6cU1/eD0p6ZkrJEnak7SaoZnZauGa/jCStCNwDtBox3+YdI1Wr71jZquFg/4wknRsfrh+vn+cdPHp+RGxcHhKZWZrMjfvDK9JwPuBDUm1/feRJo6dLsmTtMys61zTH0aSrgIOjojH8/P1gZ+TAv/8Xi8AZ2ZrPtf0h9fzSFdGavg7sFle1/+p5knMzNpX17V3RorzgBskXZifHwKcL+k5wLBdH9XM1lxu3hlmkiYBe+an10TEcCzxbGY14aBvZlYjbtM3M6sRB30zsxpx0DcDJJ0i6SOF55dIOqPw/OTCZLoq+e4j6aJuldOsUw76Zsk1wKsBJI0BNgVeVnj91cC1rTKR5Etd2ojmoG+WXAu8Kj9+GXAr8JikcZLWBV4KbCTpJkm3SJqZtyPpXklflLQAeIukyZLuzM/fNBwfxmwwDvpmQET8H7BC0nhSrf464AbSgWAScBdwBvC2iHg5aY7LBwpZ/CkidgZ+CpxOmnOxC/D8nn0IsxIc9M36XUsK+I2gf13h+VLgnoj4Td73bOCfC2m/n++3y/vdFWk89Hd7UXCzshz0zfo12vVfTmreuZ5U0381cEWLtH9ZrSUz6xIHfbN+1wKvAx6KiKcj4iFgY1Lg/xEwQdKL8r5HAFc2yePOvN+2+fmU1Vxms0oc9M363UIatXP9gG2PRMRSYBrwA0m3AP8A/mdgBhHxV2A68PPckfvgai+1WQVehsHMrEZc0zczqxEHfTOzGnHQNzOrEQd9M7MacdA3M6sRB30zsxpx0DczqxEHfTOzGvn/3r1krKmaOs0AAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "## The 
distribution of top bigrams after removing stop words\n", "\n", "\n" ], "metadata": { "id": "FCniAY7Qhp3O" } }, { "cell_type": "code", "source": [ "def get_top_n_bigram(corpus, n=None):\n", " vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)\n", " bag_of_words = vec.transform(corpus)\n", " sum_words = bag_of_words.sum(axis=0) \n", " words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]\n", " words_freq =sorted(words_freq, key = lambda x: x[1], reverse=True)\n", " return words_freq[:n]\n" ], "metadata": { "id": "QFwx-V00gzfh" }, "execution_count": 35, "outputs": [] }, { "cell_type": "code", "source": [ "common_words = get_top_n_bigram(df_sentences['sentences'], 25)\n", "for word, freq in common_words:\n", " print(word, freq)\n", "df1 = pd.DataFrame(common_words, columns = ['Word' , 'count'])\n", "df1.groupby('Word').sum()['count'].sort_values(ascending=False).plot.bar()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 806 }, "id": "oMMy6ktQiUSO", "outputId": "ed5c40d1-81ff-452d-982c-b8b39caa73d7" }, "execution_count": 36, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "united states 1385\n", "prime minister 1082\n", "united nations 589\n", "president bush 500\n", "bird flu 415\n", "human rights 409\n", "european union 350\n", "news agency 323\n", "north korea 317\n", "mr bush 310\n", "security council 308\n", "security forces 302\n", "white house 293\n", "gaza strip 276\n", "foreign minister 266\n", "people killed 252\n", "new york 246\n", "west bank 242\n", "nuclear weapons 238\n", "nuclear program 206\n", "militant group 190\n", "middle east 185\n", "secretary state 185\n", "roadside bomb 181\n", "foreign ministry 179\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 36 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAFRCAYAAACYF30cAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO2debytY/n/359zjiF8TTmVTEdoFMXJrEiKFA0oSZLSoCL1Fd9+Rfk2KA00kAxRGkxFKCTzfM6RmZyQIeUk5BvF0fX747rX2c9e+1nPtNZeZ2/P9X691mvv9aznXve91rqf67nva5SZEQRBELSDKQt7AEEQBMHwCKEfBEHQIkLoB0EQtIgQ+kEQBC0ihH4QBEGLmLawB1DECiusYDNmzFjYwwiCIJhUzJ49+29mNj3vtQkt9GfMmMGsWbMW9jCCIAgmFZL+1Ou1UO8EQRC0iBD6QRAELSKEfhAEQYsoFfqSjpP0oKSbcl77pCSTtEJ6LklHSJor6QZJ62XO3V3SHemx+2A/RhAEQVCFKiv9HwLbdB+UtArweuCezOFtgbXSYy/gyHTu8sBBwIbABsBBkpbrZ+BBEARBfUqFvpldAvw956VvAvsD2YxtOwAnmnMVsKykFYE3AOeb2d/N7GHgfHJuJEEQBMH40kinL2kH4H4zu77rpZWAezPP70vHeh0PgiAIhkhtP31JSwD/g6t2Bo6kvXDVEKuuuup4dBEEQdBamqz01wBWB66XdDewMjBH0vOA+4FVMueunI71Oj4GMzvazGaa2czp03MDyoIgCIKG1F7pm9mNwHM6z5Pgn2lmf5N0JvBRST/DjbaPmtkDks4FvpQx3r4eOLBu3zMOODv3+N1f2a7uWwVBELSSKi6bPwWuBF4k6T5Jexacfg5wJzAX+AHwEQAz+ztwCHBtenwhHQuCIAiGSOlK38x2KXl9RuZ/A/bucd5xwHE1xxcEQRAMkIjIDYIgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBGlQl/ScZIelHRT5tjXJN0m6QZJv5C0bOa1AyXNlXS7pDdkjm+Tjs2VdMDgP0oQBEFQRpWV/g+BbbqOnQ+sbWbrAH8ADgSQ9FLgncDLUpvvSZoqaSrwXWBb4KXALuncIAiCYIiUCn0zuwT4e9ex88xsfnp6FbBy+n8H4Gdm9m8zuwuYC2yQHnPN7E4zexL4WTo3CIIgGCKD0Om/D/h1+n8l4N7Ma/elY72OB0EQBEOkL6Ev6TPAfOCkwQwHJO0laZakWfPmzRvU2wZBEAT0IfQlvRd4E7CrmVk6fD+wSua0ldOxXsfHYGZHm9lMM5s5ffr0psMLgiAIcmgk9CVtA+wPbG9mj2deOhN4p6TFJK0OrAVcA1wLrCVpdUmL4sbeM/sbehAEQVCXaWUnSPopsAWwgqT7gINwb53FgPMlAVxlZh8ys5slnQzcgqt99jazp9P7fBQ4F5gKHGdmN4/D5wmCIAgKKBX6ZrZLzuFjC87/IvDFnOPnAOfUGl0QBEEwUCIiNwiCoEWE0A+CIGgRIfSDIAhaRAj9IAiCFhFCPwiCoEWE0A+CIGgRIfSDIAhaRAj9IAiCFhFCPwiCoEWE0A+CIGgRIfSDIAhaRAj9IAiCFlGacG2yM+OAs3u+dvdXthviSIIgCBY+sdIPgiBoESH0gyAIWkQI/SAIghYRQj8IgqBFhNAPgiBoESH0gyAIWkQI/SAIghYRQj8IgqBFlAp9ScdJelDSTZljy0s6X9Id6e9y6bgkHSFprqQbJK2XabN7Ov8OSbuPz8cJgiAIiqiy0v8hsE3XsQOAC
8xsLeCC9BxgW2Ct9NgLOBL8JgEcBGwIbAAc1LlRBEEQBMOjVOib2SXA37sO7wCckP4/AXhL5viJ5lwFLCtpReANwPlm9nczexg4n7E3kiAIgmCcaarTf66ZPZD+/wvw3PT/SsC9mfPuS8d6HR+DpL0kzZI0a968eQ2HFwRBEOTRtyHXzAywAYyl835Hm9lMM5s5ffr0Qb1tEARBQHOh/9ektiH9fTAdvx9YJXPeyulYr+NBEATBEGkq9M8EOh44uwNnZI6/J3nxbAQ8mtRA5wKvl7RcMuC+Ph0LgiAIhkhpPn1JPwW2AFaQdB/uhfMV4GRJewJ/AnZOp58DvBGYCzwO7AFgZn+XdAhwbTrvC2bWbRwOgiAIxplSoW9mu/R4aauccw3Yu8f7HAccV2t0QRAEwUCJiNwgCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIW0ZfQl/QJSTdLuknSTyUtLml1SVdLmivp55IWTeculp7PTa/PGMQHCIIgCKrTWOhLWgn4ODDTzNYGpgLvBA4FvmlmawIPA3umJnsCD6fj30znBUEQBEOkX/XONOBZkqYBSwAPAK8FTk2vnwC8Jf2/Q3pOen0rSeqz/yAIgqAGjYW+md0PHAbcgwv7R4HZwCNmNj+ddh+wUvp/JeDe1HZ+Ov/Z3e8raS9JsyTNmjdvXtPhBUEQBDn0o95ZDl+9rw48H1gS2KbfAZnZ0WY208xmTp8+vd+3C4IgCDL0o955HXCXmc0zs6eA04FNgWWTugdgZeD+9P/9wCoA6fVlgIf66D8IgiCoST9C/x5gI0lLJN38VsAtwIXAjumc3YEz0v9npuek139nZtZH/0EQBEFN+tHpX40bZOcAN6b3Ohr4NLCfpLm4zv7Y1ORY4Nnp+H7AAX2MOwiCIGjAtPJTemNmBwEHdR2+E9gg59x/ATv1018QBEHQHxGRGwRB0CJC6AdBELSIEPpBEAQtIoR+EARBiwihHwRB0CJC6AdBELSIEPpBEAQtIoR+EARBiwihHwRB0CJC6AdBELSIEPpBEAQtIoR+EARBiwihHwRB0CJC6AdBELSIEPpBEAQtIoR+EARBiwihHwRB0CJC6AdBELSIEPpBEAQtIoR+EARBiwihHwRB0CL6EvqSlpV0qqTbJN0qaWNJy0s6X9Id6e9y6VxJOkLSXEk3SFpvMB8hCIIgqEq/K/3Dgd+Y2YuBdYFbgQOAC8xsLeCC9BxgW2Ct9NgLOLLPvoMgCIKaNBb6kpYBXg0cC2BmT5rZI8AOwAnptBOAt6T/dwBONOcqYFlJKzYeeRAEQVCbflb6qwPzgOMlXSfpGElLAs81swfSOX8Bnpv+Xwm4N9P+vnRsFJL2kjRL0qx58+b1MbwgCIKgm36E/jRgPeBIM3sl8E9GVDkAmJkBVudNzexoM5tpZjOnT5/ex/CCIAiCbqb10fY+4D4zuzo9PxUX+n+VtKKZPZDUNw+m1+8HVsm0Xzkdm5DMOODs3ON3f2W7IY8kCIJgcDRe6ZvZX4B7Jb0oHdoKuAU4E9g9HdsdOCP9fybwnuTFsxHwaEYNFARBEAyBflb6AB8DTpK0KHAnsAd+IzlZ0p7An4Cd07nnAG8E5gKPp3ODIAiCIdKX0Dez3wMzc17aKudcA/bup78gCIKgPyIiNwiCoEWE0A+CIGgRIfSDIAhaRAj9IAiCFhFCPwiCoEWE0A+CIGgRIfSDIAhaRAj9IAiCFhFCPwiCoEX0m4YhyNArSRtEorYgCCYGsdIPgiBoESH0gyAIWkQI/SAIghYRQj8IgqBFhNAPgiBoESH0gyAIWkQI/SAIghYRQj8IgqBFhNAPgiBoESH0gyAIWkQI/
SAIghbRt9CXNFXSdZLOSs9Xl3S1pLmSfi5p0XR8sfR8bnp9Rr99B0EQBPUYRMK1fYBbgaXT80OBb5rZzyQdBewJHJn+Pmxma0p6ZzrvHQPof1ITSdqCIBgmfQl9SSsD2wFfBPaTJOC1wLvSKScAB+NCf4f0P8CpwHckycysnzG0lV43i7hRBEFQRL/qnW8B+wP/Sc+fDTxiZvPT8/uAldL/KwH3AqTXH03nj0LSXpJmSZo1b968PocXBEEQZGks9CW9CXjQzGYPcDyY2dFmNtPMZk6fPn2Qbx0EQdB6+lHvbApsL+mNwOK4Tv9wYFlJ09JqfmXg/nT+/cAqwH2SpgHLAA/10X8QBEFQk8YrfTM70MxWNrMZwDuB35nZrsCFwI7ptN2BM9L/Z6bnpNd/F/r8IAiC4TIefvqfxo26c3Gd/bHp+LHAs9Px/YADxqHvIAiCoICB1Mg1s4uAi9L/dwIb5JzzL2CnQfQXBEEQNCMicoMgCFpECP0gCIIWMRD1TjA5iOjfIAhipR8EQdAiQugHQRC0iBD6QRAELSJ0+kEpkdwtCJ45hNAPxoUwGgfBxCSEfjBhaHqjiJ1IEFQnhH7QSmInErSVEPpBUIPYVQSTnfDeCYIgaBEh9IMgCFpEqHeCYJwJ+0EwkQihHwQTkLhRBONFqHeCIAhaRKz0g+AZRHgXBWWE0A+ClhOqpHYRQj8IgtoMOnq6rF0wOEKnHwRB0CJC6AdBELSIxkJf0iqSLpR0i6SbJe2Tji8v6XxJd6S/y6XjknSEpLmSbpC03qA+RBAEQVCNflb684FPmtlLgY2AvSW9FDgAuMDM1gIuSM8BtgXWSo+9gCP76DsIgiBoQGNDrpk9ADyQ/n9M0q3ASsAOwBbptBOAi4BPp+MnmpkBV0laVtKK6X2CIAhyCePvYBmITl/SDOCVwNXAczOC/C/Ac9P/KwH3Zprdl451v9dekmZJmjVv3rxBDC8IgiBI9C30JS0FnAbsa2b/yL6WVvVW5/3M7Ggzm2lmM6dPn97v8IIgCIIMffnpS1oEF/gnmdnp6fBfO2obSSsCD6bj9wOrZJqvnI4FQRAMnCbRyU1USZNN/dRY6EsScCxwq5l9I/PSmcDuwFfS3zMyxz8q6WfAhsCjoc8PgqCtLKwbTD8r/U2B3YAbJf0+HfsfXNifLGlP4E/Azum1c4A3AnOBx4E9+ug7CIIgaEA/3juXAerx8lY55xuwd9P+giAIgv6JiNwgCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIWMXShL2kbSbdLmivpgGH3HwRB0GaGKvQlTQW+C2wLvBTYRdJLhzmGIAiCNjPslf4GwFwzu9PMngR+Buww5DEEQRC0FpnZ8DqTdgS2MbP3p+e7ARua2Ucz5+wF7JWevgi4vcfbrQD8rcEwmrQbVpth9jXRxzfMvib6+IbZ10Qf3zD7mujjK2q3mplNz21hZkN7ADsCx2Se7wZ8p+F7zRpWu2G1ifHFd7Gw+5ro44vvov92w1bv3A+sknm+cjoWBEEQDIFhC/1rgbUkrS5pUeCdwJlDHkMQBEFrmTbMzsxsvqSPAucCU4HjzOzmhm939BDbDavNMPua6OMbZl8TfXzD7Guij2+YfU308TVqN1RDbhAEQbBwiYjcIAiCFhFCPwiCoEVMSqEvaYqkpSuet8kwxvRMJUVRBxkkLTHO7z9V0ifGs49MX8vnHFt9GH0/U5C0T5VjE4VJo9OX9BPgQ8DTuBfQ0sDhZva1knbXmdkra/a1BnCfmf1b0hbAOsCJZvZIQZvpwAeAGWQM5Gb2vpK+jgfG/AhF7SQtCTxhZv+R9ELgxcCvzeypgjaLAW/PG
d8XSsZ3J3AacLyZ3VJ0ble77YCXAYtX7Su1WwlYrWuMl+Sc9ytyvrdMm+0L+vgq8L/AE8Bv8N/3E2b245KxbQIcAyxlZqtKWhf4oJl9pKDN4sBHgM3SeC8DjjSzf5X0dY2ZbVB0Tubcfr6Ly4Ftzewf6flLgZPNbO2SPvc0s2O7jn3FzHrm02oybzNtl8PdvbPzYk7B+S8ADgc2Bv4DXIn/xncWtFkC+CSwqpl9QNJawIvM7KySsc0xs/W6jhXKnT7mxWzgOOAnZvZw0bm9GKr3Tp+81Mz+IWlX4NfAAcBsoFDoAxdIejtwulW/w50GzJS0Jm4dPwP4CfDGgjZnAJcCv8VvTFXJTqjFgbcCfy5pcwmweboQzsNvgu8Adi0Z36P4d/bvGuNbF3etPUbSFHzC/awjJPKQdBSwBLAlLiR3BK4p60jSofjnuIWR79Dwz9vNYTU+QzevN7P9Jb0VuBt4W+qjUOgD3wTeQHIzNrPrJb26pM2JwGPAt9PzdwE/AnYqaXe5pO8APwf+2TnYQ9B1vou3Ac9j5HPsAvy1pJ8vAb9KN+kXpfEWzaMOb5f0LzM7CUDSd8nc4HvQZN4i6RDgvcAfGbm5GfDagmY/wfN8vTU9fyfwU2DDgjbH49fHxun5/cApjL5Gs+PaBf89V5eUdT1fGvh7QT/QfF68A9gDuFbSrDTm82rItuFG5PbzAG4GFsF/hNekY9dXaPcYfqd/CvhHev6PkjZz0t//Bj6W/r+upM3vB/Q5pwBXVBzfx4D9q/QP3DSAsb0GvxD+CZwArNnjvBu6/i4FXFrh/W8HFhvCXLop/T0GTwtSdS5d3T0XytoBt1Q5lnPOhTmP35W0GROdmXcs55y3AFcANwIvrPgdPgs4H7+xnIDvusva1J63mXmxaM3f+IacY2W/1aw6vy++I90C30W8JvNYD5g2HvMic+4UYPt0Pd4DfB5YvkrbybTS/z6+KrseuETSargQL8TM/qtBX0+lu/juwJvTsUVK2pwl6Y1mdk6D/rKsBTyn5BxJ2hhfIe2ZjpXp3q+Q9HIzu7HOYJJOfzt8dTED+DpwErA5cA7wwpxmT6S/j0t6PvAQsGKF7u7Ev+fSnYikk81sZ0k3Mlq1IcDMbJ2C5mdJui2N88NJNVe4rU7cm1Q8JmkRYB/g1pI2cyRtZGZXpXFvCMwq68jMtqwwnm6WlPQCSyqMpJtfMu9ESd9m9Pe2DL6S/qgkzOzjPdplbQDvB34JXA58XtLyZla0wm0ybwFuApYFHqxwbodfp9TtP8M/5zuAczrj7zHOJyU9K53fUfP2nItm9ifgT5Jex1i1Vdl11mhepHPXwa/HN+JaiZNwNdHvgFeUtk93jUmJpGlmNr/kHOGTbHUzO0TSKsCKZtZT3ZD0mh8CrjSzn6aLZ2czOzTn3MfwSSL8Avs3vqvoCJ9Cg3NXewP+AhxoZqcVtHk18CngcjM7NOkv9+11oaY2twBrAnelMVYRjh2d/oXAsWZ2RddrR+T1Kemz+LZ1K3yLbXjOpc/26KMjgFbC1UkXkLnYevSxopk9kG7+Y0gXZNHnWh541MyeTrrm/zKzv5S0WQHXE78O//7OA/Yxs4cK2tyKq03uSYdWxVeu8yn4/iU9F1e9PN/Mtk1zcmPr0qN3tdkGV0femca3Gm5zODfn3N2LPquZndCjj7sYPV81upm9oGB8tedtajcTV0/exOh5UWSruKvgLXPHKen1wGfwtO/nAZsCe5jZhSXjm40vgpbDb4DXAk+a2Ri1VWaRsghj58VtZlaYaj719QhwLHCamf0789rpZva2ovYwiYR+k4sgtTsSV++81sxe0tEnmtmrxn/U44OknczslLJjXa83FY5Lmdn/NRvpAgPy4mb2aME5jQRQars68IAlA1haqT3XzO4uaLMEsB9usNurqsGuCb2+9w69vn9Jv8b1tZ8xs3UlTcPVDi8v6W8xfKUJLkR6rlTTLu7EPOE0kZB0M77TvxG/lgEws4vHo
a9nAxvhN7OrzKw082XHkCvpY8CzzOyrkn5vZmNW3U3nQ2o7BTjAzL5U+kFKOpkUD9x4uzNJx4YboW+s0K6jR6yjh90U11f+AV813QXcWaHNkun/dwPfwIVKr/PXK3pU+Uxlx3LOWRf4aHqsW/F7PwFYNvN8OTx9RlGbJYDPAj9Iz9cC3lTz914OWKfCebPI6HuBRYFrS9r8HNifEd3+ElTTLX8VN9Itgu9G5gHv7nHu0unv8nmPCn1dmzNvy+w2SwD/r873jnuN1NKXZ9pughsg39N5lJz/Qnwnch6uivgdJXaK7HdRc2zvyXuUtLmgyrGcc67Djb9XAS9Lx6rIpuVwz7FK131nvjf5rbKPyaTTX8HMTpZ0ICzI41PFS+aptKLp6Ommk1kt9OBY4BO4Jb+qJ86RwLpyN75P4kbCH+GGnTy+XvBeRo5ngqRtcT3eSpKOyLy0NK4u6Incb/gDwOnp0I8lHW1m3y5oBi54F7iqmtnDkspcYI+nhhdEZowX4capaan9g5IuN7P9CppNMy/I0xnfk/JkfkWsYWbvSHYbzOzxpAYso47Xz0+AN6XPMUYNAvRUgyT+mVadnXm7Ee59VUST7/1O3FPoTEZ7CX2jqCNJPwLWAH7PaE+rEwuanQIchV8bdTzcLpX0ZdxrKqve6emyCWR38ovjqsY5eeOTu08uAayQNAGd32ppXOVYxj7AgcAvzOzmpLYqUwk18UgC+K2kTzHWq6vMW2gBk0noN7kIAI4AfgE8R9IXcffBXN1yhkfN7Nc1xzffzEzSDniNgGMl7Vlw/pHpJrbA8FaBP+Mr2+3xi7vDY/hNqog98YI1/4QF7pFXMuIy1ospkpaz5BOcdOFl86apUF3G3C33/bja4SBJN5S0mSdpezM7M41vB8qLUdQy2GXofO7tgFPM7NFeH8vM3pQ+82vM7J7ck4rZDxdya8h96adT7s7X5Hv/Y3pMAeo4PczE3ajr6Ifnm9mRNc7v0FlkbJQ5Viggzexj2eeSlsWNunl8ENgXeD5+XXW+s38A3ykbnHkcySWZ53cChXYKXGuxRnbBUpF3pL97Z4dA+SIic3afW4VhPfDtz+W4oL8cV71UVVG8OH1JHwVeUuH8r+D+/xtTXeVyMX63/wPuKz2Fgi0eI2qnUrVMTttFGrS5Edetd54vXjS+zHnvAW4DDsEDmm4DditpcwXu0tf5jGsA11Qc44r49v9V6dgY17uuNmvg2+p7gHtT37mupJk2W6ffax7u+XA3sEXFeXEbvp1fBBfEV5d9pobzfTH8JvMyYO3UX6E7a9PvPZ27FB50VnV8p+AOEXU+08F4QNKK1FB1DeKRvr/bS875WMP3np7kxTlUVFvhXjfPadDX4lWOFT0mkyF3MXxL+CL8Tnw7MMUKDFWp3Y/MbLeyY12v523NzMx6riwkPQ/Xb15rZpdKWhUXJLnbXUnn43foV+FBXd2dFXkmbIpfQKvhgqHjiVPkObEf7oL6i3ToLcAPzexbvdpk2r4MD7QCn8yFkbmStsZ1y1kviPea2UUl7XbCd2GXmdlH0jb5a2b29gpjXArAKhqdmxjsUrus188SuO6+p9ePpBPwnd+1Vd4/0y4vynPMsa7Xa3/vktbG1ZAdV8y/4brvwpTn6Rp5BR50149HTeG8zbStFeGt0VHKU4GX4JHGPSOGU7u18e8v20+RygpJ5+Hqlk/hXn+7A/PM7NMFbWp7JKV2tefFmPeYREK/0YftPifp92+0Eteo8SbpndfDL7j3d79uBZ4Jch/zMTYHK3AdTO3Ww/15wYOlrqs41qnAcxkdAp+rskgeBjvihs7aQrUuapBeIunkf2fJoyht/bcws1+W9PWevONFQiH9VmsCf8J1sIWusmnxsBJuJ3gXo/XLR5nZi/PaZdrXuplJugL3ELowPd8C+JKZFeaskpRrqyqat01RjwhvM+upPu0a33zgT2Z2X0k/B+HBVi/FV+3b4guQHUvazTaz9SXd0PldJV1rB
R6CdT2S+p0XWSa8Tj/zYZ+VDIjZD9sz8VUy+P5PatcJ4hLwJCWFByQtAxwEdELsLwa+YMVuhx1/e3APkkWA/zOzZfLON9flXSVpEzObVzSeHCrbHCQtba4nXx5XY9ydea0smIbkhnYQHs7/NCP+2blCyzxAZX8zOxk4u8oYM30dT808RDRLL3GQmXV2PJjZI+mCLxT61DAOZnhDxTFlz38vXkr064zM98fw+TyGdDPP8kD6u6qkVa3Y4LmkZfzQzewiedxCIU2Euzyg7cOMXFcXAd+38tw7m5jZOkmofl7S13FvvsLxyd28O7/ZHRWGuCPu4Xadme2R2pel5gCPywF4IO1I/szIzqkXj5vZESXnZKk9L3ox4Vf6ch/u9+KGo2zE2mO4euL0vHaZ9l82swNr9nkavu3q+IfvhtsPSgMfUnsBOwAblW0na46rc3HvjG9ZT6fEm0HSWeZGxU5QzYKXqLC1ljQXNwAX7iK62nwFVxPU8jCQ50jqsCAPkRUHnd1kJcnBctosWJFljt1oJT7wOe+zLJ6HaJuCc2qrF9M5b7eCAL2uc4s8RcrUkr/Ab1w/SofeDaxvZm/t1Sa12wh3AngJvsiZCvzTCoIRJR2DL4ay19XTZjZmp9vV7hoz20DSVbjH1EPAzWa2ZkGbnXE9+0X4XN8c+G8zO7VCP7PxXcVjwK0VdldvwlW0q+DfydLA5y05F/Ro8w382q3jkVRrXvSkiuJ/IjyAtzdsl+c/v1pJmzH+0HnHKvRdmK+nwftdWPAo9Xfuo8/CPCI5be7KeRTGOfR4nyp5iI4GXl7zfY9L82CN9PgGvoCoO74qxsE5Xc+nUi33zj648BCu0piDu4wO+vddDvdwm4Pvlr4FLFeh3SxcbXVd+kx7AF8uaTMmPibvWM45n8XTMLwdj1h/AN95F/ZFxlCKG1vL4nO+l/r5EL4zuA7PLjvQ7zz11egaHsS8mPDqnQ5mdlpdY04iz3/+RHr7zwM8IWkzM7sMFhhOnyg4H0nZXcAUfGdSJZ9LZaxZPpYFqGLa4i7uBC6SdDajVyQ9/bjNbFD52KvkIdoMeG/ayVRNL/ExXJD8HN/9nM9oF7hcehkHe5zbWL2YeJ+ZHS7pDcCz8VXxj3ADbXdfhR1kLikAACAASURBVDtQK94Nr2wlaRAK3neupKlm9jRwvKTrcA+2XjwtaQ0z+yNAMtQX+usnG9EF5rEip0k6i5II78QUM8vm6nmIkvohNpIi+yhJv8GN9GUuw8jz7RyJR4KvLc+Ns72Z/W9BX02v5crzoheTRuj3MuZUaFrXfx5c73hC0u0LT5P63pI2b878Px/Xne/Q62SNTXg1iqILUe6J082jwGwz+32PNnXSFme5Jz0WTY9SmupulZ+HqKcHRGLbKmPK9DEVOKvhRZdN51xoHDSzLwNfbqJe7Aw1/X0jHrNwc1Ib5vHmHsfBv8ciof+9ZAw/Hs/RXiX2BTyZ3qLA7+X1CR6gvCjTfwMXyvM5dXID7VHUwNxG9F2Sr765t14V281vJJ2Lp1OGlHCtqIGkC8xsq9TP3d3HCvgB/tm+n9reIK//0VPoN7Ebdpqmv1XmRf4bpC3DhKejh838XQovwLB5SbuL8UIZe+Bf8IP4Nq9Uf6tUncsKcsc3RSP5ZjbFvQV+np7vhG//P1TQ9if4TuJX6dCbgBtwD5ZTzOyrOW1ux6Nr6+TSz7Zfwswer3huI91tw3GtmnfcCgKiJF0AvK2GgMu2zRoHr+laTeadvymuGvynpHfjHluHW3nOo+NxB4bVcePiVOAiM1u/7pjLSCvVPfC5dw2u0ji/pM1quHF/UdyTbBnge2Y2t6TdYrjbNbhqrEpG1cPwQMJKNTGSEFwZ/52y3mq/6HF+JyL3Qtx7J+ss8hsr1+lfa2avUqZwinrk3sm0aWQ3HMi8GA991Xg8GMllfhUeObcYMLdCu+fh0Y2bp+er0iMHBymPSjp/zKOknxfgQngefmM5A3hBhfFdRUZnj
gvLq0raXEImkAYPrLkYD8zJ1Rfj3g6Vg28y7TbGdwf3pOfr4hd3UZtGutt03ttwHfvXgbdUOP9G/IZ3I66HnY8b+YranIHvXo7F9dlHAEdU6Gtn3PXyBFxFeBewY0mbG3Ahsi6uI94buLhCX1PwG8Sy6fmz6ZGLqJ95m3mPqbjO/H48XfRt+I2xqM2iuBfXy6mQvwdXy+6H7zxOw6NgSwOLGKmJ8STVa2JUDorD9eQd9WAn19ZduF3goxXa/xq3DXWC4nbEF6RFbRrZDevMi16PSaPewXOgL4tb5OeQ0vWWNTIPnPlG5vk99Hax67iq5YWjl60wmlTqATekZSvtLJWOFfEcRm9xn8L1iU9I6rVyehzfipemLe7iW9SvFlVbd5vO+x5uHOxsyT8kaWsz66lvt64dW/Jw6lm+MHE6xSqPXnwGjxR+MPU1Ha+U1tMjhJrqRUkvNrPbGMmL/oIKu/eieVuIRnKzb4fbNt5sZnPkdRCupMf3lOxrR+EpHASsLumDVuxK3KhalDWriTFH0qusQlCcmR0OHC7pY1aeiyqPvXE7zYsl3Y/fMMoyl9ayGzacF/nUuUMszAeZEHR8lb8MBWHpeFAFpFVB5lFllbBplWNdr9eu1JPO2QNfPf4QX0HeBexe0uaz+I3voPSYBXwOv/hP6tFm97xHhfE1qRa1Fb6SvgjfgdwNbFmhr9tIKsf0fAruMld3rlRJL7Eont5gbSqmteh+X0pSbaRz6qbnODr9re3dQU5KA2D1CuPbDU8J3P1az3Qb6bdaM/N8DTyVc1FfTauI5WWiXYMCr7I0vvn4TamzEyxL6bETXlcBPLL5dCpkvsy079RlqHLuK/CdxN3p+r+OghV7P/NizHvVOXlhPmiYTni8+2Ikh8iheN3eGbiBan/KXdim4Olpn4cbfXcAnldxjDPxbek+wMxx/N5PTWOcg6uePoX7phe1WR2/Ma+THouRcumUtDuLjDtt+h5/VdImq8r4FL7jOrekzRbpQrsYV5XdBby6wvi+BpyLG/Xfi2/rDy1pU1m9OIDf6nJSSuf0/CUMoExmj76u7Xqu7mM5bX6Mx650nm+IGyPL+roKV+3MTo8n03z8Iz3cFdPcGfMo6adT3nMzfMGyHSW5lQbwPS6d/c2G8ZjwhlwNIPxYFdMIyEu5bYLrGr+ZeWlp4K1mtm5Om7sYmzo3001p8NMC40+Fz5GNrs3rrGfwk8YGZ3XalI2vSbWo2bjL2v3p+auB71oP43nGFXIZ3Ph2TXq+IW4s3aKgr4MyTzteU6dZKqpSML53mdnt6fkLgZ9aBWOYPIBs0/S0p3FwEMhLM85g9LwtSvmwHb7YGFXk3Hp4dDUcU8fQuDUuSE/Gf6udcLtPT9WamlcROx34rKV8QPICSl/AP+vpll+sJO8aecwKPMg616I8jfONZvaTOtdnHeTpMg7CbzCG1zX4QtF1lWlba150Mxl0+n2FH2t0GoFOjgsjP43AorhOfRqj9aP/wI0zY7D+fdIvSIKkimdCd472Dh0XxyIBPjPz/+L4RVoWKo557pa6lZU+BPxSHqm4PvBl3MWsF4cVvFaImX0eQPUSri3SEfipzR+Sm2mV/k7DjZDjihrkqzezs9PnOA+fv281sz8MeGhZ99C/MhLvMo9M/EwPekYul/BCyySAM7Nbko77zgK99hw8QvZh/PpYFviLpL8CHzCz2Tlt7pf0ffyGdmjyNCpzQ23Kz/BdZicKfVfcg+91RY2azIsx7zHRV/od1DD8WM3SCKxmJS51gyL5pi+Jr3b+xUhwUW44e3JHWyVvp9Kg79llq1uNLtbS4VG8gs8ZBe02xv2W/wVsZ/XzC1VC+VkidzezmwraHIcvADp5VXYFplpxjp/OKvdQ3JAuSn6rfkir4kr56jU25mMrXPVxN5TGfCwwuA8LSc9hdIBl4VyW9HPc0aGTD/8dwAq4LeIyy0lsJukHwKmW6gPL69++HY9HONzMxjhYyLOmboOv8u+Qt
CIe7V0Y+JTafRKvlPcBVSi/qZz0IaqQCqTOvOjFZFjpd1g5+c0/hgdDrIfXiyyLRLuXasVWsjwu6WuMjf4tq2pTG6vpmWBmJo+OrZsnJpuUqxMxXOX3XxyvR9Cpv/t2XAe+rqQtzWzfTB/ZiFVw3+dHgWMlYSVpYxtyNO6WeGEawxbpWFGWyA/jHhcdYXgpHoJfxldx75ZbG4+2Ojfh9oAHyk5kdE4qGF1gp4zjJK2MF/O+FLjEzG6s0b4ykrbHd+vPx92aV8PdQ19W0vS9uEdWZ65djttvnmIk5Xc3G5nZBzpPzOw8SYeZ2QfTCn4M5nEop2eeP0C17/946lcsO0/SOxmJ6N4RtxeVUWde5DKZhH7T8OPaaQTwwho/x1UpC/Jj9zH2MXRcsDQ2Q2JnfEWJlyq7o2XIlmecjwvunSu0Wwf3XHoaQF5o/lJcF9ktHBqrafqgcpZIjURXfsE813lhScAc/lpX4KdV35cZm6O9LIf8CsAtkkrz1VtB4fgyzOw18sjaV+EG7rMlLWVmpaq/BhyCp33+bdKdb4nnwyob4xNpN3Mevqi4PaOb76XOe0DSpxm9O/hrsu+VlUutS+WKZRoddb4vI4nupuKf5VMlfVWeF72YTEK/afhx7TQCwLPN/an3MU8he7GkQgGrnHDtvGMZ9gP2Ir9WrlFcK3NDYFdJlXK0Q1+5PpbD7Ryd3dKSuGvg0+qKCbA+cqmni/FEM6trP7hT0mcZnSWyV/nJFZMRbHtJP6PL+N7rRpsxXs5KqoZfMvqCK/L5Px63KX0TX5XuQTU98cEVzukbSZvhGSg3x/XeZ5FT1Cen3epmdlfZsS6eMrOHJE2RNMXMLpRUpYjPFrg78934b7aKpN2tOG/Uu/Dv/Zf49XR5OjaVaoudOlQuv1l3Z5/DwX22n1RCf7a8Qs3qwIGS/osKd+yOoa8mlfNjq2FRZTPbK/1tIozr5mjvh6/iQV0X4Z/t1cCX0mr6t4PqJN1EVpO0qNWrG/o+4PP4ttxwgdVLN/85PMZhZcau8otutFnj5ePA67vaFQn9Z5nZBZKU7EQHJ++hzxW06esGWpOLcNXEl4Fzanz3p+Eq1iyn4ob7XjySDO6XAidJepBM6u0Cvo67Zo7ytirqKzkgfKzHy7mpIpLTx48t1YOuwUF4qpdVJJ1EqlhW8z0qMYh5MZkMuVPwgIY7zYtePBtYyXpkwZP0LTPbN0fPDBRvh1QjP7akfRgpqnw/I0L/H8APzKywsHK6aXyEEdetS3FX1IFm6OyHZNDaID291sz+PE79nIj7lp/J6Dz8ddUwZf181swOGeR7FvR1Bf7bnorXTr0f+IqZvajH+ZeZ2WYaXZQHahiNVS9P0rK4kHo1ruL5D3ClmX22x/kvxnXwX8WTjHVYGs9X31M/nxYKT+A7nV1xF92TypwslF//YMyxfpH0v3gk/Rw8/fa5VQ2malh+s8bY+p4XCxpMFqFfF0nrm9lsDamsmxqGcEs6GTdOdzxJ3oXn1SgMTR8maQezFqN10mXZOZv0c1De8aLdmrzW8E7mqXc7Y/2ZmQ1zN9QTSa/CjZXL4jrtpfG6v1eNQ1+b4KlJljKzVeXpxD9oBb7zqd1LcNfLzXED+D1mlnvdyNNJvAXYnpSaI/EY/r1fUdLXasBaZvbb5PUy1cweK2lzPO6eWMvbqglJZfx6XA03Eze0Hpvn4dTLHtehxC630HjGCv1+kOdU+QBjAyDKXPpqB01IusW66vXmHRsEDcf3fjzqd2XcN3gjfCVYVI2pduH2pigneCbv2MJC0uZ4IZinM8fWqyIQ0g1sFUb/Xj3bSboa9wI500ayPRZWFpOnOb4NDw66BA+GK1XxSNrYzK4sO6+rzQdwO9byZrZGMnIfVWD36rRbDPe2WpAxE0/61zNDp6RNzezysmM92q6LC/1t8DQHGwHnm9n+XeddmNO8gxVdI6n9ZvgN8Pgkc5YqsYl02tWaF91MJp1+I5Kq5hDGCqCi7dAZ+
MT6LRUShaV+mgZNzJG0UWflJ2lDxrrg9U0f49sH3/ZfZWZbpu39l0raHEtO4fYKY5yJJzXr/FY+yOJt/H/kdWDvSe+xGuXJ8YbJucC1knaykTTMxzBWHz4KSYfgeuE7GR1UWChIzOzeLv+Gsu9/TTNr4s3yVnlx7ydwffY6wCfMrKim7N64mvDqNNY75D77PUkG/uvNI+/rqPm+zdjvOO9Ytq99gPfgsR7H4Oqqp5Jq+Q48AngBfThHdHa1M/EI5ePxFCc/ZiTau1e7RvMiy4QX+uqRcqCDldRdxbNEvg0PuKgqDJYwd+mrw0xqBE1IuhH/sRYBrpB0T3q+Gr7yKmrbJEio1vgy/MvM/iUJSYuZu5nm6qMzVC7c3sVJuJ74Rqq71X0GuExeN6FTC3WvogbywtrHWSbKswryXPpfAp5vZtvK0wFsbGbHFjS7Hc/Zc7GkPZP6o4rX2c64K2Ado/a9aTdn8sjcfXDVUhFryt1wK1d9SrzezPaX9Fbcq+Zt+E6hSOj/28ye7NyUJE2j5AadDPy3Z2/sRWgklcp0jS42tDTuuVPEcng66VGBmeaFXN6U01c/FcveiheGmZPO/bPcOaWMJvNiFBNe6DOSckB4ro5sWPU9uDdPEffiSafqCLuzJL3RzAor7XRRN2hizCSqQZMgoaZBHfclY98vgfMlPYwnKxtDRsd5oTy4rbRwexfz8ozlRZjZb1K/G6VD+1Ywot0KHJ2EzvF43p0qAXw/TOd/Jj3/Ax7PUST0zczOkhex+bk8GrjKXLwJn+OFRVq6+BCeJ2kl3GB8HuVlIGtXfUp00lZshxfueVTlHtQXS+qUkNwad2D4VUkbcGF8s9w3PWvgz3PGqJ1KBRbsKN5pZgfnvd7jWut4dT0Hv9H8Lj3fEriCYq+uJ83MJHXcPHNjS3JoMi9GMWl0+vKw6l90BLGkbfEiGx8safcqXL1zMRWDszSSGuHfuPtm6Uo66fdegScLaxQ0URVJl5tZ4TYwp03f40tG8WXwakJjVhoD0HFuBewCdOf8b5L7vpS0Y9kj9Xk57m3V8zOoWYWk7LlL4V4hbzOzwgVXUnWdgV/k4zafmnymdM5XcIPuE7jKZlm8DGXP+hFJTbInbigVrvo6pmxBpgbOGGqQSkXSGcDHquwoutqdh6f+eCA9XxH4oRU4E0j6FO4csTXuLvs+vFxloTPIIObFZFjpd+gOq/61vDZnGV/EI90Wp2JwljULoDi4QZumNAkSOrjfTosusvT6lgCSXmBmowKk5IVUytgDT/mwCKP1lQMX+mll9+L0+Bue23w/eSGQd/Zo9k+5a15ndbYRJSk+sgZl82RwO6tHiccuTsBVeKWqLvVRbxn4mzyYqPOZdqTCbtDMDkjX36NJBfM4BTWhE1vifvA/KHv/rr4ulmfb3SCN81rz4khFLCbpaMY6LhQtPOrsKLKs0hH4ib/iWomemNlhabfzD1yv/zkrKVGZqDwvejGZVvrn4sbVrNvWq4vupqldoffCIFEDd7SG/Ryfc9hsHFzYmiBpjpmt13WsSnK3262H//ogkfRNXL32O9wd75oqY0hqpG/jhVduAqbj7qLXF/S1OL667c7jVOYJdq3lJBLrce7uRa9bQZqGdDPu5Cp6GE/P8W5LhcEL2i2BR5WvamZ7qVqSsRPw/DR/J+X5wROmFQZDyT3IPof/XsLdS79gZscVtLker+w1ypnA8rNrdto0cu+W9B181Z4twj7XzHoFhzWmzrzo+R6TSOgvz0j1eMMnzBesxJCbViO/tfLEbP2Or5E72rBIq9Jv48FPi+JGrX+WGH/r9tE4cCe1Px73Yb+lRp+1jbKS9gBONrMx0aCSluml35e7Dj6Nr8yEG2mnWLHr4Cm4Yf5deA74XfFqYPuUjPEb+C7uTOrZRZAnJrQ6C46kU55StU3aac7GC8KsnW4CV5SphVLb5+P69U/hRvEyVdftwCaWgrjSbuuKogVClUXGIElG3c3T00usdxH27
uCqUZRdj/3MiwXvMVmEfgdJS+ZdrAXn19bPNxzX70nuaBndaGmq1IZ91V49SpqFRxuegnvyvAfPU35gSV9LAk+YezC8EFeH/NpyilGo/8CdW3G30rvw36s0p1BaBe6Bb+ErGWVVP09S55y8HcyYY12vdwpz3GBm68i9ai41s416tUnt8mwLhXaRpO89HjdgCngET1RYtLpdDM+cOoPRapAvlIxvlpnN7LIFXG85hYYybd6NC8aX4yq1y/DvotDfXx7VvEXHjiRPEHeRmfXMpCrpYNzY+QtGC8eiQkPjvjDK9HUIrkb7Ef5b7QqsaGaF6TmazItuJo1OX5loQ6BytGEd/bz6cw+t7Y7WBz/CV49vILN6LGtkZnMlTTUPFDpe0nV4/dYiLgE2lweEnIen4H0HOYVVzOwMSWcBnzazMl/+PGoX2TCzY4BjMkbZGyTlGmXVME+SRqq3PUvSK7vaLVEyxM7N8RF57v+/4N4eZZ+riQ/4ccBHzOzSNO7N8JtAUZzDGbhdYjY9koT1oHKSsQzfwvP8HwVcWKZCyjAXuDoZWg23Hdyg5JJp+U4ZHZVXdsdZVmjoO+QsjHqd3OeqffuuG+SRSSVVlpOpcWxAh0kj9PEshW8grSDN7Hp5Gb5B0o976MVq5o7WhDXNbCdJO5jZCXIXu7LMiI+nFdLvk8rrAaple5R5qtg98SjIr6ZdTS7JqPcWygO48to2KlxTwyj7QUbyJGW3w//AL/heZKu3ZQVMleptR6cbzGfxubsUJRd2HzzdEfgAZnaZpPklbVY2syYVrQ6mZpIxM1tB0stwFe0Xkwr0djPbraSvP6ZHh07xnp4LOmtY0a7OwqizoOy1ai/p6p+SdsVTPxvuQVZZg9EPk0a9I+lqM9uwznayj75qu4eqoTtaw/FdY2YbSLoEv7n8BQ+f77mKkRuZ/4pvWz+Bu15+z8xyMw5m2l2X+vgmsKd5SutCtVUylC6C+7BnvSAGnoukiVFWzfMkNareNizkaYqfhRsUDd+R/Yvk/JD3/cs9XL5tDQqnqGaSsWRr2JSRPD8rpHaFhuimpJ1Vdx2DohrDl+DlCo/Br6kHgPeWyZg8OVRB1TUDj6nYFBakft63xu6nMZNJ6J+Kr7K+g+eT3weYab3d6/rpa4xQGy/9fBOSHvs0fNt+PGn1aGZHlbR7Fu5tcXvReV1tXo0b3C43s0Pl3h77WnEJvr71jjXGV9koK+m1ZvY79YiktB4ur5LebWY/lvRJ8jO2FsV87Jdz+FFgtg2wYHnqq3achKRbgDWpYUdJ7X6Mx75camaFEeSZNjfgevzLcGPnfVXaNUGe5mALXOifA2yLewoVBWjlLYy+ayXlJJPN4buMXrXvXWRzWJhMJqG/An5nfB0+Mc8DPl6iZ++0rZXYSDXcQzWSTiGXsotnWEh6M17ZalEzW13SK3Dvp0IfZC2EGqp1qGOUlfR5MztINV1ek5ro+2qWBfQnuH64o+p7E3ADbjg9xcx6xpqoQYK8uiRBN4YyVZu86lWn+MoawHW4ID98kONrSrou1wWuM7N15Sk0fmxmWxe02ad7/HnHctrNoOKqXdL+SUWaG1tRtJjKvEdf82IyCf1GWfOUSWxkZi+Uu4udYgURrarhHpq5aDrh7tkKTmZmB5R/unrI0yK8h7E/fNHqezaelOkiq+FdJM9pU6uGqqRlGPn+wFeEX7BqqQ4qkTHKXoiv6LLG1d+YJ+jKazcF2NHMTs57fdAklcEbzQOzkEflno0brWdbj2yq6pEgr+Q3Hvfvvau/qXgyvi3xFBBP9Pre++zj42b2zZrtOirQ2Wl8j+Gusj3Hp3zvrIFmbJX0ZjP7lXrEVlhJ6csm86KbyWTIrZ01L1E7sVES7vuogntoZ0UkaeuuyfFpSXOAgQt9fLt6FfWi8p6ysflRSu/41qyG6nF48FKnLN1uuBqqMEFVTRoZZc1dT/dnpCB1ZdQs5fZzGO3V8hSe3OwJdZWb7KJJgrxhfO+A7
6ZwV+gr8cXAq2wki+jAMHcM2AW3KdVhVloc/QB30Pg/fKxjSO//LmB1SVlX46XxQLJcmqzazexX6e8C4Z4WIkuZ2T8qfK6miRMXMOGFvvrLmgcNEhupmXuosjuP9B5VvGOasLiZ5emKi7hZ0ruAqXKviY/jSaEKUbMaqmuY2dszzz+vAo+fJqQt9+ENjbK/lec+6TY0l6kKa6fcxjOHdtwNwZN0/STNw6IgtCYJ8sb9e89wA16ucG3cRvGIpCvN7Im8k5uu2BOXy6NeKzsGZK7VoyT9BljaelTZw6+DB3DDcrZm9WP45+xFx026dir0pPb7ED6PrgWWlnS4mX2tpGnTxIkjfU909Y48NHoL/AvKGiofA35lZneUtK+d2EjNilGsj6+0lsFVDQ/jgTHj4bHyCXzlchbVA0+WwLNDZr2LDrGSsoxyl79aNVQlXYlH4F6Wnm8KHGZmG5e1rUpTo2xqm2fPMSsp8qIKich6tJvJSJ70y82sVEioQYK8YXzvOX3+F+6q+SngeWa2WMG515jZBr1eL2jXJFCtaQBeNpXKs4BpNj6pVH5vZq+Qu22uh2sEZpfZAJvMizHvMdGFfgc1yJqXabs1GWFnJYmN1Id7aNKrMl561NTH3ngiuUcY2VaWCq3UtlaIvmrWUE1t1sWLs3RugH/HXd965qipS1OjbJ99/i8e/l8n5XbTvppklnwFnpBrmXToYQb8vWf6+ii++1sfz6d/Ke7J87uCNuPuytvU1pPaNq3sVbv4j7wAzSuAnwDfMU8qVypjmsyLbiaDeudbZrYv8J2OiiZLlTucmZ2fVu/T0nsuX7KVr1yMQiPufPt1He/0PdCi3olP4gFalYsvy1NMH0cKaJH0KCUh+gDmRejvxMuzrYyr2hYpaXM9sG66wVBRV1mLJPCn4CkhaunnJb2nx3vmekBoJPJSwP8kXfy4pfRIY6ldw9ncBXRcv/cMi+Mu1LPNrCwArENnl5RN8VCp6pOk7RibdiQvVUTTADxoUNkr0aT4z/fxm+X1wCVph1H6ezWZF91MeKHPiDfMYU0aS/og8Hk8SOU/pAuV4nDsvGIUvfT5HRtBk3TMTZkLPF6zzbHUD9FHo2uoHgnsUabiUVc+l8wNsDCfS136MMpmsxQuDmyFC4hcoW/NUm33hfJD/B/F9ceftK7U1alNk8pejTCz2tejNUwhIOkofPW+JW5r2xFXb+T10Y+tp2kqlSbFf44Ajsgc+pPcDbaQJvNizHtMFvVOUyTdgU/8OqvixkWVh4GkX+CrngsZrdcrcufLKyBemCwsnTPFatZQTYazTj6XbFrbr/ds1BB5MY+/Ud8om32PZfGEcIXpCOTucpdQIyCpKfLQ/vvw7b/wnDBr4DenD5vZFjltfk2q7GXumz4N91OfEEGFUGvFnm3TSVbX+bsUvsPbPOfcfmw9X8VVpu8BPoYv9G4xs8/0apPaVS7+060RyBlfoWagybzoZjKs9IEFRqmDGVvgvEyP/Ufqr4qbFFX+Kl5irk6h6Kb8Mj3qcLGk7zM6RP8ipRKHBXrVJjVUm+ZzacI70t9sWcCynVw3/4TSspvg6rHNgW/LE4yNZ0BSd0Kuo5Px79PyHE95rGBmJ0s6EMDM5kuqXJh+vKmzYu+i4xH0uDzO5iF657Z5DZ6S4805r5UV5Pk08H5cTfNB3DX6mArjq1P8p7NrfBG+4+zsEN5Mte+iybwYxaQR+rh64hN0rR4rcCBeePxqSlbF6s89tEmh6EZYSQBHDzoTpTuq9JUU61Wb1FC9QtLLrUE+l7pYg8Rakn7FyBZ5Ch6qX6oiMrML5cFW2YCkl+GqwEHzuKSdgVPT8x1xFSX0VjnUruw1ZDbJrNg/L6+F8OsK7c5Ku7Gv4Stao4cwNrOD0t896gxM7lJ6czL01qrshccoVCr+Yyl6O82j9ToOFfJU0GdXeIsm82IUk0noP2pmVSZIN9/H7/xVjCyNiionOt9lnULRjZC7HOYZtXuubpvqU
4ElzOyars9SZrjbDHhvGmflfC5NqGuUTWT10fOBP1mFPDAaUkBSYlf8ZvI9/Le+Cni33I3woz3aFptkeAAAC/5JREFU7IevHNeQp5eeTvm8HSZ1VuwLMLND0r+nyVN3L97LO66p+sQ8COx2SatazRq5+CLnpVaj+A/wXCBrG3syHSujybwYxWQS+hdK+hq+ZapTMWYRqxjIlCzjF0v6odV3Dz1L0m34xP6wPHqz0Ae+D2Zm/l8c2AkorAWg5iH6TWqoblvy+iCpZZRN3AM8YClGQdKzJM2w8gyHtQKS+iEZ5PJUFOBG9bw2c5JL34LKXpZT7GYhUnnFnkUeY/JJPFngByStKmlzyy/N2I/RvWmN3I3wlOV1FjknAtck+xx48aHSHXyTedHNpDHkqmHmRklfwtUtv6IkkEnJPbRr+5/trCw52fKMFIpeEvgvKy/gPBBUUh5O0ml4NF9nYu0GrGtmhSH6alhDdWFRxSgrryK2iY2uxHS5Va9JWzkgqS7qIyGX3Ef9I/hOy/DdyFFWEoC3MEgeXj1X7F3nNi7NWHNMTWvkNk1atz7+W4Hbhq4rOLfvRG0dJs1Kvw/1xC7pb7YQQi9DX2P30DQRP4IXX9kL9xN+ER41O1A6xtfEFHzlX/ZbNgrRTyuL16lmDdWFSBWj7DTLuJ0mN71Fy95YYwOSjqM8JUVdGof246vHx3CnA/B8Mj/Cd4ILnZor9ixrmNk75DlyMC/qk6s77VM4jtkBUkHl0kAr0Gk3W9K9JE+mEtVSP/NiFJNG6EvKrTZU5u5Vx9BnKVCp7M7eg+Px1Ugnh/b9eNm1gQt9RucHmY8LoJ3zT13AE5I2s9Eh+qVqCQ3J574pDY2y8yRtb8m3Wl7bt4pLb5OApFpYV0IuSUuYWVXvs7VtdNbOC+X58icKnWukkxai6jVSpzRjP8LxFEauX3CHkVMYrUIcCJK2x6/j5+O1fFfF42Felnd+n/NiFJNG6DO6lNjieF7ynnVh1Z+/bhP30MqrkX5puOv5EHBi0u2Dq2qqVCxqWkN1WDQxyn4IOEnSd3FBch/um12INQhIakryJDuWekn/5kjayMyuSu+xIQNYGQ6QptfIQVQszdgtHGvSaAfYkENwW8BvzeyV8sCsd5c1ajgvRjFphL51BfZIOgxPGtaLfvx1m7iHNikU3Yju1XfneK/Vd3JH2808YKduiP4wfe6bUNsoa14UZiN5kA+Wct1PML5F/ZrQ6+OeJB0VwarA7UqFfsbDe6omja4R8zQqcxgpzbiPlZdmrJ0Ph+Y7wCY8ZWYPSZoiD4C8UF7usowm82IUk0bo57AEngsmF+sjNwvN3EMrr0YGQK3VdzIsb5b+r5uPZWg+9w2pvSXXENMV9IOZ3du1EC5bgEzkmzPUvEa6bFcw4jW2atJ/F3nuNcmH02gH2JBH0qLj0tTng1QsjN5gXoxi0gh9jS5LOBX3QS7T5zfNzVLLPTTdXJbDA7Iqr0b6oMnq+zp5gYhTGO2OVrTjgSH63DekyZb8h6R0Ben5H/A0DhNJ6FdO+tehqUFxWDRYsRel7ShL1NYkH84wd4A74C7d++K+98tQIs8StedFN5PJZTPrFjUf+GsVY5oa5GZp4h4qaZaZzez1+iCRdDTw7TqrbzVMQdzUHW1YSDof/y6yW/KPW0E6XEnXmtmrNDp1dqNc+eOF8mtC72NmDy3UgTUgZ8U+ipIVe9M+K+fDybQZ6g4w9dfZkV5jFQL9BjEvJo3Qb4oaFsxo0E/fib8q9NHZ7UzDC8PcycRcfQ+NpBc+Cc+IumBLbmZzC9pchNtEzjez9eTpCg41s1w/7WGTbDAnmtmuC3ssg6DHIqpD4WIqtd8bOMnMHknPlwN2MbPvFbT5MZ4P52Yy+XCKFjkaYsI6eSqFrwEX4dfv5ngBnFML2gxkXjzjhX4TmriHDuPm0mvVnems5+o7rfTz/JYHXmxkYVBnS55Wnt/GI2tvIqUrsN7l9IaOpMuA11qFSmXPd
PJ2YSopWC7pdquYDyfTZmg7QEnXA1t3VvfyCP7fWnkRlb7nxaTR6TdFzaIUa7mHQrPEX3XpU6WS9YVeHC8Y/+f+RrTwabIlt4mfrgB8F3d5ssNkd47jUZRnKDRZsSemSpKlFWpa8ZbZbZrkwxlmwropXeqch6hWU7vvefGMX+lLOhmPUuxku3wXsKyZVY5STC6S51pBruqGN5eFRjI+X2Zmm5SePIFpsiWXtBNeOu8xSf8PT5n9v+OhW26KpO5sqMBIlsbJSJMVezrna7jr5ffToQ8C95rZJwva3Irnma/sgDDMHWD6TOvgqc7BU4TfYGafLmnX97xog9C/xUZHKeYeK3mP5YBrzWzNgnP6vrkME0kvAs4u+kyTgSZbco0U49gMD5I5DPicmW04pGFXZkieJEMh2aTW6Vqx32BmuVGomXZTcEHfMc6fDxxjZj1dFZs6IKRFw7juAOX+livjRtxO7p1LzewXvVuNeY/G8+IZr96hQZRiE/dQJngIvEbXeTXgL3jRiMlOky15R1hsB/zAzM6WFz2fMEhaG8+bs3x6/jfcQH3zQh1Yf/wG+Lm8mA+4IP9NWSPzym1HpkclmqhCMzvAmzs7QEkD3wGamUk6J+1Gy1ymu8fY/7wws2f0A9fF/wfPT3N3+v9WPGjjhh5tVss8VsJ9wcv6+TGwUeb5hrilfaF/B8/kB66auRwX9JfjPvfrlLQ5C1cV3AksCywGXL+wP0vXGK8Atsw83wLPLLnQx9bHZ5oCfBgvAHIqLvSnVmh3V/qtRj3GYXw3pL+b4aVItwOuHqfv4gS8HsPQ50Ub1DuNPV5q9nMrvi0cFQKPxxSYLWR3yrSl3BVY3cwOkbQqnha4Som2CU3dLbk82+M2wI1mdoekFYGXm9l54z/aaki63ro8OfKOtYG0k+uwoH6EmeV62fXRz3XmeXC+jM+Nn1SxOTTs6zZgTeBPuEG2ktv1IObFM17oD4th3VyaIq9z+x/c3eslyU5xnlXMIT9RmQxG2SbIi2vMYSTd97uB9c3srQtvVP2hBhXfCt6rsH5EE+RVue4Htsbn0RN40NTAb7R92Bz6nhdt0OkPhYUt1CuwoXkg0nUAZvawxi+D4DD5rJmdkoyyW+FG2SNx9dpk5n3A53Gdb8cbbLLHVNSu+AZjInqr1o9ows74DvAwM3sk7QD/exz66Ude9D0vQui3h6eSt0TH4Dmd6omoJjIT3ihbB0k/MrPdcONc5WpIkwEbmyrgW5JmA2Vqmib1I2pjnp/+9MzzBygvDToUBjkvQui3hyOAXwDPkfRFvGD2/1u4QxoI9ydvkK2BQ1NMRZUgl4nK+vKi4e+TdCKu612ADTCtx7BpumK35lXznkkMbF6ETr9FSHoxrgIRcIGZ1crONxGZDEbZOkj6OO7h8gJcv5y9uK2J/nuioNE5eDor9sPM7PaSdrXqRzwTGeS8CKEfBBMQSUea2YcX9jgmApJ+w0j9iAUBWdZVWKkNDGJehNAPgmAoNF2xS7rJzNYe39G1h8ms+wyCYHJxBl48ZD7um955lHGFpIGnN24rsdIPgmAoNF2xp3Qma1IjeVrQm/DeCYJgWDStt7ztuIympcRKPwiCoRAr9olBCP0gCIZC09QDwWAJoR8EQdAiwnsnCIKgRYTQD4IgaBEh9IMAkPRNSftmnp8r6ZjM869L2q/B+26RUvYGwYQghH4QOJcDm8CCmqwrANnarZvgVYsKSZlMg2DCEkI/CJwrgI3T/y8DbgIek7RcSh/wEmAZSddJulHScek4ku6WdKikOcBOkraRdFt6/raF8WGCoBch9IMAMLM/A/NTGclNgCuBq/EbwUzgDuAY4B3mBa2n4VkPOzxkZusBvwR+ALwZWB943tA+RBBUIIR+EIxwBS7wO0L/yszz+4C7zOwP6dwTgFdn2v48/X1xOu8Oc3/oHw9j4EFQlRD6QTBCR6//cly9cxW+0t8EuKikbZXEYUGw0AmhHwQjXAG8Cfi7mT2dq
hEtiwv+04AZktZM5+4GXJzzHrel89ZIz3cZ5zEHQS1C6AfBCDfiXjtXdR171MzuA/YATpF0I15f+KjuNzCzfwF7AWcnQ+6D4z7qIKhBpGEIgiBoEbHSD4IgaBEh9IMgCFpECP0gCIIWEUI/CIKgRYTQD4IgaBEh9IMgCFpECP0gCIIW8f8BCX8eYlTpldgAAAAASUVORK5CYII=\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "## Conclusions\n", "\n", "- The exploratory data analysis above shows that the entity class distribution is unbalanced: geographical entities, time indicators, organizations and persons are heavily represented.\n", "\n", "- The EDA also shows that some sentences contain very few words. These should be cleaned, since they may not be complete sentences.\n", "\n", "- Some sentences are duplicated (they appear more than once in the dataset)." ], "metadata": { "id": "y3rvWHwgujZj" } }, { "cell_type": "markdown", "source": [ "## Acknowledgements\n", "\n", "- The `get_top_n_bigram` function is adapted from [towardsdatascience](https://towardsdatascience.com/a-complete-exploratory-data-analysis-and-visualization-for-text-data-29fb1b96fb6a)" ], "metadata": { "id": "lw1wqV8C95xi" } } ] }