{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Simple Pipeline Notebook\n",
"\n",
"Show how the Hugging Face text-to-text\n",
"pipeline works without the Gradio interface. Demonstrate inputs and\n",
"outputs from the model."
],
"metadata": {
"id": "KLk7_yjQEI58"
}
},
{
"cell_type": "markdown",
"source": [
"## Code"
],
"metadata": {
"id": "RnRBT7SpFnQ9"
}
},
{
"cell_type": "markdown",
"source": [
"### Libraries\n",
"\n",
"Start by installing and importing necessary libraries"
],
"metadata": {
"id": "keOW1bhbFpOr"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "73cZfuWW6ofw"
},
"outputs": [],
"source": [
"!pip install transformers"
]
},
{
"cell_type": "code",
"source": [
"from transformers import pipeline # Transformers libraries which imports pipeline to use Hugging-Face models\n",
"import pandas as pd # Pandas library for data manipulation and analysis"
],
"metadata": {
"id": "pQAlKppmF-Cb"
},
"execution_count": 2,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Initialize Sentiment analyzers\n",
"\n",
"In this project, Two pre-trained sentiment analysis models will be used, One is for analysizng English text, and the other one if for Arabic text"
],
"metadata": {
"id": "ZUIn2AIhGDK3"
}
},
{
"cell_type": "code",
"source": [
"#Initialize the Analyzers\n",
"\n",
"# Loads a pretrained model for the Arabic language\n",
"arabic_analyzer = pipeline('sentiment-analysis',model=\"CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment\")\n",
"\n",
"# Loads a pretrained model for the English language\n",
"english_analyzer = pipeline('sentiment-analysis')"
],
"metadata": {
"id": "2GWuZ7MMYki5"
},
"execution_count": null,
"outputs": []
},
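{
"cell_type": "markdown",
"source": [
"Before running the full sample sets, it helps to look at the raw output of a single pipeline call. The cell below is a small added illustration (not part of the original workflow): it passes one short example sentence to `english_analyzer`, which returns a list with one dictionary per input, each holding a `label` and a `score`."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Added illustration: inspect the raw pipeline output format\n",
"# The pipeline returns a list of dicts, one per input, each with a 'label' and a 'score'\n",
"sample_output = english_analyzer('The weather is lovely today')\n",
"sample_output"
],
"metadata": {},
"execution_count": null,
"outputs": []
},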
{
"cell_type": "markdown",
"source": [
"### Create sample sentences\n",
"\n",
"Here we defined two sets of sentences, On in English and one in Arabic, it will serve as a test data for both the sentiment analysis models."
],
"metadata": {
"id": "00irs1gUdiSB"
}
},
{
"cell_type": "code",
"source": [
"# Define a List of Arabic sentences for the analyzer\n",
"arabic_sentences = [\n",
" 'أصبح غالي جدا',\n",
" 'انا بخير',\n",
" 'اختبار اليوم كان سهلا',\n",
" 'كان اليوم صعب',\n",
" 'ذهبت اليوم لزيارة زميل لي',\n",
" 'كان الطعام لذيذا'\n",
"]\n",
"\n",
"# Define a list of English sentences for the analyzer\n",
"english_sentences = [\n",
" 'What a great day!!! Looks like dream.',\n",
" 'Today first time I arrive in the boat. Its amazing journey',\n",
" 'I`m sorry.',\n",
" 'Sounds like me',\n",
" 'Im studying in psychology',\n",
" 'The movie was okay'\n",
"]"
],
"metadata": {
"id": "Spg5ALZ6a9Dp"
},
"execution_count": 17,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Run The Analyzers\n",
"\n",
"\n",
"Here, The models are using the sample data above, the results are formatted and displayed in a table, Showing the sentence,its predicted label and its score"
],
"metadata": {
"id": "_Hsm8755dqeB"
}
},
{
"cell_type": "code",
"source": [
"# Execute the model on the Arabic sentences\n",
"\n",
"\n",
"arabic_result = arabic_analyzer(arabic_sentences) # Store the analyzer results for each sentence\n",
"arabic_df = pd.DataFrame(arabic_result) # Convert the results in a formatted table\n",
"arabic_df['Sentences'] = arabic_sentences # add column for sentences\n",
"arabic_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 238
},
"id": "_RaQmJYheoyc",
"outputId": "1b8f3108-4448-462c-f297-919950045540"
},
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" label score Sentences\n",
"0 positive 0.561177 أصبح غالي جدا\n",
"1 positive 0.551292 انا بخير\n",
"2 negative 0.461035 اختبار اليوم كان سهلا\n",
"3 negative 0.401242 كان اليوم صعب\n",
"4 neutral 0.585189 ذهبت اليوم لزيارة زميل لي\n",
"5 positive 0.901781 كان الطعام لذيذا"
],
"text/html": [
"\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" score | \n",
" Sentences | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" positive | \n",
" 0.561177 | \n",
" أصبح غالي جدا | \n",
"
\n",
" \n",
" 1 | \n",
" positive | \n",
" 0.551292 | \n",
" انا بخير | \n",
"
\n",
" \n",
" 2 | \n",
" negative | \n",
" 0.461035 | \n",
" اختبار اليوم كان سهلا | \n",
"
\n",
" \n",
" 3 | \n",
" negative | \n",
" 0.401242 | \n",
" كان اليوم صعب | \n",
"
\n",
" \n",
" 4 | \n",
" neutral | \n",
" 0.585189 | \n",
" ذهبت اليوم لزيارة زميل لي | \n",
"
\n",
" \n",
" 5 | \n",
" positive | \n",
" 0.901781 | \n",
" كان الطعام لذيذا | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "arabic_df",
"summary": "{\n \"name\": \"arabic_df\",\n \"rows\": 6,\n \"fields\": [\n {\n \"column\": \"label\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"positive\",\n \"negative\",\n \"neutral\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.1736662143596494,\n \"min\": 0.40124186873435974,\n \"max\": 0.9017810225486755,\n \"num_unique_values\": 6,\n \"samples\": [\n 0.561177134513855,\n 0.5512916445732117,\n 0.9017810225486755\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentences\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"\\u0623\\u0635\\u0628\\u062d \\u063a\\u0627\\u0644\\u064a \\u062c\\u062f\\u0627\",\n \"\\u0627\\u0646\\u0627 \\u0628\\u062e\\u064a\\u0631\",\n \"\\u0643\\u0627\\u0646 \\u0627\\u0644\\u0637\\u0639\\u0627\\u0645 \\u0644\\u0630\\u064a\\u0630\\u0627\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"source": [
"# Execute the model on the English sentences\n",
"\n",
"\n",
"english_result = english_analyzer(english_sentences) # Store the analyzer results for each sentence\n",
"english_df = pd.DataFrame(english_result) # Convert the results in a formatted table\n",
"english_df['Sentences'] = english_sentences # add column for sentences\n",
"english_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 238
},
"id": "A36AXwdrdw0F",
"outputId": "2d758bf6-2453-4b44-bfd8-e4b14bed53ad"
},
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" label score Sentences\n",
"0 POSITIVE 0.999849 What a great day!!! Looks like dream.\n",
"1 POSITIVE 0.999842 Today first time I arrive in the boat. Its ama...\n",
"2 NEGATIVE 0.999688 I`m sorry.\n",
"3 POSITIVE 0.998944 Sounds like me\n",
"4 POSITIVE 0.505438 Im studying in psychology\n",
"5 POSITIVE 0.999792 The movie was okay"
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" score | \n",
" Sentences | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" POSITIVE | \n",
" 0.999849 | \n",
" What a great day!!! Looks like dream. | \n",
"
\n",
" \n",
" 1 | \n",
" POSITIVE | \n",
" 0.999842 | \n",
" Today first time I arrive in the boat. Its ama... | \n",
"
\n",
" \n",
" 2 | \n",
" NEGATIVE | \n",
" 0.999688 | \n",
" I`m sorry. | \n",
"
\n",
" \n",
" 3 | \n",
" POSITIVE | \n",
" 0.998944 | \n",
" Sounds like me | \n",
"
\n",
" \n",
" 4 | \n",
" POSITIVE | \n",
" 0.505438 | \n",
" Im studying in psychology | \n",
"
\n",
" \n",
" 5 | \n",
" POSITIVE | \n",
" 0.999792 | \n",
" The movie was okay | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "english_df",
"summary": "{\n \"name\": \"english_df\",\n \"rows\": 6,\n \"fields\": [\n {\n \"column\": \"label\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"NEGATIVE\",\n \"POSITIVE\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.20175035173408953,\n \"min\": 0.5054381489753723,\n \"max\": 0.9998489618301392,\n \"num_unique_values\": 6,\n \"samples\": [\n 0.9998489618301392,\n 0.999841570854187\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentences\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"What a great day!!! Looks like dream.\",\n \"Today first time I arrive in the boat. Its amazing journey\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 20
}
]
},
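{
"cell_type": "markdown",
"source": [
"The two models use different label conventions: the CAMeL-Lab Arabic model returns lowercase labels (positive / negative / neutral), while the default English pipeline returns uppercase POSITIVE / NEGATIVE. The cell below is a small added sketch (not part of the original flow) that upper-cases the labels and stacks the two tables into one combined DataFrame for easier side-by-side reading."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Added sketch: normalize label casing and combine both result tables\n",
"# (the variable name combined_df is introduced here for illustration)\n",
"arabic_df['label'] = arabic_df['label'].str.upper()   # lowercase -> uppercase\n",
"english_df['label'] = english_df['label'].str.upper() # already uppercase; no-op\n",
"\n",
"combined_df = pd.concat(\n",
"    [arabic_df.rename(columns={'Sentences': 'Sentence'}),\n",
"     english_df.rename(columns={'Sentences': 'Sentence'})],\n",
"    ignore_index=True\n",
")\n",
"combined_df"
],
"metadata": {},
"execution_count": null,
"outputs": []
},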
{
"cell_type": "markdown",
"source": [
"### User Input Feature\n",
"\n",
"Here users can input thier own sentences for both English and Arabic languages, the results are then formatted and displayed in a table just like above."
],
"metadata": {
"id": "T5Kp_ICCfAOn"
}
},
{
"cell_type": "code",
"source": [
"# Analyze user's input for the Arabic language\n",
"\n",
"input_arb = input(\"Enter a sentece in Arabic: \") # Prompts the user to enter a sentence in Arabic\n",
"res_arb = arabic_analyzer(input_arb) # Perform sentiment-analysis on the input\n",
"df_in_arb = pd.DataFrame(res_arb) # Convert the results in a formatted table\n",
"df_in_arb['Sentence'] = input_arb # add column for the sentence\n",
"df_in_arb"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 106
},
"id": "WN8xMOZ8hFjJ",
"outputId": "285c0d46-cca9-4ea0-c55d-9ec97e3a6015"
},
"execution_count": 26,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter a sentece in Arabic: جميل\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" label score Sentence\n",
"0 positive 0.943679 جميل"
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" score | \n",
" Sentence | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" positive | \n",
" 0.943679 | \n",
" جميل | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "df_in_arb",
"summary": "{\n \"name\": \"df_in_arb\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"label\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"positive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 0.9436786770820618,\n \"max\": 0.9436786770820618,\n \"num_unique_values\": 1,\n \"samples\": [\n 0.9436786770820618\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"\\u062c\\u0645\\u064a\\u0644\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 26
}
]
},
{
"cell_type": "code",
"source": [
"# Analyze user's input for the English language\n",
"\n",
"input_eng = input(\"Enter a sentece in English: \") # Prompts the user to enter a sentence in English\n",
"res_eng = english_analyzer (input_eng) # Perform sentiment-analysis on the input\n",
"df_in_eng = pd.DataFrame(res_eng) # Convert the results in a formatted table\n",
"df_in_eng['Sentence'] = input_eng # add column for the sentence\n",
"df_in_eng"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 106
},
"id": "ETvVN5pjfPRm",
"outputId": "7631ed05-8952-4ea8-9314-9fc28281103a"
},
"execution_count": 25,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Enter a sentece in English: Good\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" label score Sentence\n",
"0 POSITIVE 0.999816 Good"
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" score | \n",
" Sentence | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" POSITIVE | \n",
" 0.999816 | \n",
" Good | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "df_in_eng",
"summary": "{\n \"name\": \"df_in_eng\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"label\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"POSITIVE\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"score\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": null,\n \"min\": 0.9998161196708679,\n \"max\": 0.9998161196708679,\n \"num_unique_values\": 1,\n \"samples\": [\n 0.9998161196708679\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentence\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Good\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 25
}
]
}
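,
{
"cell_type": "markdown",
"source": [
"### Combining the Two Analyzers (Added Sketch)\n",
"\n",
"The Gradio app mentioned in the introduction would route each input to the right model automatically. The cell below is a minimal, hypothetical sketch of that idea: the helper `analyze_sentence` (a name introduced here, not taken from the original project) checks whether the text contains Arabic-script characters and picks the matching analyzer."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import re # Regular expressions, used for a simple Arabic-script check\n",
"\n",
"def analyze_sentence(text):\n",
"    # Hypothetical helper: route the text to the Arabic or English analyzer\n",
"    # based on whether it contains Arabic-script characters (illustrative heuristic only)\n",
"    analyzer = arabic_analyzer if re.search(r'[\\u0600-\\u06FF]', text) else english_analyzer\n",
"    df = pd.DataFrame(analyzer(text)) # Run the chosen analyzer and convert the result into a DataFrame\n",
"    df['Sentence'] = text # Add a column with the analyzed sentence\n",
"    return df\n",
"\n",
"# Example usage with the first Arabic sample sentence\n",
"analyze_sentence(arabic_sentences[0])"
],
"metadata": {},
"execution_count": null,
"outputs": []
}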
]
}