AmirMohseni
/

YoutubeSummarizer

Model card Files Files and versions Community

AmirMohseni commited on May 5

Commit

0b43703

•

1 Parent(s): 7215301

Upload 4 files

Browse files

Files changed (4) hide show

README.md +44 -3
Summarizer.py +90 -0
UserHandler.py +47 -0
YoutubeScript.ipynb +309 -0

README.md CHANGED Viewed

@@ -1,3 +1,44 @@
----
-license: mit
----

+# YouTube Video to Text Summary Tool
+This Jupyter Notebook tool transforms YouTube videos into concise text summaries using Whisper and ChatGPT 4.
+## Overview
+Leverage `pytube`, `openai`, and `whisper` Python libraries to:
+1. **Download Audio:** Extract audio from YouTube URLs via `pytube`.
+2. **Transcribe Audio:** Convert audio to text with the Whisper library.
+3. **Summarize Text:** Create succinct summaries from transcriptions using ChatGPT 4, emphasizing key points and context.
+## Usage Guide
+1. **Set Up:** Install required libraries with pip:
+    ```
+    pip install pytube openai git+https://github.com/openai/whisper.git
+    ```
+2. **Get Started:** Clone the repository with the Jupyter Notebook.
+3. **Launch Notebook:** Open `YoutubeScript.ipynb` in Jupyter.
+4. **Run Notebook:** Sequentially execute cells, inputting the YouTube URL when asked.
+5. **Generate Summary:** Follow on-screen prompts to download audio, transcribe, and summarize.
+6. **Review Summary:** Access the final summary in `summary.txt`.
+## Using the Code for Telegram Bot
+**To use the code for creating a Telegram bot that generates text summaries from YouTube URLs:**
+1.  Visit [YT\_SummaryBot](https://t.me/YT_SummaryBot) on Telegram.
+2.  Send the command `/summarize` to the bot.
+3.  The bot will ask you for the YouTube URL.
+4.  Reply to the bot with the YouTube URL.
+5.  The bot will provide you with the text summary.
+## Acknowledgments
+- **pytube:** YouTube video downloads.
+- **OpenAI:** ChatGPT 4 model provision.
+- **Whisper:** Audio transcription.

Summarizer.py ADDED Viewed

	@@ -0,0 +1,90 @@

+from pytube import YouTube
+import whisper
+import os
+import subprocess
+from openai import OpenAI
+import ssl
+def download_youtube_audio(url, destination="."):
+    # Create a YouTube object
+    yt = YouTube(url)
+    ssl._create_default_https_context = ssl._create_unverified_context
+    # Select the audio stream
+    audio_stream = yt.streams.filter(only_audio=True).first()
+    # Download the audio stream
+    out_file = audio_stream.download(output_path=destination)
+    # Set up new filename
+    base, ext = os.path.splitext(out_file)
+    audio_file = base + '.mp3'
+    # Convert file to mp3
+    subprocess.run(['ffmpeg', '-i', out_file, audio_file])
+    # Remove the original file
+    os.remove(out_file)
+    print(f"Downloaded and converted to MP3: {audio_file}")
+    return audio_file
+def transcribe_audio(audio_file):
+    model = whisper.load_model("base")
+    result = model.transcribe(audio_file)
+    return result["text"]
+def write_text_to_file(text, filename="transcribed_text.txt"):
+    # Write the text to the file
+    with open(filename, "w") as file:
+        file.write(text)
+def delete_file(file_path):
+    os.remove(file_path)
+def process(url):
+    # Set the destination path for the download
+    file_path = download_youtube_audio(url)
+    prompt = transcribe_audio(file_path)
+    delete_file(file_path)
+    result_summary = summarize_text(prompt)
+    return result_summary
+def summarize_text(prompt):
+    pre_prompt = 'You are a model that receives a transcription of a YouTube video. Your task is to correct any words ' \
+                 'that ' \
+                 'may be incorrect based on the context, and transform it into a well-structured summary of the entire ' \
+                 'video. Your summary should highlight important details and provide additional context when ' \
+                 'necessary. ' \
+                 'Aim to be detailed, particularly when addressing non-trivial aspects of the content. The summary ' \
+                 'should ' \
+                 'encompass at least 20-30% of the original text length.'
+    client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
+    response = client.chat.completions.create(
+        model="gpt-4-turbo",
+        messages=[
+            {"role": "system", "content": pre_prompt},
+            {"role": "user", "content": prompt},
+        ]
+    )
+    # The 'response' will contain the completion from the model
+    summary_result = response.choices[0].message.content
+    return summary_result
+#def main():
+#    print(process("https://www.youtube.com/watch?v=reUZRyXxUs4"))
+#if __name__ == "__main__":
+#    main()

UserHandler.py ADDED Viewed

	@@ -0,0 +1,47 @@

+import os
+import telebot
+from Summarizer import *
+# Define a function to process the YouTube URL and generate a summary
+def process_youtube_url(url):
+    try:
+        # Use the 'process' function from your summarizer module
+        summary = process(url)
+        return summary
+    except Exception as e:
+        # Handle any errors that may occur during processing
+        return f"Error processing YouTube URL: {e}"
+# Set up the Telegram Bot
+def main():
+    # Retrieve the Telegram bot token from environment variables
+    telegram_bot_token = os.getenv('TELEGRAM_BOT_TOKEN', 'YOUR_TELEGRAM_BOT_TOKEN')
+    bot = telebot.TeleBot(telegram_bot_token)
+    # Define a function to handle the user's input  (e.g., the YouTube URL)
+    def url_handler(message):
+        # Get the user's input
+        url = message.text
+        print(f"Received YouTube URL: {url}")
+        # Process the YouTube URL and generate a summary
+        summary = process_youtube_url(url)
+        # Send the summary back to the user
+        bot.send_message(message.chat.id, summary, parse_mode="Markdown")
+    @bot.message_handler(commands=['summarize'])
+    def message_handler(message):
+        text = "Please enter the URL of the YouTube video you would like to summarize."
+        sent_msg = bot.send_message(message.chat.id, text, parse_mode="Markdown")
+        bot.register_next_step_handler(sent_msg, url_handler)
+    # Start the bot
+    bot.infinity_polling()
+# Entry point of the script
+if __name__ == '__main__':
+    main()

YoutubeScript.ipynb ADDED Viewed

	@@ -0,0 +1,309 @@

+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    },
+    "accelerator": "GPU"
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# Summarize Youtube videos into text"
+      ],
+      "metadata": {
+        "id": "Jhu3KrV9Ru1t"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Installing necessary libraries\n"
+      ],
+      "metadata": {
+        "id": "PD3P82YJRIzH"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%pip install git+https://github.com/openai/whisper.git"
+      ],
+      "metadata": {
+        "id": "JfPAYngLYiJP"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "%pip install pytube\n",
+        "%pip install openai\n",
+        "%pip install numpy"
+      ],
+      "metadata": {
+        "id": "FX8cTWYpAWCQ"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Downloading audio clip from the YouTube video"
+      ],
+      "metadata": {
+        "id": "kPp1Gx_GGVmv"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "id": "jbeQ3nGRABqH"
+      },
+      "outputs": [],
+      "source": [
+        "from pytube import YouTube\n",
+        "import whisper\n",
+        "import os\n",
+        "import subprocess"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def download_youtube_audio(url, destination):\n",
+        "    # Create a YouTube object\n",
+        "    yt = YouTube(url)\n",
+        "\n",
+        "    # Select the audio stream\n",
+        "    audio_stream = yt.streams.filter(only_audio=True).first()\n",
+        "\n",
+        "    # Download the audio stream\n",
+        "    out_file = audio_stream.download(output_path=destination)\n",
+        "\n",
+        "    # Set up new filename\n",
+        "    base, ext = os.path.splitext(out_file)\n",
+        "    new_file = base + '.mp3'\n",
+        "\n",
+        "    # Convert file to mp3\n",
+        "    subprocess.run(['ffmpeg', '-i', out_file, new_file])\n",
+        "\n",
+        "    # Remove the original file\n",
+        "    os.remove(out_file)\n",
+        "\n",
+        "    print(f\"Downloaded and converted to MP3: {new_file}\")\n",
+        "    return new_file"
+      ],
+      "metadata": {
+        "id": "TIpMJJshAR7m"
+      },
+      "execution_count": 4,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Input Youtube link"
+      ],
+      "metadata": {
+        "id": "F-uidWN5RP0R"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "url = input(\"Enter the YouTube URL: \")\n",
+        "\n",
+        "#url = 'https://www.youtube.com/watch?v=reUZRyXxUs4' # as test"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "9nCkN_X_As1l",
+        "outputId": "db1e5e14-0ef9-4ed1-fa85-7cced9b42eab"
+      },
+      "execution_count": 72,
+      "outputs": [
+        {
+          "name": "stdout",
+          "output_type": "stream",
+          "text": [
+            "Enter the YouTube URL: https://youtu.be/4o5hSxvN_-s?si=6ZcSvt69baVOjYBn\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Set the destination path for the download\n",
+        "destination = \"audiofiles/\"\n",
+        "\n",
+        "file_path = download_youtube_audio(url, destination)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "iNdsqUCXA8nf",
+        "outputId": "a401b201-3710-4465-d4ab-cc6f8df28265"
+      },
+      "execution_count": 73,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Downloaded and converted to MP3: /content/audiofiles/This is what happens when you reply to spam email l TED.mp3\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Converting the audio into text using **Whisper 1**"
+      ],
+      "metadata": {
+        "id": "90eokVCvGgGX"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "def write_text_to_file(text, filename=\"transcribed_text.txt\"):\n",
+        "  # Write the text to the file\n",
+        "  with open(filename, \"w\") as file:\n",
+        "      file.write(text)"
+      ],
+      "metadata": {
+        "id": "6fzySYbSMLgT"
+      },
+      "execution_count": 74,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import whisper\n",
+        "\n",
+        "model = whisper.load_model(\"base\")\n",
+        "result = model.transcribe(file_path)\n",
+        "print(result[\"text\"])\n",
+        "\n",
+        "transcription = result[\"text\"]"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "oJlcn4NJa3uR",
+        "outputId": "a34c89eb-3d86-409b-d6a6-b16c40195a77"
+      },
+      "execution_count": 75,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            " I Three years ago I got one of those spam emails and It managed to get through my spam filter not quite sure how but it's telling me my inbox and it was from a guy called Solomon Odonka I know And one like this said hello James Veech. I have an interesting business proposal. I want to share with you Solomon Now my hand was kind of hovering on the delete button. I see why I was looking at my phone. I thought I could just delete this Or I could do what I think we've all Always wanted to do And I said Solomon your email intrigues me And the game was a foot He said dear James Veech we shall be shipping gold to you You will earn 10% of any gold you distribute So I knew I was dealing with a professional I said how much is it worth He said we will start with a smaller quantity. I was like oh, and then he said of 25 kilograms The worst should be about 2.5 million I just saw him and if you're gonna do it, let's go big I can handle it How much gold do you have He said it's not a matter of how much gold how much is your cable if handling? What we can start with 50 kilograms as a trial shipment. I said 50 kilograms There's no point doing this at all. No such as you're being at least a metric ton So what do you do for a living? I Said I'm a hedge fund executive manager This isn't the first time I've shit booty in my friend no no no Then I started the panic I was down. I look where are you based? I don't know about you, but I think we're going by the postal service. It ought to be sign for Right, that's a lot of gold He said oh not be easily convinced my company to do a larger quantity shipment My said Solomon I completely with you on this one. I'm putting together a visual For you's taken for the board meeting whole time This is what I sent Solomon I don't know if we have any statisticians in in the house, but there's definitely something going on I Said a song on attach a email you'll find a helpful chart I've had one of my assistants run the numbers We need to be as much gold as possible There's always a moment where they try to take your heartstrings and this was it for Solomon He said I will be so much happy if the deal goes well because I'm going to get a very good commission as well And I said that's amazing. What are you gonna spend your cut on? And he said on real estate. What about you? I Thought about it for a long time I Said one word It's going place I Was in Saint-Mizard, they know like 30 different varieties also you can call up carrots and you can dip them Have you ever done that? You said I've to go to bed now Till morrow Have sweet dream I didn't know what to say I Said boss war my golden nugget boss war Guys you have to understand this will be going for like weeks or be a hitherto the greatest weeks of my life But I had to knock it on the head it was getting about a hand friends were saying to me James You want to come out for a drink I was like I can't me. I'm expecting email about some gold So I figured I had to knock it on the head. I take it to a ridiculous conclusion. So I thought I can cut to the plan. I said look Solomon Solomon I'm concerned about security when we email each other we need to use a code And you agree Nice it's Solomon I spend all night coming up with this code we need to use an all-father correspondence lawyer Gummy bear Bank cream egg Legal physicala bottle came peanut amends documents jelly beans western union guys a giant gummy lizard I knew these were all words they used right I said Please gonna kick out an all-father corresponded I Didn't hit back after I've gone too far I've gone too far so I had to I had to back pedal a little bit I said look Solomon is the deal still on Kit-cat Because you have to be consistent Then I did get any more back from him. He said the business all I'm traveling a bird of that I said dude you have to use the code What Follow this the greatest email I've ever received I'm not joking. This is what turn up in my inbox. This is a good day The business is on And I'm trying to raise the balance for the gummy bear So So he could submit all the needed physicala bottle jelly beans to the cream egg For the peanut amends process to start 7500 pounds via a giant Gummy living And that was so much fun right that it got me thinking like what would happen if I just felt as much as I could Replying to as many scam emails as I could And that's what I've been doing For three years on your behalf Let me tell you crazy stuff happens when you start replying to scam emails It's really difficult and I highly recommend we do I don't think what I'm doing is me right? You know there are a lot of people out there who do mean things to scams I don't think what I'm doing all I think it's all I'm doing is wasting their time And I think anytime they're spending with me is time they're not spending scamming vulnerable adults out of their savings right And if you're gonna do this and I highly recommend you do Get yourself a pseudonym of email address. Don't use your own email address because that's exactly what I was doing at the start It was a nightmare because you know, I'd wake up in the morning and have like thousand emails about penis enlargement You know only one of which was a legitimate Response to a medical question I'd have I tell you what though guys I tell you what any days of good day Any days of good day if you receive an email the begins like this I am Winnie Mandela The second wife of Nelson Mandela the former South African president. I was like oh that Winnie Mandela I Know so many I need to transfer 45 million dollars out of the country because of my husband Nelson's health condition Let that sink in She sent me this which is hysterical And this and this is fairly legitimate. This is a letter authorization But to be honest if there's nothing written on it. It's just a shape I say Winnie I'm really sorry to hear of this given that Nelson died three months ago. I describe his health conditioners That's a worse health condition you can have not being alive She said kindly comply with my bankers instructions one love I Said of course no woman no cry She said my bank would eat transfer of three thousand dollars one love I said no problem I should share it Thank you\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# The response object will contain the transcription\n",
+        "write_text_to_file(transcription)"
+      ],
+      "metadata": {
+        "id": "R3ttiOCdI94t"
+      },
+      "execution_count": 76,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Converting transcript into summarized text"
+      ],
+      "metadata": {
+        "id": "_vQ7VqW1RgR7"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "preprompt = 'You are a model that receives a transcription of a YouTube video. Your task is to correct any words that may be incorrect based on the context, and transform it into a well-structured summary of the entire video. Your summary should highlight important details and provide additional context when necessary. Aim to be detailed, particularly when addressing non-trivial aspects of the content. The summary should encompass at least 20-30% of the original text length.'\n",
+        "prompt = preprompt + transcription"
+      ],
+      "metadata": {
+        "id": "FtrdIAbhDZSr"
+      },
+      "execution_count": 77,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from openai import OpenAI\n",
+        "from google.colab import userdata\n",
+        "\n",
+        "client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))\n",
+        "\n",
+        "response = client.chat.completions.create(\n",
+        "  model=\"gpt-3.5-turbo\",\n",
+        "  messages=[\n",
+        "    {\"role\": \"system\", \"content\": preprompt},\n",
+        "    {\"role\": \"user\", \"content\": prompt},\n",
+        "  ]\n",
+        ")\n",
+        "\n",
+        "# The 'response' will contain the completion from the model\n",
+        "print(response.choices[0].message.content)\n",
+        "write_text_to_file(response.choices[0].message.content, filename='summary.txt')"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "pYXkw7qdMqZI",
+        "outputId": "2d6b3edb-47ef-4930-d544-5b26e8d1b555"
+      },
+      "execution_count": 78,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Three years ago, the creator received a spam email from someone named Solomon Odonka, offering a business proposal involving shipping gold with a 10% distribution earnings. Initially skeptical, the creator decided to engage with Solomon out of curiosity. Their conversation escalated from discussing smaller quantities of gold to trial shipments of 50 kilograms, with Solomon expressing the intent to conduct larger transactions. The creator, posing as a hedge fund executive, continued the correspondence, even creating elaborate codes for security. The interaction with Solomon led the creator to reflect on the consequences of scamming vulnerable individuals and the value of wasting scammers' time. The creator recommended using a pseudonymous email address to avoid inundation with spam emails. They humorously shared experiences of engaging with various scam emails, including one claiming to be from Winnie Mandela seeking assistance in transferring funds. Despite the absurdity of the interactions, the creator maintained a light-hearted and playful approach throughout the discussion.\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [],
+      "metadata": {
+        "id": "jqVM1469JdNB"
+      },
+      "execution_count": null,
+      "outputs": []
+    }
+  ]
+}