Francesco committed
Commit 5f25427 • 0 Parent(s)
Files changed (7)
  1. .gitignore +160 -0
  2. README.md +7 -0
  3. app.py +142 -0
  4. prompts/output.txt +1 -0
  5. prompts/system.prompt +1 -0
  6. prompts/template.prompt +10 -0
  7. requirements.txt +5 -0
.gitignore ADDED
@@ -0,0 +1,160 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/#use-with-ide
+ .pdm.toml
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # PyCharm
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
README.md ADDED
@@ -0,0 +1,7 @@
+ We will use Gradio.
+
+ 1) Transcribe the YouTube video
+    - we need an input field where you can paste the video URL
+ 2) Then we need to store the transcription in a vector DB
+    - ConversationTokenBufferMemory
+      https://python.langchain.com/en/latest/modules/memory/types/summary_buffer.html
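The README names ConversationTokenBufferMemory as the way to keep the chat history bounded. The idea behind a token-buffer memory can be sketched in plain Python: keep the system message and as many of the newest messages as fit under a token budget. This is only an illustration, not LangChain's implementation; `count_tokens`, `trim_to_budget`, and the whitespace-based token count are assumptions made for the example.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def trim_to_budget(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the system message plus the newest messages that fit in max_tokens."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system[1])
    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for role, content in reversed(rest):
        cost = count_tokens(content)
        if cost > budget:
            break
        kept.append((role, content))
        budget -= cost
    # Restore chronological order before returning.
    return [system] + kept[::-1]


history = [
    ("system", "You are YouTubeGuru"),
    ("human", "a very long transcription " * 50),
    ("ai", "short summary"),
    ("human", "give me another title"),
]
trimmed = trim_to_budget(history, max_tokens=12)
print([role for role, _ in trimmed])
```

With a budget this small the long transcription message is dropped while the system message and the latest exchange survive, which is the behaviour the app would want once a transcription no longer fits in the model's context.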
app.py ADDED
@@ -0,0 +1,142 @@
+ import json
+ import logging
+ import os
+ from pathlib import Path
+ from typing import List
+ from uuid import uuid4
+
+ import gradio as gr
+ import openai
+ from langchain.chat_models import ChatOpenAI
+ from langchain.prompts import HumanMessagePromptTemplate
+ from langchain.schema import HumanMessage, SystemMessage
+ from youtube_dl import YoutubeDL
+
+ openai.api_key = os.environ["OPENAI_API_KEY"]  # read the key from the environment; never commit secrets
+ MODELS_NAMES = ["gpt-3.5-turbo", "gpt-4"]
+
+ logging.basicConfig(
+     format="[%(asctime)s %(levelname)s]: %(message)s", level=logging.DEBUG
+ )
+
+
+ system_message = SystemMessage(content=Path("prompts/system.prompt").read_text())
+ human_message_prompt_template = HumanMessagePromptTemplate.from_template(
+     Path("prompts/template.prompt").read_text()
+ )
+
+
+ def download_video_as_mp3(video_url: str, output_filename: str):
+     ydl_opts = {
+         "format": "bestaudio/best",
+         "outtmpl": output_filename,
+         "postprocessors": [
+             {
+                 "key": "FFmpegExtractAudio",
+                 "preferredcodec": "mp3",
+                 "preferredquality": "192",
+             }
+         ],
+     }
+
+     with YoutubeDL(ydl_opts) as ydl:
+         ydl.download([video_url])
+
+
+ def get_transcription(youtube_url: str):
+     logging.info(f"Transcribing {youtube_url}")
+     output_filename = Path(f"{str(uuid4())}.mp3")
+     download_video_as_mp3(youtube_url, str(output_filename))
+     logging.debug(f"video downloaded at {str(output_filename)}")
+     with output_filename.open("rb") as audio_file:
+         transcript = openai.Audio.transcribe("whisper-1", audio_file, language="en")
+     logging.info("Done!")
+     output_filename.unlink()
+     return transcript
+
+
+ def get_youtube_video_info(youtube_transcription: str, messages: List, chat):
+     logging.info("Running GPT")
+     human_message = human_message_prompt_template.format(
+         youtube_transcription=youtube_transcription
+     )
+     messages.append(human_message)
+     reply = chat(messages)
+     messages.append(reply)
+     logging.info("Done!")
+     # we don't want the first ever message, too long
+     chatbot_messages = [("", reply.content)]
+     return chatbot_messages, messages
+
+
+ def run_message_on_chatbot(chat, message: str, chatbot_messages, messages):
+     logging.info("asking question to GPT")
+     messages.append(HumanMessage(content=message))
+     reply = chat(messages)
+     messages.append(reply)
+     logging.debug(f"reply = {reply.content}")
+     logging.info("Done!")
+     chatbot_messages.append((message, messages[-1].content))
+     return "", chatbot_messages, messages
+
+
+ def youtube_guru_button_handler(
+     youtube_url: str, messages: List, temperature: float, model_name: str
+ ):
+     chat = ChatOpenAI(model_name=model_name, temperature=temperature)
+     transcription = get_transcription(youtube_url)
+     chatbot_messages, messages = get_youtube_video_info(transcription, messages, chat)
+     return chatbot_messages, messages, chat
+
+
+ def on_clear_button_click():
+     return "", [], [system_message]  # reset the history to just the system message
+
+
+ with gr.Blocks() as demo:
+     messages = gr.State([system_message])
+     youtube_transcription = gr.State("")
+     model_selected = gr.State()
+     chat = gr.State()
+
+     with gr.Column():
+         gr.Markdown("# Welcome to YouTubeGuru!")
+
+         youtube_url = gr.Textbox(
+             label="video url", placeholder="https://www.youtube.com/watch?v=dQw4w9WgXcQ"
+         )
+         chatbot = gr.Chatbot()
+         msg = gr.Textbox(label="chat input")
+         msg.submit(
+             run_message_on_chatbot,
+             [chat, msg, chatbot, messages],
+             [msg, chatbot, messages],
+         )
+         with gr.Row():
+             with gr.Column():
+                 clear = gr.Button("Clear")
+                 clear.click(
+                     on_clear_button_click,
+                     [],
+                     [youtube_transcription, chatbot, messages],
+                     queue=False,
+                 )
+             with gr.Accordion("Settings", open=False):
+                 temperature = gr.Slider(
+                     minimum=0.0,
+                     maximum=1.0,
+                     value=0.7,
+                     step=0.1,
+                     label="temperature",
+                     interactive=True,
+                 )
+                 model_name = gr.Dropdown(
+                     choices=MODELS_NAMES, value=MODELS_NAMES[0], label="model"
+                 )
+
+         button = gr.Button("Run 🚀")
+         button.click(
+             youtube_guru_button_handler,
+             inputs=[youtube_url, messages, temperature, model_name],
+             outputs=[chatbot, messages, chat],
+         )
prompts/output.txt ADDED
@@ -0,0 +1 @@
+ '1. In this video, the speaker provides a list of 30 Twitter accounts that he believes are the best for following machine learning research. He explains that Twitter is a better platform for this than LinkedIn and goes into detail on how to optimize your Twitter feed. The speaker also gives insights and opinions on each of the accounts he recommends.\n\n2. This video provides a comprehensive list of 30 Twitter accounts that are great for following machine learning research. The speaker also gives tips on how to optimize your Twitter feed to get the most out of it. If you\'re interested in staying up-to-date on the latest machine learning research, this video might be a great resource for you.\n\n3. "30 Must-Follow Twitter Accounts for Machine Learning Research"
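The sample reply in prompts/output.txt comes back as a numbered list of three items separated by blank lines. If the app ever needs the three parts separately, one possible approach is sketched below. It assumes the reply keeps the order summary, description, title; `split_numbered_reply` and `KEYS` are hypothetical names that do not appear in app.py.

```python
import re
from typing import Dict

KEYS = ("summary", "description", "title")


def split_numbered_reply(reply: str) -> Dict[str, str]:
    """Map the items '1.', '2.', '3.' of a reply onto summary/description/title."""
    # Split before each "N. " that starts the text or follows a blank line.
    parts = re.split(r"(?:^|\n\n)\d+\.\s+", reply.strip())
    items = [part.strip() for part in parts if part.strip()]
    return dict(zip(KEYS, items))


reply = '1. A short summary.\n\n2. A longer description.\n\n3. "A Catchy Title"'
info = split_numbered_reply(reply)
print(info["title"])
```

A reply that does not follow the numbered layout simply yields fewer keys, so callers should check which keys are present before using them.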
prompts/system.prompt ADDED
@@ -0,0 +1 @@
+ You are YouTubeGuru, an AI-powered virtual assistant with expertise in summarizing YouTube videos, writing well-written descriptions from transcriptions, and devising impactful titles for your content.
prompts/template.prompt ADDED
@@ -0,0 +1,10 @@
+ Given a video transcription follow these tasks:
+
+ {{
+     "summary": <Summarize the most essential aspects of the video in a concise manner>,
+     "description": <Generate a suitable YouTube description for the video, tailored to the content>,
+     "title": <Propose an attention-grabbing title for the YouTube video>
+ }}
+
+ Transcription:
+ {youtube_transcription}
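The doubled braces in the template are escapes: they render as single literal braces, while `{youtube_transcription}` is the only placeholder that gets filled. A minimal sketch of that behaviour follows, assuming plain `str.format` mirrors the f-string-style formatting that `HumanMessagePromptTemplate.from_template` applies (an assumption for this example). `TEMPLATE` is an inline copy of the prompt file and `render_prompt` is a hypothetical helper.

```python
# Inline copy of prompts/template.prompt; the app itself loads the file with
# Path("prompts/template.prompt").read_text().
TEMPLATE = """Given a video transcription follow these tasks:

{{
    "summary": <Summarize the most essential aspects of the video in a concise manner>,
    "description": <Generate a suitable YouTube description for the video, tailored to the content>,
    "title": <Propose an attention-grabbing title for the YouTube video>
}}

Transcription:
{youtube_transcription}"""


def render_prompt(transcription: str) -> str:
    """Fill the single placeholder; '{{' and '}}' come out as literal braces."""
    return TEMPLATE.format(youtube_transcription=transcription)


prompt = render_prompt("hello world")
print(prompt)
```

Without the doubling, the formatter would try to treat the JSON-like block as a placeholder and fail, which is why the template escapes it.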
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ openai
+ youtube-dl
+ gradio
+ git+https://github.com/ytdl-org/youtube-dl.git
+ langchain