donb-hf committed on
Commit
97ee08c
1 Parent(s): 435f733

chainlit intro page updates

README.md CHANGED
@@ -1,8 +1,8 @@
  ---
- title: BeyondChatGPT Demo
- emoji: 📉
- colorFrom: pink
- colorTo: yellow
+ title: AirBNB -- RAG Evaluation
+ emoji: 👜💵📈
+ colorFrom: white
+ colorTo: green
  sdk: docker
  pinned: false
  license: openrail
airbnb-langchain-rag-inference.png ADDED
airbnb-langchain-rag-loader.png ADDED
airbnb_langchain_rag_loader_retriever.ipynb ADDED
@@ -0,0 +1,351 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 📱 Using RAG to perform analysis of AirBNB Quarterly Report! 🤖\n",
+ "\n",
+ "[AirBNB Quarterly Report For the quarterly period ended March 31, 2024](https://airbnb2020ipo.q4web.com/files/doc_financials/2024/q1/fdb60f7d-e616-43dc-86ef-e33d3a9bdd05.pdf)\n",
+ "\n",
+ "## Question 1\n",
+ "- question: What is Airbnb's 'Description of Business'?\n",
+ "- response: Airbnb's 'Description of Business' is operating a global platform for unique stays and experiences, connecting hosts and guests online or through mobile devices to book spaces and experiences around the world.\n",
+ "- LangSmith trace: https://smith.langchain.com/public/ebdf5473-64ac-4f85-81ab-bd3c3d624969/r\n",
+ "\n",
+ "## Question 2\n",
+ "- question: What was the total value of 'Cash and cash equivalents' as of December 31, 2023?\n",
+ "- response: The total value of 'Cash and cash equivalents' as of December 31, 2023, is $2,369.\n",
+ "- LangSmith trace: https://smith.langchain.com/public/b0f93487-c729-4ccf-93f9-0354078282d8/r\n",
+ "\n",
+ "## Question 3\n",
+ "- question: What is the 'maximum number of shares to be sold under the 10b5-1 Trading plan' by Brian Chesky?\n",
+ "- response: The maximum number of shares to be sold under the 10b5-1 Trading plan by Brian Chesky is 1,146,000.\n",
+ "- LangSmith trace: https://smith.langchain.com/public/7fc4b549-2ea5-4b86-abf9-71d5e9a62738/r"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Note: you may need to restart the kernel to use updated packages.\n",
+ "Note: you may need to restart the kernel to use updated packages.\n",
+ "Note: you may need to restart the kernel to use updated packages.\n",
+ "Note: you may need to restart the kernel to use updated packages.\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install -qU pymupdf \n",
+ "%pip install -qU langchain langchain-core langchain-community langchain-text-splitters \n",
+ "%pip install -qU langchain-openai\n",
+ "%pip install -qU langchain-groq\n",
+ "%pip install -qU langchain-qdrant"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from langchain import hub\n",
+ "from langchain_groq import ChatGroq\n",
+ "\n",
+ "llm = ChatGroq(model=\"llama3-70b-8192\", temperature=0.3)\n",
+ "\n",
+ "os.environ[\"GROQ_API_KEY\"] = os.getenv(\"GROQ_API_KEY\")\n",
+ "os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_API_KEY\")\n",
+ "\n",
+ "QDRANT_API_KEY = os.getenv(\"QDRANT_API_KEY\")\n",
+ "QDRANT_API_URL = os.getenv(\"QDRANT_URL\")\n",
+ "\n",
+ "# LangSmith tracing and project settings\n",
+ "os.environ[\"LANGCHAIN_PROJECT\"] = \"AirBnB PDF Jun18\"\n",
+ "os.environ[\"LANGCHAIN_ENDPOINT\"]=os.getenv(\"LANGCHAIN_ENDPOINT\")\n",
+ "os.environ[\"LANGCHAIN_API_KEY\"]=os.getenv(\"LANGCHAIN_API_KEY\")\n",
+ "os.environ[\"LANGCHAIN_TRACING_V2\"]=os.getenv(\"LANGCHAIN_TRACING_V2\")\n",
+ "\n",
+ "# Leverage a prompt from the LangChain hub\n",
+ "prompt = hub.pull(\"rlm/rag-prompt-llama3\")\n",
+ "# prompt = hub.pull(\"cracked-nut/securities-comm-llama3\")\n",
+ "# prompt = hub.pull(\"cracked-nut/securities-comm-llama3-v2\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Parameterize some stuff\n",
+ "\n",
+ "LOAD_NEW_DATA = False # Set to True to load new data\n",
+ "FILE_PATH = \"/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf\" # Path to the source PDF file\n",
+ "collection = \"airbnb_pdf_rec_1000_200_images\" # qdrant collection name\n",
+ "\n",
+ "# QUESTION = \"What is Airbnb's 'Description of Business'?\"\n",
+ "QUESTION = \"What was the total value of 'Cash and cash equivalents' as of December 31, 2023?\"\n",
+ "# QUESTION = \"What is the 'maximum number of shares to be sold under the 10b5-1 Trading plan' by Brian Chesky?\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_community.document_loaders import PyMuPDFLoader\n",
+ "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
+ "from langchain_qdrant import Qdrant"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_openai import OpenAIEmbeddings\n",
+ "\n",
+ "embedding = OpenAIEmbeddings(model=\"text-embedding-3-small\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# run loader if LOAD_NEW_DATA is True\n",
+ "if LOAD_NEW_DATA:\n",
+ "    loader = PyMuPDFLoader(FILE_PATH, extract_images=True)\n",
+ "    docs = loader.load()\n",
+ "\n",
+ "    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\n",
+ "    splits = text_splitter.split_documents(docs)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Store the chunks in Qdrant\n",
+ "if LOAD_NEW_DATA:\n",
+ "    from_splits = Qdrant.from_documents(\n",
+ "        embedding=embedding,\n",
+ "        collection_name=collection,\n",
+ "        url=QDRANT_API_URL,\n",
+ "        api_key=QDRANT_API_KEY,\n",
+ "        prefer_grpc=True, \n",
+ "        documents=splits,\n",
+ "    )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "qdrant = Qdrant.from_existing_collection(\n",
+ "    embedding=embedding,\n",
+ "    collection_name=collection,\n",
+ "    url=QDRANT_API_URL,\n",
+ "    api_key=QDRANT_API_KEY,\n",
+ "    prefer_grpc=True, \n",
+ ")\n",
+ "\n",
+ "# retriever = qdrant.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 8})\n",
+ "\n",
+ "# retriever = qdrant.as_retriever(search_kwargs={\"k\": 5})\n",
+ "\n",
+ "retriever = qdrant.as_retriever(\n",
+ "    search_type=\"similarity_score_threshold\",\n",
+ "    search_kwargs={\"score_threshold\": 0.5, \"k\": 5}\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from operator import itemgetter\n",
+ "from langchain.schema.runnable import RunnablePassthrough\n",
+ "\n",
+ "rag_chain = (\n",
+ "    {\"context\": itemgetter(\"question\") | retriever, \"question\": itemgetter(\"question\")}\n",
+ "    | RunnablePassthrough.assign(context=itemgetter(\"context\"))\n",
+ "    | {\"response\": prompt | llm, \"context\": itemgetter(\"context\")}\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " +---------------------------------+ \n",
+ " | Parallel<context,question>Input | \n",
+ " +---------------------------------+ \n",
+ " **** **** \n",
+ " **** *** \n",
+ " ** **** \n",
+ "+--------------------------------+ ** \n",
+ "| Lambda(itemgetter('question')) | * \n",
+ "+--------------------------------+ * \n",
+ " * * \n",
+ " * * \n",
+ " * * \n",
+ " +----------------------+ +--------------------------------+ \n",
+ " | VectorStoreRetriever | | Lambda(itemgetter('question')) | \n",
+ " +----------------------+ +--------------------------------+ \n",
+ " **** **** \n",
+ " **** **** \n",
+ " ** ** \n",
+ " +----------------------------------+ \n",
+ " | Parallel<context,question>Output | \n",
+ " +----------------------------------+ \n",
+ " * \n",
+ " * \n",
+ " * \n",
+ " +------------------------+ \n",
+ " | Parallel<context>Input | \n",
+ " +------------------------+ \n",
+ " *** *** \n",
+ " *** *** \n",
+ " ** ** \n",
+ " +-------------------------------+ +-------------+ \n",
+ " | Lambda(itemgetter('context')) | | Passthrough | \n",
+ " +-------------------------------+ +-------------+ \n",
+ " *** *** \n",
+ " *** *** \n",
+ " ** ** \n",
+ " +-------------------------+ \n",
+ " | Parallel<context>Output | \n",
+ " +-------------------------+ \n",
+ " * \n",
+ " * \n",
+ " * \n",
+ " +---------------------------------+ \n",
+ " | Parallel<response,context>Input | \n",
+ " +---------------------------------+ \n",
+ " **** *** \n",
+ " *** *** \n",
+ " ** **** \n",
+ " +--------------------+ ** \n",
+ " | ChatPromptTemplate | * \n",
+ " +--------------------+ * \n",
+ " * * \n",
+ " * * \n",
+ " * * \n",
+ " +----------+ +-------------------------------+ \n",
+ " | ChatGroq | | Lambda(itemgetter('context')) | \n",
+ " +----------+** +-------------------------------+ \n",
+ " **** **** \n",
+ " *** *** \n",
+ " ** ** \n",
+ " +----------------------------------+ \n",
+ " | Parallel<response,context>Output | \n",
+ " +----------------------------------+ \n"
+ ]
+ }
+ ],
+ "source": [
+ "print(rag_chain.get_graph().draw_ascii())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = rag_chain.invoke({\"question\" : QUESTION})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The total value of 'Cash and cash equivalents' as of December 31, 2023, is $2,369.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# return the response. filter on the response key AIMessage content element\n",
+ "print(response[\"response\"].content)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[Document(page_content='December 31, 2023\\nLevel\\xa01\\nLevel\\xa02\\nLevel\\xa03\\nTotal\\nAssets\\nCash and cash equivalents:\\nMoney market funds\\n$\\n2,018\\xa0 $\\n—\\xa0 $\\n—\\xa0 $\\n2,018\\xa0\\nCertificates of deposit\\n—\\xa0\\n1\\xa0\\n—\\xa0\\n1\\xa0\\nGovernment bonds\\n—\\xa0\\n115\\xa0\\n—\\xa0\\n115\\xa0\\nCommercial paper\\n—\\xa0\\n223\\xa0\\n—\\xa0\\n223\\xa0\\nCorporate debt securities\\n—\\xa0\\n12\\xa0\\n—\\xa0\\n12\\xa0\\n2,018\\xa0\\n351\\xa0\\n—\\xa0\\n2,369\\xa0\\nShort-term investments:\\nCertificates of deposit\\n—\\xa0\\n172\\xa0\\n—\\xa0\\n172\\xa0\\nGovernment bonds\\n—\\xa0\\n333\\xa0\\n—\\xa0\\n333\\xa0\\nCommercial paper\\n—\\xa0\\n366\\xa0\\n—\\xa0\\n366\\xa0\\nCorporate debt securities\\n—\\xa0\\n1,491\\xa0\\n—\\xa0\\n1,491\\xa0\\nMortgage-backed and asset-backed securities\\n—\\xa0\\n145\\xa0\\n—\\xa0\\n145\\xa0\\n—\\xa0\\n2,507\\xa0\\n—\\xa0\\n2,507\\xa0\\nFunds receivable and amounts held on behalf of customers:\\nMoney market funds\\n1,360\\xa0\\n—\\xa0\\n—\\xa0\\n1,360\\xa0\\nPrepaids and other current assets:\\nForeign exchange derivative assets\\n—\\xa0\\n27\\xa0\\n—\\xa0\\n27\\xa0\\nOther assets, noncurrent:\\nCorporate debt securities\\n—\\xa0\\n—\\xa0\\n4\\xa0\\n4\\xa0\\nTotal assets at fair value\\n$\\n3,378\\xa0 $\\n2,885\\xa0 $\\n4\\xa0 $\\n6,267\\xa0\\nLiabilities\\nAccrued expenses, accounts payable, and other current liabilities:\\nForeign exchange derivative liabilities\\n$\\n—\\xa0 $\\n55\\xa0 $\\n—\\xa0 $\\n55', metadata={'subject': 'Form 10-Q filed on 2024-05-08 for the period ending 2024-03-31', 'creator': 'EDGAR Filing HTML Converter', 'total_pages': 54, 'keywords': '0001559720-24-000017; ; 10-Q', 'modDate': \"D:20240508161807-04'00'\", 'encryption': 'Standard V2 R3 128-bit RC4', 'trapped': '', 'format': 'PDF 1.4', 'creationDate': \"D:20240508161757-04'00'\", 'source': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'file_path': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'page': 12, 'title': '0001559720-24-000017', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', '_id': '5e811a77-6780-4705-8052-062160087f10', '_collection_name': 'airbnb_pdf_rec_1000_200_images'}),\n",
+ " Document(page_content='Our cash and cash equivalents are generally held at large global systemically important banks (or “G-SIBs”) which are subject to high capital requirements and are required to\\nregularly perform stringent stress tests related to their ability to absorb capital losses. Our cash, cash equivalents, and short-term investments held outside the United States may be\\nrepatriated, subject to certain limitations, and would be available to be used to fund our domestic operations. However, repatriation of such funds may result in additional tax\\nliabilities. We believe that our existing cash, cash equivalents, and short-term investments balances in the United States are sufficient to fund our working capital needs in the United\\nStates.\\nWe have access to $1.0 billion of commitments and a $200\\xa0million sub-limit for the issuance of letters of credit under the 2022 Credit Facility. As of March\\xa031, 2024, no amounts were', metadata={'subject': 'Form 10-Q filed on 2024-05-08 for the period ending 2024-03-31', 'creator': 'EDGAR Filing HTML Converter', 'total_pages': 54, 'keywords': '0001559720-24-000017; ; 10-Q', 'modDate': \"D:20240508161807-04'00'\", 'encryption': 'Standard V2 R3 128-bit RC4', 'trapped': '', 'format': 'PDF 1.4', 'source': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'creationDate': \"D:20240508161757-04'00'\", 'file_path': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'page': 28, 'title': '0001559720-24-000017', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', '_id': '9c9e6975-1d72-4c4d-9b9e-c129dfe56ddb', '_collection_name': 'airbnb_pdf_rec_1000_200_images'}),\n",
+ " Document(page_content='2023 and March\\xa031, 2024, respectively. A total of $283 million and $479 million of these securities, with unrealized losses of $14 million and $16 million, were in a continuous\\nunrealized loss position for more than twelve months as of December\\xa031, 2023 and March\\xa031, 2024, respectively.\\nThe following table summarizes the contractual maturities of the Company’s available-for-sale debt securities (in millions):\\nMarch 31, 2024\\nAmortized\\nCost\\nEstimated\\nFair Value\\nDue within one year\\n$\\n1,489\\xa0 $\\n1,489\\xa0\\nDue after one year through five years\\n957\\xa0\\n947\\xa0\\nDue after five years\\n96\\xa0\\n92\\xa0\\nTotal\\n$\\n2,542\\xa0 $\\n2,528\\xa0\\nNote 5. Fair Value Measurements and Financial Instruments\\nThe following table summarizes the Company’s financial assets and liabilities measured at fair value on a recurring basis (in millions):\\nDecember 31, 2023\\nLevel\\xa01\\nLevel\\xa02\\nLevel\\xa03\\nTotal\\nAssets\\nCash and cash equivalents:\\nMoney market funds\\n$\\n2,018\\xa0 $\\n—\\xa0 $\\n—\\xa0 $\\n2,018\\xa0\\nCertificates of deposit\\n—\\xa0\\n1\\xa0\\n—\\xa0\\n1\\xa0\\nGovernment bonds\\n—\\xa0\\n115\\xa0\\n—\\xa0\\n115', metadata={'subject': 'Form 10-Q filed on 2024-05-08 for the period ending 2024-03-31', 'creator': 'EDGAR Filing HTML Converter', 'total_pages': 54, 'keywords': '0001559720-24-000017; ; 10-Q', 'trapped': '', 'encryption': 'Standard V2 R3 128-bit RC4', 'modDate': \"D:20240508161807-04'00'\", 'format': 'PDF 1.4', 'creationDate': \"D:20240508161757-04'00'\", 'file_path': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'source': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'page': 12, 'title': '0001559720-24-000017', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', '_id': '8e24fa1f-628b-4773-9011-e5ba6e97d830', '_collection_name': 'airbnb_pdf_rec_1000_200_images'}),\n",
+ " Document(page_content='$\\n12,667\\xa0 $\\n16,529\\xa0\\nSupplemental disclosures of balance sheet information\\nSupplemental balance sheet information consisted of the following (in millions):\\nDecember 31,\\n2023\\nMarch 31,\\n2024\\nOther assets, noncurrent:\\nProperty and equipment, net\\n$\\n160\\xa0 $\\n171\\xa0\\nOperating lease right-of-use assets\\n119\\xa0\\n111\\xa0\\nOther\\n184\\xa0\\n190\\xa0\\nOther assets, noncurrent\\n$\\n463\\xa0 $\\n472\\xa0\\nAccrued expenses, accounts payable, and other current liabilities:\\nIndirect taxes payable and withholding tax reserves\\n$\\n1,119\\xa0 $\\n1,455\\xa0\\nCompensation and employee benefits\\n436\\xa0\\n346\\xa0\\nAccounts payable\\n141\\xa0\\n184\\xa0\\nOperating lease liabilities, current\\n61\\xa0\\n61\\xa0\\nOther\\n897\\xa0\\n922\\xa0\\nAccrued expenses, accounts payable, and other current liabilities\\n$\\n2,654\\xa0 $\\n2,968\\xa0\\nOther liabilities, noncurrent:\\nOperating lease liabilities, noncurrent\\n$\\n252\\xa0 $\\n237\\xa0\\nOther liabilities, noncurrent\\n287\\xa0\\n273\\xa0\\nOther liabilities, noncurrent\\n$\\n539\\xa0 $\\n510\\xa0\\nPayments to Customers', metadata={'subject': 'Form 10-Q filed on 2024-05-08 for the period ending 2024-03-31', 'creator': 'EDGAR Filing HTML Converter', 'total_pages': 54, 'keywords': '0001559720-24-000017; ; 10-Q', 'modDate': \"D:20240508161807-04'00'\", 'encryption': 'Standard V2 R3 128-bit RC4', 'trapped': '', 'format': 'PDF 1.4', 'source': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'file_path': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'creationDate': \"D:20240508161757-04'00'\", 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'page': 10, 'title': '0001559720-24-000017', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', '_id': '28a8a595-2736-403b-a5c3-8411353cc470', '_collection_name': 'airbnb_pdf_rec_1000_200_images'}),\n",
+ " Document(page_content='Table of Contents\\nAirbnb, Inc.\\nNotes to Condensed Consolidated Financial Statements (unaudited)\\nNote 3. Supplemental Financial Statement Information\\nCash, Cash Equivalents, and Restricted Cash\\nThe following table reconciles cash, cash equivalents, and restricted cash reported on the Company’s unaudited condensed consolidated balance sheets to the total amount\\npresented in the unaudited condensed consolidated statements of cash flows (in millions):\\nDecember 31,\\n2023\\nMarch 31,\\n2024\\nCash and cash equivalents\\n$\\n6,874\\xa0 $\\n7,829\\xa0\\nCash and cash equivalents included in funds receivable and amounts held on behalf of customers\\n5,769\\xa0\\n8,665\\xa0\\nRestricted cash included in prepaids and other current assets\\n24\\xa0\\n35\\xa0\\nTotal cash, cash equivalents, and restricted cash presented in the unaudited condensed consolidated statements of cash flows\\n$\\n12,667\\xa0 $\\n16,529\\xa0\\nSupplemental disclosures of balance sheet information\\nSupplemental balance sheet information consisted of the following (in millions):\\nDecember 31,', metadata={'subject': 'Form 10-Q filed on 2024-05-08 for the period ending 2024-03-31', 'creator': 'EDGAR Filing HTML Converter', 'total_pages': 54, 'keywords': '0001559720-24-000017; ; 10-Q', 'trapped': '', 'encryption': 'Standard V2 R3 128-bit RC4', 'modDate': \"D:20240508161807-04'00'\", 'format': 'PDF 1.4', 'creationDate': \"D:20240508161757-04'00'\", 'file_path': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'source': '/home/donbr/aie3-bootcamp/AIE3/Week 3/Day 2/files/airbnb.pdf', 'producer': 'EDGRpdf Service w/ EO.Pdf 22.0.40.0', 'page': 10, 'title': '0001559720-24-000017', 'author': 'EDGAR® Online LLC, a subsidiary of OTC Markets Group', '_id': 'fc69f02a-bdc5-443a-8e3e-8e2254b6012a', '_collection_name': 'airbnb_pdf_rec_1000_200_images'})]"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "response[\"context\"]"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
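
Review note on the notebook's configuration cell: `os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")` raises a `TypeError` when the variable is unset, because `os.environ` only accepts strings. The notebook also reads `QDRANT_URL`, while `app.py` reads `QDRANT_API_URL`. A minimal sketch of a stricter lookup, assuming a hypothetical helper named `require_env` that is not part of this commit:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; it is not part of the commit.
    """
    value = os.getenv(name)
    if not value:
        # Fail early and name the missing variable, instead of a TypeError later.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

With such a helper, the cell could read `QDRANT_API_URL = require_env("QDRANT_API_URL")`, which would also align the variable name with `app.py`.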
app.py CHANGED
@@ -15,23 +15,23 @@ from starters import set_starters
  
  load_dotenv()
  
- GROQ_API_KEY = os.environ["GROQ_API_KEY"]
- OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
-
- QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]
- QDRANT_API_URL = os.environ["QDRANT_API_URL"]
-
- LANGCHAIN_PROJECT = "AirBnB PDF Jun18"
+ LANGCHAIN_PROJECT = os.environ["LANGCHAIN_PROJECT"]
  LANGCHAIN_ENDPOINT = os.environ["LANGCHAIN_ENDPOINT"]
  LANGCHAIN_API_KEY = os.environ["LANGCHAIN_API_KEY"]
  LANGCHAIN_TRACING_V2 = os.environ["LANGCHAIN_TRACING_V2"]
+ LANGCHAIN_HUB_PROMPT = os.environ["LANGCHAIN_HUB_PROMPT"]
  
- LLAMA3_PROMPT = hub.pull("rlm/rag-prompt-llama3")
- # LLAMA3_PROMPT = hub.pull("cracked-nut/securities-comm-llama3-v2")
+ GROQ_API_KEY = os.environ["GROQ_API_KEY"]
+ llm = ChatGroq(model="llama3-70b-8192", temperature=0.3)
+ prompt = hub.pull(LANGCHAIN_HUB_PROMPT)
  
+ OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
  embedding = OpenAIEmbeddings(model="text-embedding-3-small")
- collection = "airbnb_pdf_rec_1000_200_images"
- llm = ChatGroq(model="llama3-70b-8192", temperature=0.3)
+
+ QDRANT_API_KEY = os.environ["QDRANT_API_KEY"]
+ QDRANT_API_URL = os.environ["QDRANT_API_URL"]
+ QDRANT_COLLECTION = os.environ["QDRANT_COLLECTION"]
+ collection = QDRANT_COLLECTION
  
  qdrant = Qdrant.from_existing_collection(
      embedding=embedding,
@@ -41,14 +41,17 @@ qdrant = Qdrant.from_existing_collection(
      prefer_grpc=True,
  )
  
- retriever = qdrant.as_retriever(search_kwargs={"k": 5})
+ retriever = qdrant.as_retriever(
+     search_type="similarity_score_threshold",
+     search_kwargs={"score_threshold": 0.5, "k": 5}
+ )
  
  @cl.on_chat_start
  async def start_chat():
      rag_chain = (
          {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
          | RunnablePassthrough.assign(context=itemgetter("context"))
-         | {"response": LLAMA3_PROMPT | llm, "context": itemgetter("context")}
+         | {"response": prompt | llm, "context": itemgetter("context")}
      )
  
      cl.user_session.set("rag_chain", rag_chain)
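
Review note on the retriever change: switching from plain top-`k` retrieval to `search_type="similarity_score_threshold"` means the chain may now receive fewer than five chunks, or none at all, when no chunk scores at least 0.5. The sketch below illustrates that selection rule in plain Python. It is a conceptual model only, not LangChain's or Qdrant's implementation, and `filter_by_score` is a hypothetical name:

```python
from typing import List, Tuple


def filter_by_score(
    scored: List[Tuple[str, float]], score_threshold: float, k: int
) -> List[str]:
    """Keep at most k texts whose relevance score is >= score_threshold, best first.

    Hypothetical helper that mimics the effect of score_threshold plus k.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [text for text, score in ranked if score >= score_threshold][:k]


# Made-up scores for illustration; real scores come from the vector store.
hits = [("chunk A", 0.9), ("chunk B", 0.4), ("chunk C", 0.6)]
print(filter_by_score(hits, score_threshold=0.5, k=5))
```

Because an empty context is now possible, it may be worth checking how the prompt behaves when no chunk passes the threshold.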
chainlit.md CHANGED
@@ -1,3 +1,37 @@
- # Beyond ChatGPT
+ # RAG Application for 10-Q Filing Reviews
  
- This dope Chainlit app was whipped up using the instructions from [this repository!](https://github.com/AI-Maker-Space/Beyond-ChatGPT) 💻✨
+ ## Built using the following specs:
+ - Data: [AirBnB Securities and Exchange Commission PDF](https://airbnb2020ipo.q4web.com/files/doc_financials/2024/q1/fdb60f7d-e616-43dc-86ef-e33d3a9bdd05.pdf) - Form 10-Q filing for Q1 of 2024
+ - LLM: Llama3-70B running on Groq
+ - Embedding model: OpenAI text-embedding-3-small
+ - Infrastructure / Framework: LangChain
+ - Vector Store: Qdrant
+ - UI: Chainlit
+ - Deployment: Docker on HuggingFace Spaces
+
+ ## RAG Data Ingestion
+
+ - Additional details on the Data Pipeline are in this [Jupyter Notebook](./airbnb_langchain_rag_loader_retriever.ipynb)
+
+ !["RAG Data Pipeline"](./airbnb-langchain-rag-loader.png)
+
+ ## RAG Inference
+
+ !["RAG Inference"](./airbnb-langchain-rag-inference.png)
+
+ ### Sample Questions
+
+ #### Question 1
+ - Question: What is Airbnb's 'Description of Business'?
+ - Response: Airbnb's 'Description of Business' is operating a global platform for unique stays and experiences, connecting hosts and guests online or through mobile devices to book spaces and experiences around the world.
+ - LangSmith trace: https://smith.langchain.com/public/ebdf5473-64ac-4f85-81ab-bd3c3d624969/r
+
+ #### Question 2
+ - Question: What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
+ - Response: The total value of 'Cash and cash equivalents' as of December 31, 2023, is $2,369.
+ - LangSmith trace: https://smith.langchain.com/public/b0f93487-c729-4ccf-93f9-0354078282d8/r
+
+ #### Question 3
+ - Question: What is the 'maximum number of shares to be sold under the 10b5-1 Trading plan' by Brian Chesky?
+ - Response: The maximum number of shares to be sold under the 10b5-1 Trading plan by Brian Chesky is 1,146,000.
+ - LangSmith trace: https://smith.langchain.com/public/7fc4b549-2ea5-4b86-abf9-71d5e9a62738/r
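
Review note on the ingestion settings: the notebook splits the PDF with `chunk_size=1000` and `chunk_overlap=200`. The sketch below shows what those two numbers mean using a fixed-width character window. This is a deliberate simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines, so its real chunk boundaries will differ. `chunk_text` is a hypothetical name used only for illustration:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-width windows where neighbours share chunk_overlap characters.

    Simplified illustration only; it ignores separators entirely.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


# Synthetic 2000-character input, not taken from the 10-Q.
sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
```

The overlap is what lets a table row or sentence that straddles a boundary appear whole in at least one chunk, at the cost of storing some text twice.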