# Notebook for ingesting & processing Meta's 2023 10K

- Google Doc with [instructions](https://docs.google.com/forms/d/e/1FAIpQLSfRHORtHFiPUGCiYNt2NfapWtgUQWbv5V75kUPwUAkx20r9Eg/viewform)

## 1. Setup

In [2]:
import nest_asyncio

nest_asyncio.apply()

import logging
import sys
import os
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [3]:
import llama_index
# from llama_index.core import set_global_handler

# os.environ["WANDB_NOTEBOOK_NAME"] = "Process Meta 10K.ipynb"
# set_global_handler("wandb", run_args={"project": "meta-10k"})
# wandb_callback = llama_index.core.global_handler

In [14]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

GPT4_MODEL_NAME = "gpt-4-turbo-2024-04-09"
GPT35_MODEL_NAME = "gpt-3.5-turbo-1106"

gpt35_model = OpenAI(model=GPT35_MODEL_NAME, temperature=0.0)
gpt4_model = OpenAI(model=GPT4_MODEL_NAME, temperature=0.0)

Settings.llm = OpenAI(model=GPT35_MODEL_NAME)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

DEFAULT_QUESTION1 = "What was the total value of 'Cash and cash equivalents' as of December 31, 2023?"
DEFAULT_QUESTION2 = "Who are Meta's 'Directors' (i.e., members of the Board of Directors)?"

## 2. Ingest Document

In [5]:
from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# assumes that the env has the key LLAMA_CLOUD_API_KEY
parser = LlamaParse(
 result_type="markdown",
 verbose=True
)

file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
 input_dir="../data", 
 file_extractor=file_extractor
).load_data(num_workers=10)

print(len(documents))

Started parsing the file under job_id fa8037ef-65ef-40e0-a756-a09f00a05502
1


In [6]:
# let's check to make sure that this is a 10-K file
print(documents[0].text[:1000] + "...")

## SECURITIES AND EXCHANGE COMMISSION UNITED STATES Washington, D.C. 20549

## FORM 10-K

(Mark One)

☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934

For the fiscal year ended December or 31, 2023

☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934

For the transition period from to

Commission File Number: 001-35551

Meta Platforms, Inc. Meta

(Exact name of registrant as specified in its charter)

Delaware 20-1665019

(State or other jurisdiction of incorporation or organization) 1 Meta Way, Menlo Park, California 94025 (I.R.S. Employer Identification Number)

(Address of principal executive offices and Zip Code)

(650) 543-4800

(Registrant's telephone number, including area code)

Securities registered pursuant to Section 12(b) of the Act:

|Title of each class|Trading symbol(s)|Name of each exchange on which registered|
|---|---|---|
|Class A Common Stock, $0.000006 par value|META|The Nasdaq Stock Mark

## 3. Indexing 

In [7]:
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.core import VectorStoreIndex

node_parser = MarkdownElementNodeParser(
 llm=gpt35_model,
 num_workers=10
)

In [None]:
nodes = node_parser.get_nodes_from_documents(documents=documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes=nodes)

In [9]:
recursive_index = VectorStoreIndex(nodes=base_nodes + objects)
raw_index = VectorStoreIndex.from_documents(documents=documents)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "

## 4. Raw Query Engine

In [35]:
from llama_index.postprocessor.cohere_rerank import CohereRerank
reranker = CohereRerank(top_n=6)

raw_query_engine = recursive_index.as_query_engine(
 llm=gpt4_model,
 similarity_top_k=15,
 node_postprocessors=[reranker],
 verbose = True
)

In [36]:
raw_response1 = recursive_query_engine.query(DEFAULT_QUESTION1)
raw_response2 = recursive_query_engine.query(DEFAULT_QUESTION2)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[1;3;38;2;11;159;203mRetrieval entering 7873de31-b332-4365-bd65-0a4b84f47da6: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering 4e8c6d95-87fd-423a-84ca-06c2d21c427e: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering c7ec1eec-24ee-4c3e-81ba-ca81d1a91389: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering 

In [37]:
print("**********RAW**********")
print(f"raw_response1: {raw_response1}")
print(f"raw_response2: {raw_response2}")


**********RAW**********
raw_response1: The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862 million.
raw_response2: The provided text does not specify the names or identities of the members of Meta Platforms, Inc.'s Board of Directors. Therefore, I cannot provide the names of the directors based on the information available.


## 5. Recursive Query Engine

In [49]:
recursive_query_engine = recursive_index.as_query_engine(
 llm=gpt4_model,
 similarity_top_k=15,
 node_postprocessors=[reranker],
 verbose = True
)

In [50]:
recursive_response1 = recursive_query_engine.query(DEFAULT_QUESTION1)
recursive_response2 = recursive_query_engine.query(DEFAULT_QUESTION2)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[1;3;38;2;11;159;203mRetrieval entering 7873de31-b332-4365-bd65-0a4b84f47da6: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering 4e8c6d95-87fd-423a-84ca-06c2d21c427e: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering c7ec1eec-24ee-4c3e-81ba-ca81d1a91389: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering 

In [51]:
print("**********RECURSIVE**********")
print(f"recursive_response1: {recursive_response1}")
print(f"recursive_response2: {recursive_response2}")

**********RECURSIVE**********
recursive_response1: The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862 million.
recursive_response2: The provided text does not specify the names or identities of the members of Meta Platforms, Inc.'s Board of Directors.


## 6. Subquestion Query Engine

In [32]:
from llama_index.core.indices.query.query_transform.base import (
 StepDecomposeQueryTransform,
)

# gpt-4
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt4_model, verbose=True)

In [64]:
from llama_index.core.query_engine import MultiStepQueryEngine
index_summary = "Used to answer questions about Meta's 10-K and financial performance during 2023"

subquestion_query_engine = MultiStepQueryEngine(
 query_engine=recursive_query_engine,
 query_transform=step_decompose_transform,
 index_summary=index_summary,
)

In [65]:
sq_response1 = subquestion_query_engine.query(DEFAULT_QUESTION1)
sq_response2 = subquestion_query_engine.query(DEFAULT_QUESTION2)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[1;3;33m> Current query: What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;5;200m> New query: What was the total value of 'Cash and cash equivalents' reported in Meta's 10-K for the year ending December 31, 2023?
[0mINFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[1;3;38;2;11;159;203mRetrieval entering 7873de31-b332-4365-bd65-0a4b84f47da6: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' reported in Meta's 10-K for the year endi

In [66]:
print("**********RECURSIVE**********")
print(f"sq_response1: {sq_response1}")
print(f"sq_response2: {sq_response2}")

**********RECURSIVE**********
sq_response1: The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862 million.
sq_response2: I'm sorry, I cannot provide the information you requested.


## 7. Multi-Step Query Engine

In [42]:

from llama_index.core.indices.query.query_transform.base import (
 StepDecomposeQueryTransform,
)

# gpt-4
step_decompose_transform = StepDecomposeQueryTransform(llm=gpt4_model, verbose=True)


In [69]:
mstep_query_engine = MultiStepQueryEngine(
 query_engine=recursive_query_engine,
 query_transform=step_decompose_transform,
 index_summary=index_summary,
)

In [70]:
mstep_response1 = mstep_query_engine.query(DEFAULT_QUESTION1)
mstep_response2 = mstep_query_engine.query(DEFAULT_QUESTION2)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[1;3;33m> Current query: What was the total value of 'Cash and cash equivalents' as of December 31, 2023?
[0m[1;3;38;5;200m> New query: What was the total value of 'Cash and cash equivalents' for Meta as of December 31, 2023?
[0mINFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[1;3;38;2;11;159;203mRetrieval entering 7873de31-b332-4365-bd65-0a4b84f47da6: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What was the total value of 'Cash and cash equivalents' for Meta as of December 31, 2023?
[0m[1;3;38;2;11;159;203mRetrieval 

In [71]:
print("**********RECURSIVE**********")
print(f"mstep_response1: {mstep_response1}")
print(f"mstep_response2: {mstep_response2}")

**********RECURSIVE**********
mstep_response1: The total value of 'Cash and cash equivalents' as of December 31, 2023, was $41,862 million.
mstep_response2: I'm sorry, I cannot provide the information you requested.


In [73]:
print(subquestion_query_engine.query("How is the metaverse investment doing?"))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[1;3;33m> Current query: How is the metaverse investment doing?
[0m[1;3;38;5;200m> New query: What is the financial performance of Meta in 2023?
[0mINFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[1;3;38;2;11;159;203mRetrieval entering e61666fc-2ca4-4544-af68-66fe2b85b442: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with query What is the financial performance of Meta in 2023?
[0m[1;3;38;2;11;159;203mRetrieval entering 580282f7-7c4e-4ba5-a559-636959b71ec8: TextNode
[0m[1;3;38;2;237;90;200mRetrieving from object TextNode with q