{ "cells": [ { "cell_type": "markdown", "id": "7eb11cd7-6200-45f9-822f-45b8b392c4bd", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "id": "4ca90151-0b89-4870-8213-9d49cb68c555", "metadata": {}, "source": [ "## Config\n", "Set the token limits based on the numbers in [02-poe-token-count-exploration.ipynb](02-poe-token-count-exploration.ipynb). I like to leave a little buffer in case an explanation runs over." ] }, { "cell_type": "code", "execution_count": 1, "id": "5238c6e9-9425-4ced-a16a-998e775e7342", "metadata": {}, "outputs": [], "source": [ "INPUT_TOKENS = 300\n", "OUTPUT_TOKENS = 1600\n", "\n", "INPUT_DATASET = 'derek-thomas/labeled-multiple-choice-explained-mistral-tokenized'\n", "OUTPUT_DATASET = 'derek-thomas/labeled-multiple-choice-explained-mistral-results'\n", "BASE_MODEL = 'mistralai/Mistral-7B-Instruct-v0.3'" ] }, { "cell_type": "markdown", "id": "f4eca659-f11f-4d25-886a-9d7af4f38411", "metadata": {}, "source": [ "# Setup\n", "Here we create the pydantic models for each of our experiments. Because a key such as `Final Answer` contains a space and so cannot be used directly as a Python field name, we need an `alias` together with `populate_by_name`. Since our `Final Answer` is always a letter between a and h, we can use an enumeration." 
] }, { "cell_type": "code", "execution_count": 2, "id": "c5367700-0e9d-435b-875a-02a73b292ade", "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field\n", "from typing import List\n", "from enum import Enum\n", "import json\n", "\n", "\n", "class FinalAnswerEnum(str, Enum):\n", " a = \"a\"\n", " b = \"b\"\n", " c = \"c\"\n", " d = \"d\"\n", " e = \"e\"\n", " f = \"f\"\n", " g = \"g\"\n", " h = \"h\"\n", "\n", "class RFAModel(BaseModel):\n", " reasoning: str = Field(..., alias=\"Reasoning\")\n", " final_answer: FinalAnswerEnum = Field(..., alias=\"Final Answer\")\n", "\n", " class Config:\n", " populate_by_name = True\n", " \n", "class FARModel(BaseModel):\n", " final_answer: FinalAnswerEnum = Field(..., alias=\"Final Answer\")\n", " reasoning: str = Field(..., alias=\"Reasoning\")\n", "\n", " class Config:\n", " populate_by_name = True\n", " \n", "class FAModel(BaseModel):\n", " final_answer: FinalAnswerEnum = Field(..., alias=\"Final Answer\")\n", "\n", " class Config:\n", " populate_by_name = True" ] }, { "cell_type": "markdown", "id": "7e0f51c0-c4f7-4299-9a24-a4a90d4a9f2a", "metadata": {}, "source": [ "We generated lots of experiments in [derek-thomas/labeled-multiple-choice-explained-mistral-tokenized](https://huggingface.co/datasets/derek-thomas/labeled-multiple-choice-explained-mistral-tokenized/viewer?row=0). Now we will aggregate everything we need in `experiments` for convenience." 
] }, { "cell_type": "code", "execution_count": 3, "id": "5d0bd22f-293e-4c15-9dfe-8070553f42b5", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'derek-thomas/mistral-v03-poe-RFA-mistral,derek-thomas/mistral-v03-poe-FAR-mistral,derek-thomas/mistral-v03-poe-RFA-gpt3-5,derek-thomas/mistral-v03-poe-FAR-gpt3-5,derek-thomas/mistral-v03-poe-FA'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "experiments = {\n", " 'RFA-mistral': {\n", " 'pydantic': RFAModel,\n", " \"lora\": \"derek-thomas/mistral-v03-poe-RFA-mistral\",\n", " \"column\": 'user_prompt_RFA',\n", " },\n", " 'FAR-mistral': {\n", " 'pydantic': FARModel,\n", " \"lora\": \"derek-thomas/mistral-v03-poe-FAR-mistral\",\n", " \"column\": 'user_prompt_FAR',\n", " },\n", " 'RFA-gpt3-5': {\n", " 'pydantic': RFAModel,\n", " \"lora\": \"derek-thomas/mistral-v03-poe-RFA-gpt3-5\",\n", " \"column\": 'user_prompt_RFA',\n", " },\n", " 'FAR-gpt3-5': {\n", " 'pydantic': FARModel,\n", " \"lora\": \"derek-thomas/mistral-v03-poe-FAR-gpt3-5\",\n", " \"column\": 'user_prompt_FAR',\n", " },\n", " 'FA': {\n", " 'pydantic': FAModel,\n", " \"lora\": \"derek-thomas/mistral-v03-poe-FA\",\n", " \"column\": 'user_prompt_FA',\n", " },\n", " 'base': {\n", " 'pydantic': FAModel,\n", " \"lora\": None,\n", " \"column\": 'user_prompt_FA',\n", " },\n", "}\n", "\n", "LORAS_STRING = ','.join([v['lora'] for _, v in experiments.items() if v and v.get('lora') is not None])\n", "LORAS_STRING" ] }, { "cell_type": "code", "execution_count": 4, "id": "6f8826fb-76ea-464f-8146-262bda0b58bc", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "50dbecc676db4dc78dd1974d2f1a87dc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='
=2.4.2 to get ordering of json outputs\n", " \"env\": {\n", " \"LORA_ADAPTERS\": LORAS_STRING,\n", " \"MAX_BATCH_PREFILL_TOKENS\": str(20*INPUT_TOKENS),\n", " \"MAX_INPUT_TOKENS\": str(INPUT_TOKENS), \n", " \"MAX_TOTAL_TOKENS\": str(INPUT_TOKENS + OUTPUT_TOKENS), \n", " \"DISABLE_CUSTOM_KERNELS\": 'false',\n", " \"MODEL_ID\": \"/repository\"\n", " },\n", " }\n", " \n", " secrets = {\n", " \"HF_TOKEN\": get_token()\n", " }\n", " \n", " # Creating the inference endpoint\n", " endpoint = create_inference_endpoint(\n", " name=name,\n", " namespace=namespace,\n", " repository='mistralai/Mistral-7B-Instruct-v0.3',\n", " framework=\"pytorch\",\n", " accelerator=\"gpu\",\n", " instance_size=\"x1\",\n", " instance_type=\"nvidia-l4\",\n", " region=\"us-east-1\",\n", " vendor=\"aws\",\n", " min_replica=8,\n", " max_replica=8,\n", " task=\"text-generation\",\n", " custom_image=custom_image,\n", " secrets=secrets\n", " )\n", " \n", " # Block until the endpoint is running before reporting that it is ready\n", " endpoint.wait()\n", " print(\"Your model is ready to use!\")\n", " return endpoint" ] }, { "cell_type": "code", "execution_count": 8, "id": "000b907a-224d-4dbf-aa0d-e0dbee1b8787", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your model is ready to use!\n", "CPU times: user 21.1 ms, sys: 10 ms, total: 31.1 ms\n", "Wall time: 1.72 s\n" ] } ], "source": [ "%%time\n", "endpoint = get_my_endpoint()" ] }, { "cell_type": "markdown", "id": "5708f348-c11e-4b66-aeff-93f5ec08ab49", "metadata": {}, "source": [ "## Manual Evaluation\n", "Since we haven't seen our models in use yet, it's a good time to check them out!" ] }, { "cell_type": "markdown", "id": "328dd842-eaa4-470c-b761-7c403b453321", "metadata": {}, "source": [ "### Reasoning Final Answer\n", "In both mistral and gpt3-5 we should see the **Reasoning** first and then the **Final Answer** in the prompt and the responses." 
] }, { "cell_type": "code", "execution_count": 9, "id": "47af6191-7765-4047-bd8f-64aadb08434e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[INST] Answer the Question and include your Reasoning and the Final Answer in a json like: {\"Reasoning: \"...\", \"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\\nQuestion: What are busses used for?\\nAnswer Choices: (a) Protective shelter (b) Transporting humans (c) Help other species benefit (d) Transporting airplanes (e) A backbone (f) Communication (g) Safe operation (h) Safe driving[/INST]'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "key = 'RFA-mistral'\n", "user_prompt_RFA = df.iloc[0][experiments[key]['column']]\n", "user_prompt_RFA" ] }, { "cell_type": "code", "execution_count": 10, "id": "1f976218-f33c-4db3-9797-3935e121e6b2", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'{\"Reasoning\": \"Busses are primarily used for transporting people, so the correct answer is (b) Transporting humans. The other options are not related to the function of a bus.\", \"Final Answer\": \"b\"}'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = endpoint.client.text_generation(\n", " prompt=user_prompt_RFA,\n", " max_new_tokens=OUTPUT_TOKENS,\n", " adapter_id=experiments[key]['lora'],\n", " grammar={\"type\": \"json\", \"value\": experiments[key]['pydantic'].schema()},\n", ")\n", "response" ] }, { "cell_type": "code", "execution_count": 11, "id": "222e33b7-0158-44f8-8848-da5318e699b4", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'{\"Reasoning\": \"Busses are primarily used for transporting humans, especially in urban areas where public transportation is necessary. They provide a means of transportation for a large number of people at once, reducing traffic congestion and carbon emissions. 
Therefore, the correct answer is (b) transporting humans.\", \"Final Answer\": \"b\"}'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "key = 'RFA-gpt3-5'\n", "response = endpoint.client.text_generation(\n", " prompt=user_prompt_RFA,\n", " max_new_tokens=OUTPUT_TOKENS,\n", " adapter_id=experiments[key]['lora'],\n", " grammar={\"type\": \"json\", \"value\": experiments[key]['pydantic'].schema()},\n", ")\n", "response" ] }, { "cell_type": "markdown", "id": "138fd46d-896f-4b4b-90d3-d3fd7075f149", "metadata": {}, "source": [ "### Final Answer Reasoning \n", "In both mistral and gpt3-5 we should see the **Final Answer** first and then the **Reasoning** in the prompt and the responses." ] }, { "cell_type": "code", "execution_count": 12, "id": "ec3e574b-f63f-4513-a6ae-335136543a8c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[INST] Answer the Question and include your Final Answer and the Reasoning in a json like: {\"Final Answer\": \"x\", \"Reasoning: \"...\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\\nQuestion: What are busses used for?\\nAnswer Choices: (a) Protective shelter (b) Transporting humans (c) Help other species benefit (d) Transporting airplanes (e) A backbone (f) Communication (g) Safe operation (h) Safe driving[/INST]'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "key = 'FAR-gpt3-5'\n", "user_prompt_FAR = df.iloc[0][experiments[key]['column']]\n", "user_prompt_FAR" ] }, { "cell_type": "code", "execution_count": 13, "id": "24f30a15-5ec0-4f26-b32f-b4ccb429e6f9", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'{\"Final Answer\": \"b\", \"Reasoning\": \"Busses are primarily used for transporting humans, especially in urban areas, to facilitate commuting and travel. 
They are not used for protective shelter, helping other species benefit, transporting airplanes, serving as a backbone, communication, safe operation, or safe driving.\"}'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = endpoint.client.text_generation(\n", " prompt=user_prompt_FAR,\n", " max_new_tokens=575,\n", " adapter_id=experiments[key]['lora'],\n", " grammar={\"type\": \"json\", \"value\": experiments[key]['pydantic'].schema()},\n", ")\n", "response" ] }, { "cell_type": "code", "execution_count": 14, "id": "32536844-211d-4856-983c-d5787734d420", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'{\"Final Answer\": \"b\", \"Reasoning\": \"Busses are primarily used for transporting humans, especially in urban areas, to facilitate their travel from one place to another. The other options are incorrect because busses do not provide protective shelter, do not help other species benefit, are not used to transport airplanes, do not serve as a backbone, are not used for communication, and are not related to safe operation or driving.\"}'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "key = 'FAR-mistral'\n", "response = endpoint.client.text_generation(\n", " prompt=user_prompt_FAR,\n", " max_new_tokens=575,\n", " adapter_id=experiments[key]['lora'],\n", " grammar={\"type\": \"json\", \"value\": experiments[key]['pydantic'].schema()},\n", ")\n", "response" ] }, { "cell_type": "markdown", "id": "5482cf12-880c-4112-ba87-059a03a3f466", "metadata": {}, "source": [ "### Final Answer \n", "Here we should just see the **Final Answer** and no **Reasoning**." 
] }, { "cell_type": "code", "execution_count": 15, "id": "71a9f634-319c-40c2-8f66-18e282732320", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'[INST] Answer the Question and include your Final Answer in a json like: {\"Final Answer\": \"x\"} where x is a letter that corresponds to the answer choice which is a letter between a and h.\\nQuestion: What are busses used for?\\nAnswer Choices: (a) Protective shelter (b) Transporting humans (c) Help other species benefit (d) Transporting airplanes (e) A backbone (f) Communication (g) Safe operation (h) Safe driving[/INST]'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "key = 'FA'\n", "user_prompt_FA = df.iloc[0][experiments[key]['column']]\n", "user_prompt_FA" ] }, { "cell_type": "code", "execution_count": 16, "id": "1cded37d-b907-4f4d-9b8c-c2167c6ba213", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "\"{'Final Answer': 'b'}\"" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "response = endpoint.client.text_generation(\n", " prompt=user_prompt_FA,\n", " max_new_tokens=575,\n", " adapter_id=experiments[key]['lora'],\n", ")\n", "response" ] }, { "cell_type": "markdown", "id": "122b8563-3220-43ac-9e0d-ef84ebcbb9e1", "metadata": {}, "source": [ "## Evaluation Loop\n", "I set the batch prefill budget to 20x the input token limit and use 8 replicas, so I should have capacity for ~160 parallel requests. I'm only using 128, but it should be pretty fast." 
] }, { "cell_type": "code", "execution_count": 17, "id": "fbc69ea7-e2ca-4405-82f0-5af95375ec88", "metadata": {}, "outputs": [], "source": [ "import nest_asyncio\n", "import asyncio\n", "from transformers import AutoTokenizer\n", "from tqdm.auto import tqdm\n", "\n", "# Allow nested event loops in Jupyter\n", "nest_asyncio.apply()\n", "\n", "\n", "# Semaphore to limit concurrency\n", "CONCURRENCY_LIMIT = 100 \n", "MAX_NEW_TOKENS = OUTPUT_TOKENS\n", "semaphore = asyncio.Semaphore(CONCURRENCY_LIMIT)\n", "\n", "# Progress bar\n", "progress_bar = None # Global to allow updates from within async functions\n", "\n", "# Function to send asynchronous requests to the endpoint\n", "async def fetch_response_async(async_client, prompt, lora_id, pydantic_model):\n", " async with semaphore: # Limit the number of concurrent requests\n", " response = await async_client.text_generation(\n", " prompt=prompt,\n", " max_new_tokens=MAX_NEW_TOKENS,\n", " adapter_id=lora_id if lora_id else None,\n", " grammar={\"type\": \"json\", \"value\": pydantic_model.schema()}\n", " )\n", " progress_bar.update(1) # Update the progress bar when the request is complete\n", " return response\n", "\n", "# Function to process a single conversation type asynchronously\n", "async def process_conversation_type(conversation_type, model_info, df, tokenizer, async_client):\n", " response_column = f\"responses_{conversation_type.replace('-','_')}\"\n", " responses = [] # Temporary list to hold responses for the current conversation type\n", "\n", " tasks = []\n", " for _, item in df.iterrows():\n", " prompt = item.get(model_info[\"column\"])\n", " tasks.append(fetch_response_async(async_client, prompt, model_info[\"lora\"], model_info[\"pydantic\"]))\n", "\n", " # Wait for all tasks to complete\n", " responses = await asyncio.gather(*tasks)\n", "\n", " # If responses are strings, use them directly; otherwise, extract 'generated_text'\n", " try:\n", " df[response_column] = [resp[\"generated_text\"] for resp in 
responses]\n", " except TypeError: # Fallback in case responses are raw strings\n", " df[response_column] = responses\n", "\n", "# Main function to handle all conversation types\n", "async def main(df, models, tokenizer, async_client):\n", " global progress_bar\n", " total_requests = len(df) * len(models) # Total number of requests across all conversation types\n", " progress_bar = tqdm(total=total_requests, desc=\"Processing requests\")\n", "\n", " tasks = []\n", " for conversation_type, model_info in models.items():\n", " tasks.append(process_conversation_type(conversation_type, model_info, df, tokenizer, async_client))\n", " await asyncio.gather(*tasks)\n", "\n", " progress_bar.close() # Close the progress bar when done\n", "\n", "# Define parameters and run\n", "# await main(df, experiments, tokenizer, endpoint.async_client)" ] }, { "cell_type": "markdown", "id": "9662adde-4ea4-4254-a6be-c619ed9557c8", "metadata": {}, "source": [ "This is the same as above but with a couple nice features like time-out in case you run into any issues." ] }, { "cell_type": "code", "execution_count": 18, "id": "e2671c78-6cd1-4bb7-aa04-b695bfb02115", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "be0bd3e278ae4d90a161918772ee71e8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Processing requests: 0%| | 0/10098 [00:00= MAX_RETRIES:\n", " raise e # If we've exhausted retries, re-raise the error\n", " else:\n", " print(f\"Error: {e}. Retrying... 
({retries}/{MAX_RETRIES})\")\n", " await asyncio.sleep(BACKOFF_TIME) # Wait before retrying\n", "\n", "# Function to process a single conversation type asynchronously\n", "async def process_conversation_type(conversation_type, model_info, df, tokenizer, async_client):\n", " response_column = f\"responses_{conversation_type.replace('-','_')}\"\n", " responses = [] # Temporary list to hold responses for the current conversation type\n", "\n", " tasks = []\n", " for _, item in df.iterrows():\n", " prompt = item.get(model_info[\"column\"])\n", " tasks.append(fetch_response_async(async_client, prompt, model_info[\"lora\"], model_info[\"pydantic\"]))\n", "\n", " # Wait for all tasks to complete\n", " responses = await asyncio.gather(*tasks)\n", "\n", " # If responses are strings, use them directly; otherwise, extract 'generated_text'\n", " try:\n", " df[response_column] = [resp[\"generated_text\"] for resp in responses]\n", " except TypeError: # Fallback in case responses are raw strings\n", " df[response_column] = responses\n", "\n", "# Main function to handle all conversation types\n", "async def main(df, models, tokenizer, async_client):\n", " global progress_bar\n", " total_requests = len(df) * len(models) # Total number of requests across all conversation types\n", " progress_bar = tqdm(total=total_requests, desc=\"Processing requests\")\n", "\n", " tasks = []\n", " for conversation_type, model_info in models.items():\n", " tasks.append(process_conversation_type(conversation_type, model_info, df, tokenizer, async_client))\n", " await asyncio.gather(*tasks)\n", "\n", " progress_bar.close() # Close the progress bar when done\n", "\n", "# Define parameters and run\n", "await main(df, experiments, tokenizer, endpoint.async_client)\n" ] }, { "cell_type": "markdown", "id": "d562e0b4-96ae-4259-925d-4d62b8c49641", "metadata": {}, "source": [ "It took `00:10:43`. Not bad! That should be around `$1.14` total at `$0.80/gpu/hr`." 
] }, { "cell_type": "code", "execution_count": 19, "id": "8f81466e-80fb-4915-9c68-dfbe168e052b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "InferenceEndpoint(name='prompt-order-experiment', namespace='derek-thomas', repository='mistralai/Mistral-7B-Instruct-v0.3', status='paused', url=None)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "endpoint.pause()" ] }, { "cell_type": "code", "execution_count": 20, "id": "f5a79dad-2475-4324-8a2b-77b33e9c0822", "metadata": {}, "outputs": [], "source": [ "df_backup = df.copy()" ] }, { "cell_type": "code", "execution_count": 21, "id": "3473d555-927a-49ab-8e15-9097ed455c48", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table>\n",
 "<thead>\n",
 "<tr><th></th><th>topic</th><th>question_text</th><th>answer_key</th><th>gpt3_5_reasoning</th><th>mistral_reasoning</th><th>answer_choices</th><th>user_prompt</th><th>user_prompt_RFA</th><th>conversation_RFA_gpt3_5</th><th>conversation_RFA_mistral</th><th>...</th><th>conversation_FAR_gpt3_5</th><th>conversation_FAR_mistral</th><th>user_prompt_FA</th><th>conversation_FA</th><th>responses_RFA_mistral</th><th>responses_FAR_mistral</th><th>responses_RFA_gpt3_5</th><th>responses_FAR_gpt3_5</th><th>responses_FA</th><th>responses_base</th></tr>\n",
 "</thead>\n",
 "<tbody>\n",
 "<tr><th>0</th><td>Transportation</td><td>What are busses used for?</td><td>b</td><td>a) Protective shelter: This option is incorrec...</td><td>1. Start by reading the question carefully: \"C...</td><td>(a) Protective shelter (b) Transporting humans...</td><td>Question: What are busses used for?\\nAnswer Ch...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"Busses are primarily used for t...</td><td>{\"Final Answer\": \"b\", \"Reasoning\": \"Busses are...</td><td>{\"Reasoning\": \"Busses are primarily used for t...</td><td>{\"Final Answer\": \"b\", \"Reasoning\": \"Busses are...</td><td>{ \"Final Answer\": \"b\" }</td><td>{\"Final Answer\": \"b\"}</td></tr>\n",
 "<tr><th>1</th><td>Climate change</td><td>Which of the following does not contribute to ...</td><td>g</td><td>a) Nucleus of a cell: This option is not relat...</td><td>To solve this question, let's first understand...</td><td>(a) Nucleus of a cell (b) Flying in a plane (c...</td><td>Question: Which of the following does not cont...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"Global warming is primarily cau...</td><td>{\"Final Answer\": \"g\", \"Reasoning\": \"Riding a b...</td><td>{\"Reasoning\": \"The nucleus of a cell (option a...</td><td>{ \"Final Answer\": \"a\" ,\"Reasoning\": \"The nucle...</td><td>{\"Final Answer\": \"a\"}</td><td>{\"Final Answer\": \"a\"}</td></tr>\n",
 "<tr><th>2</th><td>Photography</td><td>What uses electrical energy converted from che...</td><td>b</td><td>a) Sunlight: Sunlight is a form of energy that...</td><td>1. Read the question and options carefully: Th...</td><td>(a) Sunlight (b) Cameras (c) Cells (d) Buses (...</td><td>Question: What uses electrical energy converte...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The question asks for an organi...</td><td>{\"Final Answer\": \"b\", \"Reasoning\": \"Cameras ar...</td><td>{\"Reasoning\": \"The correct answer is (c) Cells...</td><td>{\"Final Answer\": \"c\", \"Reasoning\": \"Cells are ...</td><td>{\"Final Answer\": \"b\"}</td><td>{\"Final Answer\": \"c\"}</td></tr>\n",
 "<tr><th>3</th><td>Microbiology</td><td>Bacteria causes what to be harmed?</td><td>a</td><td>Now, let's go through each option and explain ...</td><td>To answer this question correctly, let's follo...</td><td>(a) Plants (b) Electronics (c) Fossils (d) Hum...</td><td>Question: Bacteria causes what to be harmed?\\n...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The question asks about what is...</td><td>{\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c...</td><td>{\"Reasoning\": \"Bacteria are microorganisms tha...</td><td>{\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c...</td><td>{\"Final Answer\": \"e\"}</td><td>{\"Final Answer\": \"d\"}</td></tr>\n",
 "<tr><th>4</th><td>Biology</td><td>Plants and snakes live _.?</td><td>a</td><td>b) Important habitats: This option is incorrec...</td><td>1. Read the question and options carefully: Th...</td><td>(a) Almost everywhere (b) Important habitats (...</td><td>Question: Plants and snakes live _.?\\nAnswer C...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The question asks for the livin...</td><td>{\"Final Answer\": \"a\", \"Reasoning\": \"Plants and...</td><td>{\"Reasoning\": \"The question asks about the liv...</td><td>{ \"Final Answer\": \"a\" ,\"Reasoning\": \"Plants an...</td><td>{\"Final Answer\": \"a\"}</td><td>{\"Final Answer\": \"g\"}</td></tr>\n",
 "<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
 "<tr><th>1678</th><td>Biology</td><td>New resources required for creation can be red...</td><td>g</td><td>a) Mining: Mining involves extracting minerals...</td><td>1. Start by reading the question and options c...</td><td>(a) Mining (b) Mutations (c) Fossil fuels (d) ...</td><td>Question: New resources required for creation ...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The question asks for a way to ...</td><td>{\"Final Answer\": \"g\", \"Reasoning\": \"Recycling ...</td><td>{\"Reasoning\": \"Mining, fossil fuels, deforesta...</td><td>{ \"Final Answer\": \"g\" ,\"Reasoning\": \"Recycling...</td><td>{\"Final Answer\": \"g\"}</td><td>{\"Final Answer\": \"g\"}</td></tr>\n",
 "<tr><th>1679</th><td>Biology</td><td>A drought dehydrates an entire what?</td><td>d</td><td>a) Body water: This option is incorrect becaus...</td><td>(a) Watershred - This is not a scientific term...</td><td>(a) Body water (b) Dried fruit (c) Bodily wate...</td><td>Question: A drought dehydrates an entire what?...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The question asks for a term th...</td><td>{\"Final Answer\": \"d\", \"Reasoning\": \"A drought ...</td><td>{\"Reasoning\": \"A drought is a prolonged period...</td><td>{ \"Final Answer\": \"d\" ,\"Reasoning\": \"A drought...</td><td>{\"Final Answer\": \"a\"}</td><td>{\"Final Answer\": \"d\"}</td></tr>\n",
 "<tr><th>1680</th><td>Biology</td><td>An animal requires ingestion to do what?</td><td>e</td><td>a) Aerobic capacity: This option is not logica...</td><td>1. Read the question and options carefully: \"W...</td><td>(a) Aerobic capacity (b) Die (c) Water conserv...</td><td>Question: An animal requires ingestion to do w...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"Ingestion is the process of tak...</td><td>{\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ...</td><td>{\"Reasoning\": \"Ingestion is the process of tak...</td><td>{\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ...</td><td>{\"Final Answer\": \"e\"}</td><td>{\"Final Answer\": \"d\"}</td></tr>\n",
 "<tr><th>1681</th><td>Biology</td><td>Ultraviolet light can cause what?</td><td>b</td><td>a) Ultraviolet light does not cause heat energ...</td><td>1. First, read the question and options carefu...</td><td>(a) Heat energy (b) Skin cancer (c) Killing in...</td><td>Question: Ultraviolet light can cause what?\\nA...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"Ultraviolet (UV) light is a typ...</td><td>{\"Final Answer\": \"b\", \"Reasoning\": \"Ultraviole...</td><td>{\"Reasoning\": \"Ultraviolet (UV) light is a typ...</td><td>{ \"Final Answer\": \"b\" ,\"Reasoning\": \"Ultraviol...</td><td>{\"Final Answer\": \"b\"}</td><td>{\"Final Answer\": \"b\"}</td></tr>\n",
 "<tr><th>1682</th><td>Physical activity</td><td>What can increase a body's strength?</td><td>c</td><td>a) Four limbs: This option is not correct beca...</td><td>(a) Communication: In this context, the questi...</td><td>(a) Four limbs (b) Disease (c) Running (d) Bic...</td><td>Question: What can increase a body's strength?...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>&lt;s&gt;[INST] Answer the Question and include your...</td><td>{\"Reasoning\": \"The correct answer is 'Exercise...</td><td>{\"Final Answer\": \"c\", \"Reasoning\": \"Running is...</td><td>{\"Reasoning\": \"A body's strength is primarily ...</td><td>{\"Final Answer\": \"c\", \"Reasoning\": \"Running is...</td><td>{\"Final Answer\": \"f\"}</td><td>{\"Final Answer\": \"c\"}</td></tr>\n",
 "</tbody>\n",
 "</table>\n",
 "<p>1683 rows × 21 columns</p>\n",
 "</div>
" ], "text/plain": [ " topic question_text \\\n", "0 Transportation What are busses used for? \n", "1 Climate change Which of the following does not contribute to ... \n", "2 Photography What uses electrical energy converted from che... \n", "3 Microbiology Bacteria causes what to be harmed? \n", "4 Biology Plants and snakes live _.? \n", "... ... ... \n", "1678 Biology New resources required for creation can be red... \n", "1679 Biology A drought dehydrates an entire what? \n", "1680 Biology An animal requires ingestion to do what? \n", "1681 Biology Ultraviolet light can cause what? \n", "1682 Physical activity What can increase a body's strength? \n", "\n", " answer_key gpt3_5_reasoning \\\n", "0 b a) Protective shelter: This option is incorrec... \n", "1 g a) Nucleus of a cell: This option is not relat... \n", "2 b a) Sunlight: Sunlight is a form of energy that... \n", "3 a Now, let's go through each option and explain ... \n", "4 a b) Important habitats: This option is incorrec... \n", "... ... ... \n", "1678 g a) Mining: Mining involves extracting minerals... \n", "1679 d a) Body water: This option is incorrect becaus... \n", "1680 e a) Aerobic capacity: This option is not logica... \n", "1681 b a) Ultraviolet light does not cause heat energ... \n", "1682 c a) Four limbs: This option is not correct beca... \n", "\n", " mistral_reasoning \\\n", "0 1. Start by reading the question carefully: \"C... \n", "1 To solve this question, let's first understand... \n", "2 1. Read the question and options carefully: Th... \n", "3 To answer this question correctly, let's follo... \n", "4 1. Read the question and options carefully: Th... \n", "... ... \n", "1678 1. Start by reading the question and options c... \n", "1679 (a) Watershred - This is not a scientific term... \n", "1680 1. Read the question and options carefully: \"W... \n", "1681 1. First, read the question and options carefu... \n", "1682 (a) Communication: In this context, the questi... 
\n", "\n", " answer_choices \\\n", "0 (a) Protective shelter (b) Transporting humans... \n", "1 (a) Nucleus of a cell (b) Flying in a plane (c... \n", "2 (a) Sunlight (b) Cameras (c) Cells (d) Buses (... \n", "3 (a) Plants (b) Electronics (c) Fossils (d) Hum... \n", "4 (a) Almost everywhere (b) Important habitats (... \n", "... ... \n", "1678 (a) Mining (b) Mutations (c) Fossil fuels (d) ... \n", "1679 (a) Body water (b) Dried fruit (c) Bodily wate... \n", "1680 (a) Aerobic capacity (b) Die (c) Water conserv... \n", "1681 (a) Heat energy (b) Skin cancer (c) Killing in... \n", "1682 (a) Four limbs (b) Disease (c) Running (d) Bic... \n", "\n", " user_prompt \\\n", "0 Question: What are busses used for?\\nAnswer Ch... \n", "1 Question: Which of the following does not cont... \n", "2 Question: What uses electrical energy converte... \n", "3 Question: Bacteria causes what to be harmed?\\n... \n", "4 Question: Plants and snakes live _.?\\nAnswer C... \n", "... ... \n", "1678 Question: New resources required for creation ... \n", "1679 Question: A drought dehydrates an entire what?... \n", "1680 Question: An animal requires ingestion to do w... \n", "1681 Question: Ultraviolet light can cause what?\\nA... \n", "1682 Question: What can increase a body's strength?... \n", "\n", " user_prompt_RFA \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_RFA_gpt3_5 \\\n", "0 [INST] Answer the Question and include your... 
\n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_RFA_mistral ... \\\n", "0 [INST] Answer the Question and include your... ... \n", "1 [INST] Answer the Question and include your... ... \n", "2 [INST] Answer the Question and include your... ... \n", "3 [INST] Answer the Question and include your... ... \n", "4 [INST] Answer the Question and include your... ... \n", "... ... ... \n", "1678 [INST] Answer the Question and include your... ... \n", "1679 [INST] Answer the Question and include your... ... \n", "1680 [INST] Answer the Question and include your... ... \n", "1681 [INST] Answer the Question and include your... ... \n", "1682 [INST] Answer the Question and include your... ... \n", "\n", " conversation_FAR_gpt3_5 \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_FAR_mistral \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... 
\n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " user_prompt_FA \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_FA \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " responses_RFA_mistral \\\n", "0 {\"Reasoning\": \"Busses are primarily used for t... \n", "1 {\"Reasoning\": \"Global warming is primarily cau... \n", "2 {\"Reasoning\": \"The question asks for an organi... \n", "3 {\"Reasoning\": \"The question asks about what is... 
\n", "4 {\"Reasoning\": \"The question asks for the livin... \n", "... ... \n", "1678 {\"Reasoning\": \"The question asks for a way to ... \n", "1679 {\"Reasoning\": \"The question asks for a term th... \n", "1680 {\"Reasoning\": \"Ingestion is the process of tak... \n", "1681 {\"Reasoning\": \"Ultraviolet (UV) light is a typ... \n", "1682 {\"Reasoning\": \"The correct answer is 'Exercise... \n", "\n", " responses_FAR_mistral \\\n", "0 {\"Final Answer\": \"b\", \"Reasoning\": \"Busses are... \n", "1 {\"Final Answer\": \"g\", \"Reasoning\": \"Riding a b... \n", "2 {\"Final Answer\": \"b\", \"Reasoning\": \"Cameras ar... \n", "3 {\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c... \n", "4 {\"Final Answer\": \"a\", \"Reasoning\": \"Plants and... \n", "... ... \n", "1678 {\"Final Answer\": \"g\", \"Reasoning\": \"Recycling ... \n", "1679 {\"Final Answer\": \"d\", \"Reasoning\": \"A drought ... \n", "1680 {\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ... \n", "1681 {\"Final Answer\": \"b\", \"Reasoning\": \"Ultraviole... \n", "1682 {\"Final Answer\": \"c\", \"Reasoning\": \"Running is... \n", "\n", " responses_RFA_gpt3_5 \\\n", "0 {\"Reasoning\": \"Busses are primarily used for t... \n", "1 {\"Reasoning\": \"The nucleus of a cell (option a... \n", "2 {\"Reasoning\": \"The correct answer is (c) Cells... \n", "3 {\"Reasoning\": \"Bacteria are microorganisms tha... \n", "4 {\"Reasoning\": \"The question asks about the liv... \n", "... ... \n", "1678 {\"Reasoning\": \"Mining, fossil fuels, deforesta... \n", "1679 {\"Reasoning\": \"A drought is a prolonged period... \n", "1680 {\"Reasoning\": \"Ingestion is the process of tak... \n", "1681 {\"Reasoning\": \"Ultraviolet (UV) light is a typ... \n", "1682 {\"Reasoning\": \"A body's strength is primarily ... \n", "\n", " responses_FAR_gpt3_5 \\\n", "0 {\"Final Answer\": \"b\", \"Reasoning\": \"Busses are... \n", "1 { \"Final Answer\": \"a\" ,\"Reasoning\": \"The nucle... 
\n", "2 {\"Final Answer\": \"c\", \"Reasoning\": \"Cells are ... \n", "3 {\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c... \n", "4 { \"Final Answer\": \"a\" ,\"Reasoning\": \"Plants an... \n", "... ... \n", "1678 { \"Final Answer\": \"g\" ,\"Reasoning\": \"Recycling... \n", "1679 { \"Final Answer\": \"d\" ,\"Reasoning\": \"A drought... \n", "1680 {\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ... \n", "1681 { \"Final Answer\": \"b\" ,\"Reasoning\": \"Ultraviol... \n", "1682 {\"Final Answer\": \"c\", \"Reasoning\": \"Running is... \n", "\n", " responses_FA responses_base \n", "0 { \"Final Answer\": \"b\" } {\"Final Answer\": \"b\"} \n", "1 {\"Final Answer\": \"a\"} {\"Final Answer\": \"a\"} \n", "2 {\"Final Answer\": \"b\"} {\"Final Answer\": \"c\"} \n", "3 {\"Final Answer\": \"e\"} {\"Final Answer\": \"d\"} \n", "4 {\"Final Answer\": \"a\"} {\"Final Answer\": \"g\"} \n", "... ... ... \n", "1678 {\"Final Answer\": \"g\"} {\"Final Answer\": \"g\"} \n", "1679 {\"Final Answer\": \"a\"} {\"Final Answer\": \"d\"} \n", "1680 {\"Final Answer\": \"e\"} {\"Final Answer\": \"d\"} \n", "1681 {\"Final Answer\": \"b\"} {\"Final Answer\": \"b\"} \n", "1682 {\"Final Answer\": \"f\"} {\"Final Answer\": \"c\"} \n", "\n", "[1683 rows x 21 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 26, "id": "8619f9f5-9fe4-433e-b524-51c2b12e8d12", "metadata": {}, "outputs": [], "source": [ "def extract_final_answer(response):\n", " try:\n", " answer = json.loads(response).get(\"Final Answer\")\n", " except:\n", " answer = 'x'\n", " return answer\n", "\n", "# Create new columns for predictions\n", "df['predictions_base'] = df['responses_base'].apply(extract_final_answer)\n", "df['predictions_FA'] = df['responses_FA'].apply(extract_final_answer)\n", "df['predictions_RFA_mistral'] = df['responses_RFA_mistral'].apply(extract_final_answer)\n", "df['predictions_FAR_mistral'] = 
df['responses_FAR_mistral'].apply(extract_final_answer)\n", "df['predictions_RFA_gpt3_5'] = df['responses_RFA_gpt3_5'].apply(extract_final_answer)\n", "df['predictions_FAR_gpt3_5'] = df['responses_FAR_gpt3_5'].apply(extract_final_answer)\n" ] }, { "cell_type": "code", "execution_count": 28, "id": "938cf2a3-2fed-42a3-82ec-a56cb0ea9f37", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Base: \t\t\t\t\t\t45.22%\n", "Final Answer: \t\t\t\t\t64.53%\n", "Reasoning and then the Final Answer (Mistral): \t55.02%\n", "Final Answer and then the Reasoning (Mistral): \t61.79%\n", "Reasoning and then the Final Answer (GPT-3.5): \t57.28%\n", "Final Answer and then the Reasoning (GPT-3.5): \t61.62%\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "print(f\"Base: \\t\\t\\t\\t\\t\\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_base']) * 100, 2)}%\")\n", "print(f\"Final Answer: \\t\\t\\t\\t\\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_FA']) * 100, 2)}%\")\n", "print(f\"Reasoning and then the Final Answer (Mistral): \\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_RFA_mistral']) * 100, 2)}%\")\n", "print(f\"Final Answer and then the Reasoning (Mistral): \\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_FAR_mistral']) * 100, 2)}%\")\n", "print(f\"Reasoning and then the Final Answer (GPT-3.5): \\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_RFA_gpt3_5']) * 100, 2)}%\")\n", "print(f\"Final Answer and then the Reasoning (GPT-3.5): \\t{round(accuracy_score(y_true=df['answer_key'], y_pred=df['predictions_FAR_gpt3_5']) * 100, 2)}%\")" ] }, { "cell_type": "code", "execution_count": 29, "id": "83aae472-513b-43c3-9ee8-64d4cda775e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
| | topic | question_text | answer_key | gpt3_5_reasoning | mistral_reasoning | answer_choices | user_prompt | user_prompt_RFA | conversation_RFA_gpt3_5 | conversation_RFA_mistral | ... | responses_RFA_gpt3_5 | responses_FAR_gpt3_5 | responses_FA | responses_base | predictions_base | predictions_FA | predictions_FAR_mistral | predictions_RFA_gpt3_5 | predictions_FAR_gpt3_5 | predictions_RFA_mistral |
| 0 | Transportation | What are busses used for? | b | a) Protective shelter: This option is incorrec... | 1. Start by reading the question carefully: \"C... | (a) Protective shelter (b) Transporting humans... | Question: What are busses used for?\\nAnswer Ch... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"Busses are primarily used for t... | {\"Final Answer\": \"b\", \"Reasoning\": \"Busses are... | { \"Final Answer\": \"b\" } | {\"Final Answer\": \"b\"} | b | b | b | b | b | b |
| 1 | Climate change | Which of the following does not contribute to ... | g | a) Nucleus of a cell: This option is not relat... | To solve this question, let's first understand... | (a) Nucleus of a cell (b) Flying in a plane (c... | Question: Which of the following does not cont... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"The nucleus of a cell (option a... | { \"Final Answer\": \"a\" ,\"Reasoning\": \"The nucle... | {\"Final Answer\": \"a\"} | {\"Final Answer\": \"a\"} | a | a | g | a | a | g |
| 2 | Photography | What uses electrical energy converted from che... | b | a) Sunlight: Sunlight is a form of energy that... | 1. Read the question and options carefully: Th... | (a) Sunlight (b) Cameras (c) Cells (d) Buses (... | Question: What uses electrical energy converte... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"The correct answer is (c) Cells... | {\"Final Answer\": \"c\", \"Reasoning\": \"Cells are ... | {\"Final Answer\": \"b\"} | {\"Final Answer\": \"c\"} | c | b | b | c | c | e |
| 3 | Microbiology | Bacteria causes what to be harmed? | a | Now, let's go through each option and explain ... | To answer this question correctly, let's follo... | (a) Plants (b) Electronics (c) Fossils (d) Hum... | Question: Bacteria causes what to be harmed?\\n... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"Bacteria are microorganisms tha... | {\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c... | {\"Final Answer\": \"e\"} | {\"Final Answer\": \"d\"} | d | e | d | d | d | e |
| 4 | Biology | Plants and snakes live _.? | a | b) Important habitats: This option is incorrec... | 1. Read the question and options carefully: Th... | (a) Almost everywhere (b) Important habitats (... | Question: Plants and snakes live _.?\\nAnswer C... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"The question asks about the liv... | { \"Final Answer\": \"a\" ,\"Reasoning\": \"Plants an... | {\"Final Answer\": \"a\"} | {\"Final Answer\": \"g\"} | g | a | a | f | a | b |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1678 | Biology | New resources required for creation can be red... | g | a) Mining: Mining involves extracting minerals... | 1. Start by reading the question and options c... | (a) Mining (b) Mutations (c) Fossil fuels (d) ... | Question: New resources required for creation ... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"Mining, fossil fuels, deforesta... | { \"Final Answer\": \"g\" ,\"Reasoning\": \"Recycling... | {\"Final Answer\": \"g\"} | {\"Final Answer\": \"g\"} | g | g | g | g | g | g |
| 1679 | Biology | A drought dehydrates an entire what? | d | a) Body water: This option is incorrect becaus... | (a) Watershred - This is not a scientific term... | (a) Body water (b) Dried fruit (c) Bodily wate... | Question: A drought dehydrates an entire what?... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"A drought is a prolonged period... | { \"Final Answer\": \"d\" ,\"Reasoning\": \"A drought... | {\"Final Answer\": \"a\"} | {\"Final Answer\": \"d\"} | d | a | d | d | d | d |
| 1680 | Biology | An animal requires ingestion to do what? | e | a) Aerobic capacity: This option is not logica... | 1. Read the question and options carefully: \"W... | (a) Aerobic capacity (b) Die (c) Water conserv... | Question: An animal requires ingestion to do w... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"Ingestion is the process of tak... | {\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ... | {\"Final Answer\": \"e\"} | {\"Final Answer\": \"d\"} | d | e | e | e | e | c |
| 1681 | Biology | Ultraviolet light can cause what? | b | a) Ultraviolet light does not cause heat energ... | 1. First, read the question and options carefu... | (a) Heat energy (b) Skin cancer (c) Killing in... | Question: Ultraviolet light can cause what?\\nA... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"Ultraviolet (UV) light is a typ... | { \"Final Answer\": \"b\" ,\"Reasoning\": \"Ultraviol... | {\"Final Answer\": \"b\"} | {\"Final Answer\": \"b\"} | b | b | b | b | b | b |
| 1682 | Physical activity | What can increase a body's strength? | c | a) Four limbs: This option is not correct beca... | (a) Communication: In this context, the questi... | (a) Four limbs (b) Disease (c) Running (d) Bic... | Question: What can increase a body's strength?... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | <s>[INST] Answer the Question and include your... | ... | {\"Reasoning\": \"A body's strength is primarily ... | {\"Final Answer\": \"c\", \"Reasoning\": \"Running is... | {\"Final Answer\": \"f\"} | {\"Final Answer\": \"c\"} | c | f | c | d | c | c |
\n", "

1683 rows × 27 columns

\n", "
" ], "text/plain": [ " topic question_text \\\n", "0 Transportation What are busses used for? \n", "1 Climate change Which of the following does not contribute to ... \n", "2 Photography What uses electrical energy converted from che... \n", "3 Microbiology Bacteria causes what to be harmed? \n", "4 Biology Plants and snakes live _.? \n", "... ... ... \n", "1678 Biology New resources required for creation can be red... \n", "1679 Biology A drought dehydrates an entire what? \n", "1680 Biology An animal requires ingestion to do what? \n", "1681 Biology Ultraviolet light can cause what? \n", "1682 Physical activity What can increase a body's strength? \n", "\n", " answer_key gpt3_5_reasoning \\\n", "0 b a) Protective shelter: This option is incorrec... \n", "1 g a) Nucleus of a cell: This option is not relat... \n", "2 b a) Sunlight: Sunlight is a form of energy that... \n", "3 a Now, let's go through each option and explain ... \n", "4 a b) Important habitats: This option is incorrec... \n", "... ... ... \n", "1678 g a) Mining: Mining involves extracting minerals... \n", "1679 d a) Body water: This option is incorrect becaus... \n", "1680 e a) Aerobic capacity: This option is not logica... \n", "1681 b a) Ultraviolet light does not cause heat energ... \n", "1682 c a) Four limbs: This option is not correct beca... \n", "\n", " mistral_reasoning \\\n", "0 1. Start by reading the question carefully: \"C... \n", "1 To solve this question, let's first understand... \n", "2 1. Read the question and options carefully: Th... \n", "3 To answer this question correctly, let's follo... \n", "4 1. Read the question and options carefully: Th... \n", "... ... \n", "1678 1. Start by reading the question and options c... \n", "1679 (a) Watershred - This is not a scientific term... \n", "1680 1. Read the question and options carefully: \"W... \n", "1681 1. First, read the question and options carefu... \n", "1682 (a) Communication: In this context, the questi... 
\n", "\n", " answer_choices \\\n", "0 (a) Protective shelter (b) Transporting humans... \n", "1 (a) Nucleus of a cell (b) Flying in a plane (c... \n", "2 (a) Sunlight (b) Cameras (c) Cells (d) Buses (... \n", "3 (a) Plants (b) Electronics (c) Fossils (d) Hum... \n", "4 (a) Almost everywhere (b) Important habitats (... \n", "... ... \n", "1678 (a) Mining (b) Mutations (c) Fossil fuels (d) ... \n", "1679 (a) Body water (b) Dried fruit (c) Bodily wate... \n", "1680 (a) Aerobic capacity (b) Die (c) Water conserv... \n", "1681 (a) Heat energy (b) Skin cancer (c) Killing in... \n", "1682 (a) Four limbs (b) Disease (c) Running (d) Bic... \n", "\n", " user_prompt \\\n", "0 Question: What are busses used for?\\nAnswer Ch... \n", "1 Question: Which of the following does not cont... \n", "2 Question: What uses electrical energy converte... \n", "3 Question: Bacteria causes what to be harmed?\\n... \n", "4 Question: Plants and snakes live _.?\\nAnswer C... \n", "... ... \n", "1678 Question: New resources required for creation ... \n", "1679 Question: A drought dehydrates an entire what?... \n", "1680 Question: An animal requires ingestion to do w... \n", "1681 Question: Ultraviolet light can cause what?\\nA... \n", "1682 Question: What can increase a body's strength?... \n", "\n", " user_prompt_RFA \\\n", "0 [INST] Answer the Question and include your... \n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_RFA_gpt3_5 \\\n", "0 [INST] Answer the Question and include your... 
\n", "1 [INST] Answer the Question and include your... \n", "2 [INST] Answer the Question and include your... \n", "3 [INST] Answer the Question and include your... \n", "4 [INST] Answer the Question and include your... \n", "... ... \n", "1678 [INST] Answer the Question and include your... \n", "1679 [INST] Answer the Question and include your... \n", "1680 [INST] Answer the Question and include your... \n", "1681 [INST] Answer the Question and include your... \n", "1682 [INST] Answer the Question and include your... \n", "\n", " conversation_RFA_mistral ... \\\n", "0 [INST] Answer the Question and include your... ... \n", "1 [INST] Answer the Question and include your... ... \n", "2 [INST] Answer the Question and include your... ... \n", "3 [INST] Answer the Question and include your... ... \n", "4 [INST] Answer the Question and include your... ... \n", "... ... ... \n", "1678 [INST] Answer the Question and include your... ... \n", "1679 [INST] Answer the Question and include your... ... \n", "1680 [INST] Answer the Question and include your... ... \n", "1681 [INST] Answer the Question and include your... ... \n", "1682 [INST] Answer the Question and include your... ... \n", "\n", " responses_RFA_gpt3_5 \\\n", "0 {\"Reasoning\": \"Busses are primarily used for t... \n", "1 {\"Reasoning\": \"The nucleus of a cell (option a... \n", "2 {\"Reasoning\": \"The correct answer is (c) Cells... \n", "3 {\"Reasoning\": \"Bacteria are microorganisms tha... \n", "4 {\"Reasoning\": \"The question asks about the liv... \n", "... ... \n", "1678 {\"Reasoning\": \"Mining, fossil fuels, deforesta... \n", "1679 {\"Reasoning\": \"A drought is a prolonged period... \n", "1680 {\"Reasoning\": \"Ingestion is the process of tak... \n", "1681 {\"Reasoning\": \"Ultraviolet (UV) light is a typ... \n", "1682 {\"Reasoning\": \"A body's strength is primarily ... \n", "\n", " responses_FAR_gpt3_5 \\\n", "0 {\"Final Answer\": \"b\", \"Reasoning\": \"Busses are... 
\n", "1 { \"Final Answer\": \"a\" ,\"Reasoning\": \"The nucle... \n", "2 {\"Final Answer\": \"c\", \"Reasoning\": \"Cells are ... \n", "3 {\"Final Answer\": \"d\", \"Reasoning\": \"Bacteria c... \n", "4 { \"Final Answer\": \"a\" ,\"Reasoning\": \"Plants an... \n", "... ... \n", "1678 { \"Final Answer\": \"g\" ,\"Reasoning\": \"Recycling... \n", "1679 { \"Final Answer\": \"d\" ,\"Reasoning\": \"A drought... \n", "1680 {\"Final Answer\": \"e\", \"Reasoning\": \"Ingestion ... \n", "1681 { \"Final Answer\": \"b\" ,\"Reasoning\": \"Ultraviol... \n", "1682 {\"Final Answer\": \"c\", \"Reasoning\": \"Running is... \n", "\n", " responses_FA responses_base predictions_base \\\n", "0 { \"Final Answer\": \"b\" } {\"Final Answer\": \"b\"} b \n", "1 {\"Final Answer\": \"a\"} {\"Final Answer\": \"a\"} a \n", "2 {\"Final Answer\": \"b\"} {\"Final Answer\": \"c\"} c \n", "3 {\"Final Answer\": \"e\"} {\"Final Answer\": \"d\"} d \n", "4 {\"Final Answer\": \"a\"} {\"Final Answer\": \"g\"} g \n", "... ... ... ... \n", "1678 {\"Final Answer\": \"g\"} {\"Final Answer\": \"g\"} g \n", "1679 {\"Final Answer\": \"a\"} {\"Final Answer\": \"d\"} d \n", "1680 {\"Final Answer\": \"e\"} {\"Final Answer\": \"d\"} d \n", "1681 {\"Final Answer\": \"b\"} {\"Final Answer\": \"b\"} b \n", "1682 {\"Final Answer\": \"f\"} {\"Final Answer\": \"c\"} c \n", "\n", " predictions_FA predictions_FAR_mistral predictions_RFA_gpt3_5 \\\n", "0 b b b \n", "1 a g a \n", "2 b b c \n", "3 e d d \n", "4 a a f \n", "... ... ... ... \n", "1678 g g g \n", "1679 a d d \n", "1680 e e e \n", "1681 b b b \n", "1682 f c d \n", "\n", " predictions_FAR_gpt3_5 predictions_RFA_mistral \n", "0 b b \n", "1 a g \n", "2 c e \n", "3 d e \n", "4 a b \n", "... ... ... 
\n", "1678 g g \n", "1679 d d \n", "1680 e c \n", "1681 b b \n", "1682 c c \n", "\n", "[1683 rows x 27 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 30, "id": "45c08dd4-0b98-4e0f-b487-549f60518a4e", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "23d5dbd0a91d436fb9920dfe81e4803a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Uploading the dataset shards: 0%| | 0/1 [00:00