{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_vector_store.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# LLMs for Self-Study\n",
    "> A prompt and code template for better understanding texts\n",
    "\n",
    "This notebook provides a guide for using LLMs for self-study programmatically. A number of prompt templates are provided to assist with generating great assessments for self-study, and code is additionally provided for fast usage. This notebook is best leveraged for a set of documents (text or PDF preferred) **to be uploaded** for interaction with the model.\n",
    "\n",
    "This version of the notebook is best suited for those who prefer to use files from their local drive as context rather than copy and pasting directly into the notebook to be used as context for the model. If you prefer to copy and paste text, you should direct yourself to the [prompt_with_context](https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_context.ipynb) notebook."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "skip_exec: true\n",
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run this code if you're using Google Colab or don't have these packages installed in your computing environment\n",
    "! pip install pip install git+https://<token>@github.com/vanderbilt-data-science/lo-achievement.git"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#libraries for user setup code\n",
    "from getpass import getpass\n",
    "from logging import raiseExceptions\n",
    "\n",
    "#self import code\n",
    "from ai_classroom_suite.PromptInteractionBase import *\n",
    "from ai_classroom_suite.IOHelperUtilities import *\n",
    "from ai_classroom_suite.SelfStudyPrompts import *\n",
    "from ai_classroom_suite.MediaVectorStores import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# User Settings\n",
    "In this section, you'll set your OpenAI API Key (for use with the OpenAI model), configure your environment/files for upload, and upload those files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run this cell and enter your OpenAI API key when prompted\n",
    "set_openai_key()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create model\n",
    "mdl_name = 'gpt-3.5-turbo-16k'\n",
    "chat_llm = create_model(mdl_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Your Documents Source\n",
    "You may upload your files directly from your computer, or you may choose to do so via your Google Drive. Below, you will find instructions for both methods.\n",
    "\n",
    "For either model, begin by setting the `upload_setting` variable to:\n",
    "* `'Local Drive'` - if you have files that are on your own computer (locally), or\n",
    "* `'Google Drive'` - if you have files that are stored on Google Drive\n",
    "\n",
    "e.g.,\n",
    "`upload_setting='Google Drive'`.\n",
    "Don't forget the quotes around your selection!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Settings for upload: via local drive or Google Drive\n",
    "### Please input either \"Google Drive\" or \"Local Drive\" into the empty string\n",
    "\n",
    "#upload_setting = 'Google Drive'\n",
    "upload_setting = 'Local Drive'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style='color:green'><strong>Before Continuing</strong> - Make sure you have input your choice of upload into the `upload_setting`` variable above (Options: \"Local Drive\" or \"Google Drive\") as described in the above instructions.</p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload your Files\n",
    "Now, you'll upload your files. When you run the below code cell, you'll be able to follow the instructions for local or Google Drive upload described here. If you would like to use our example document (Robert Frost's \"The Road Not Taken\", you can download the file from [this link](https://drive.google.com/drive/folders/1wpEoGACUqyNRYa4zBZeNkqcLJrGQbA53?usp=sharing) and upload via the instructions above.\n",
    "\n",
    "**If you selected **\"Local Drive\"** :**\n",
    "> If you selected Local Drive, you'll need to start by selecting your local files. Run the code cell below. Once the icon appears, click the \"Choose File\". This will direct you to your computer's local drive. Select the file you would like to upload as context. The files will appear in the right sidebar. Then follow the rest of the steps in the \"Uploading Your files (Local Drive and Google Drive)\" below.\n",
    "\n",
    "**If you selected **\"Google Drive\"**: **\n",
    "> If you selected Google Drive, you'll need to start by allowing access to your Google Drive. Run the code cell below. You will be redirected to a window where you will allow access to your Google Drive by logging into your Google Account. Your Drive will appear as a folder in the left side panel. Navigate through your Google Drive until you've found the file that you'd like to upload.\n",
    "\n",
    "Your files are now accessible to the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e10a33b291a14f8089a1dea89f872998",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "FileChooser(path='/workspaces/lo-achievement', filename='', title='Use the following file chooser to add each …"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4649cff2ba0942aa9c5e073be85f40fb",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Run this cell then following the instructions to upload your file\n",
    "selected_files = setup_drives(upload_setting)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['/workspaces/lo-achievement/roadnottaken.txt']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "selected_files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Resource and Personal Tutor Creation\n",
    "Congratulations! You've nearly finished with the setup! From here, you can now run this section of cells using the arrow to the left to set up your vector store and create your model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a vector store with your document\n",
    "\n",
    "With the file path, you can now create a vector store using the document that you uploaded. We expose this creation in case you want to modify the kind of vector store that you're creating. Run the cell below to create the default provided vector store."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create vector store\n",
    "doc_segments = get_document_segments(selected_files, data_type = 'files')\n",
    "chroma_db, vs_retriever = create_local_vector_store(doc_segments, search_kwargs={\"k\": 1})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the model which will do the vector store lookup and tutoring"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create retrieval chain\n",
    "qa_chain = create_tutor_mdl_chain(kind=\"retrieval_qa\", retriever = vs_retriever)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A guide to prompting for self-study\n",
    "In this section, we provide a number of different approaches for using AI to help you assess and explain the knowledge of your document. Start by interacting with the model and then try out the rest of the prompts!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Brief overview of tutoring code options\n",
    "\n",
    "Now that your vector store is created, you can begin interacting with the model! You will interact with the model with a vector store using the `get_tutoring_answer` function below, and details are provided regarding the functionality below.\n",
    "\n",
    "Consider the multiple choice code snippet:\n",
    "```{python}\n",
    "tutor_q = get_tutoring_answer(context = '',\n",
    "                              qa_chain,\n",
    "                              assessment_request = SELF_STUDY_DEFAULTS['mc'],\n",
    "                              learning_objectives = learning_objs,\n",
    "                              input_kwargs = {'question':topic})\n",
    "```\n",
    "\n",
    "This is how we're able to interact with the model for tutoring when using vector stores. The parameters are as follows:\n",
    "\n",
    "* `context` will be an empty string or you can also set it to `None`. This is because this field is automatically populated using the vector store retreiver.\n",
    "* `qa_chain` is the model that you're using - we created this model chain a few cells above. \n",
    "* `assessment_request` is your way of telling the model what kind of assessment you want. In the example above, we use some defaults provided for multiple choice. You can also insert your own text here. To learn more about these defaults, see the `prompt_with_context.ipynb` in the CLAS repo.\n",
    "* `learning_objectives` are the learning objectives that you want to assess in a single paragraph string. You can set this to '' if you don't want to define any learning objectives. If you don't provide one, the model will use the default learning objectives.\n",
    "* `input_kwargs` are additional inputs that we can define in the prompts. Above, you see that the keyword `question` is defined. `question` is the text used to retrieve relevant texts from the vector store. Above, we define a custom topic. If you were to omit this parameter, the model would use `assessment_request` as the text to retrieve relevant documents from the vector store. See the examples below for both scenarios.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sample topics and learning objectives\n",
    "\n",
    "Below, we define a topic (used to retrieve documents from the vector store if provided) and learning objectives which will be used in the following examples. You can change these as needed for your purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Code topic\n",
    "topic = 'The full text of the poem \"The Road Not Taken\" by Robert Frost'\n",
    "\n",
    "# set learning objectives if desired\n",
    "learning_objs = (\"\"\"1. Identify the key elements of the work: important takeaways and underlying message.\n",
    "                 2. Understand the literary devices used in prompting and in literature and their purpose.\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Types of Questions and Prompts\n",
    "\n",
    "Below is a comprehensive list of question types and prompt templates designed by our team. There are also example code blocks, where you can see how the model performed with the example and try it for yourself using the prompt template."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple Choice\n",
    "\n",
    "Prompt: The following text should be used as the basis for the instructions which follow: {context}. Please design a 5 question quiz about {name or reference to context} which reflects the learning objectives: {list of learning objectives}. The questions should be multiple choice. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1: What is the underlying message of the excerpt?\n",
      "\n",
      "A) The speaker regrets not being able to travel both roads.\n",
      "B) The speaker believes that taking the less traveled road has made a significant impact on their life.\n",
      "C) The speaker is unsure about which road to choose.\n",
      "D) The speaker is fascinated by the beauty of the yellow wood.\n",
      "\n",
      "Please select one of the options (A, B, C, or D) and provide your answer.\n"
     ]
    }
   ],
   "source": [
    "# Multiple choice code example\n",
    "tutor_q = get_tutoring_answer('', qa_chain, assessment_request = SELF_STUDY_DEFAULTS['mc'],\n",
    "                              learning_objectives = learning_objs, input_kwargs = {'question':topic})\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Short Answer\n",
    "\n",
    "Prompt: Please design a 5-question quiz about {context} which reflects the learning objectives: {list of learning objectives}. The questions should be short answer. Expect the correct answers to be {anticipated length} long. If I get any part of the answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1: What is the underlying message of the poem?\n",
      "\n",
      "Remember to provide your answer in a few sentences.\n"
     ]
    }
   ],
   "source": [
    "# Short answer code example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['short_answer'],\n",
    "                              learning_objectives = learning_objs, input_kwargs = {'question':topic})\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fill-in-the-blank\n",
    "\n",
    "Prompt: Create a 5 question fill in the blank quiz refrencing {context}. The quiz should reflect the learning objectives: {learning objectives}. Please prompt me one question at a time and proceed when I answer correctly. If I answer incorrectly, please explain why my answer is incorrect.\n",
    "\n",
    ":::{.callout-info}\n",
    "In the example below, we omit the `input_kwargs` parameter. This means we'll use the text from `assessment_request` as the question topic.\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question: The speaker in the poem \"The Road Not Taken\" is faced with a choice between _______ roads.\n",
      "\n",
      "Please provide your answer.\n"
     ]
    }
   ],
   "source": [
    "# Fill in the blank code example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['fill_blank'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sequencing\n",
    "\n",
    "Prompt: Please develop a 5 question questionnaire that will ask me to recall the steps involved in the following learning objectives in regard to {context}: {learning objectives}. After I respond, explain their sequence to me."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1: What is the underlying message or theme of the provided text?\n",
      "\n",
      "(Note: Please provide your response and I will evaluate it.)\n"
     ]
    }
   ],
   "source": [
    "# Sequence example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['sequencing'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Relationships/drawing connections\n",
    "\n",
    "Prompt: Please design a 5 question quiz that asks me to explain the relationships that exist within the following learning objectives, referencing {context}: {learning objectives}."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1: What is the underlying message or theme of the text \"The Road Not Taken\"?\n",
      "\n",
      "(Note: The answer to this question will require the student to identify the key elements and important takeaways from the text in order to determine the underlying message or theme.)\n"
     ]
    }
   ],
   "source": [
    "# Relationships example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['relationships'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Concepts and Definitions\n",
    "\n",
    "Prompt: Design a 5 question quiz that asks me about definitions related to the following learning objectives: {learning objectives} - based on {context}\".\n",
    "Once I write out my response, provide me with your own response, highlighting why my answer is correct or incorrect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1: Based on the provided text, what is the underlying message or theme of the work?\n",
      "\n",
      "Please provide your response.\n"
     ]
    }
   ],
   "source": [
    "# Concepts and definitions example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['concepts'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Real Word Examples\n",
    "\n",
    "Prompt: Demonstrate how {context} can be applied to solve a real-world problem related to the following learning objectives: {learning objectives}. Ask me questions regarding this theory/concept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the provided context, it seems that the extracted text is a poem by Robert Frost and does not directly provide any information or context related to problem-solving in the real world. Therefore, it may not be possible to demonstrate how the provided context can be applied to solve a real-world problem. However, I can still assess your understanding of the learning objectives mentioned. Let's start with the first learning objective: identifying the key elements of the work, important takeaways, and underlying message. \n",
      "\n",
      "Question 1: Based on your reading of the poem, what are some key elements or important takeaways that you can identify?\n"
     ]
    }
   ],
   "source": [
    "# Real word example\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['real_world_example'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Randomized Question Types\n",
    "\n",
    "Prompt: Please generate a high-quality assessment consisting of 5 varying questions, each of different types (open-ended, multiple choice, etc.), to determine if I achieved the following learning objectives in regards to {context}: {learning objectives}. If I answer incorrectly for any of the questions, please explain why my answer is incorrect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Question 1 (Open-ended):\n",
      "Based on the given excerpt, what do you think is the underlying message or theme of the text? Please provide a brief explanation to support your answer.\n",
      "\n",
      "(Note: The answer to this question will vary depending on the student's interpretation of the text. As the tutor, you can provide feedback on the strengths and weaknesses of their response, and guide them towards a deeper understanding of the text's message.)\n"
     ]
    }
   ],
   "source": [
    "# Randomized question types\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = SELF_STUDY_DEFAULTS['randomized_questions'],\n",
    "                              learning_objectives = learning_objs)\n",
    "\n",
    "print(tutor_q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Quantiative evaluation the correctness of a student's answer\n",
    "\n",
    "Prompt: (A continuation of the previous chat) Please generate the main points of the student’s answer to the previous question, and evaluate on a scale of 1 to 5 how comprehensive the student’s answer was in relation to the learning objectives, and explain why he or she received this rating, including what was missed in his or her answer if the student’s answer wasn’t complete.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Main points of the student's answer:\n",
      "- The underlying message of the text is that people should follow the crowd and take the easy way instead of the road less traveled.\n",
      "- The road less traveled is hard and painful to traverse.\n",
      "\n",
      "Evaluation of the student's answer:\n",
      "I would rate the student's answer a 2 out of 5 in terms of comprehensiveness in relation to the learning objectives. \n",
      "\n",
      "Explanation:\n",
      "The student correctly identifies that the underlying message of the text is related to choosing between two paths, but their interpretation of the message is not entirely accurate. The student suggests that the text encourages people to follow the crowd and take the easy way, which is not supported by the actual message of the poem. The poem actually suggests that taking the road less traveled can make a significant difference in one's life. The student also mentions that the road less traveled is hard and painful to traverse, which is not explicitly stated in the text. This interpretation may be influenced by the student's personal perspective rather than the actual content of the poem. Therefore, the student's answer is not complete and does not fully grasp the intended message of the text.\n"
     ]
    }
   ],
   "source": [
    "# qualitative evaluation\n",
    "qualitative_query = \"\"\" Please generate the main points of the student’s answer to the previous question,\n",
    " and evaluate on a scale of 1 to 5 how comprehensive the student’s answer was in relation to the learning objectives,\n",
    " and explain why he or she received this rating, including what was missed in his or her answer if the student’s answer wasn’t complete.\"\"\"\n",
    "\n",
    "last_answer = (\"TUTOR QUESTION: Question 1 (Open-ended): \" +\n",
    "               \"Based on the given excerpt, what do you think is the underlying message or theme of the text? Please provide a \" + \n",
    "               \"brief explanation to support your answer.\\n\" + \n",
    "               \"STUDENT ANSWER: The underlying message of the text is that people should follow the crowd and the road less traveled is hard \"+\n",
    "               \"and painful to traverse. Take the easy way instead. \")\n",
    "\n",
    "# Note that this uses the previous result and query in the context\n",
    "tutor_q = get_tutoring_answer(None, qa_chain, assessment_request = qualitative_query + '\\n' + last_answer,\n",
    "                              learning_objectives = learning_objs,\n",
    "                              input_kwargs = {'question':topic})\n",
    "\n",
    "print(tutor_q)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "python3",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}