{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/samartht/miniconda3/envs/langchain/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def backtrace(trace: np.ndarray):\n"
]
}
],
"source": [
"import numpy as np \n",
"import whisper_app\n",
"from glob import glob \n",
"import llm_ops\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.docstore.document import Document\n",
"from langchain.embeddings import SentenceTransformerEmbeddings\n",
"from langchain.vectorstores import FAISS\n",
"from langchain.chains import RetrievalQA"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total number of audio files : 150\n"
]
}
],
"source": [
"audio_processor = whisper_app.WHISPERModel(device='cuda')\n",
"path = \"../../Conversational_GPT/data/Audio/*.mp3\"\n",
"\n",
"audio_files = glob(path)\n",
"print(\"total number of audio files :\",len(audio_files))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Hello. Hello, good morning sir. Good morning. My name is Sarvinder and I'm calling from PRIP for services sir, how are you? Okay. Sir, I believe you're in an inquiry regarding mortgage. Are you looking for a mortgage services sir? For service? Service? Like mortgage, are you looking for a mortgage? Marked mortgage, home loan, home loan. Home loan. Yes. No, no, no. I am talking with one, no, no, yesterday one lady called me from Arab Bank. Okay. I am making, I need the loan for home loan but by bank, Arab Bank. Yes. We can give you the bank sir, we can give you only from the bank only but we can give you the less interest rate if you want sir. I don't know, this is your company what? We are a PRIPCO company sir, we are tired with the multiple banks. Okay. We can tell you from which bank you can, what interest rate you get, you can get sir, what interest rate they are offering you sir. But I need to ask you, who give you my number, that's all in the beginning. This lady, I have the name lady yesterday she called, what is the name of the lady sir? I have the name. Can you tell me the name? Stay with me, stay with me. Yeah, yeah, yeah. Hello, you are staying with me? Yes sir, it's Awan, the name, name Madame Sara. Yeah, she's working in agents which Awan real estate broker, I got from there, your number is from the broker. Okay, now I have a plan to buy one house, the budget near 500,000 dirhams. Okay. And she told me we need to go to take it from Arab Bank because Arab Bank is better for the payment of tax. It's not like that, what interest they are giving you sir, that's what I want to know sir, how much they are offering you. I will tell you something, this is your number now? Yes. This is your number, what's up? I can send you high on WhatsApp sir. Send me my WhatsApp, I will send you which apartment I need, okay. And after we are selling the apartment, we give on the mortgage. No, no, no. Like loan? Yeah, loan, correct loan. So I need some few details, okay. 
So like, you know, are you residents in UAE or non-residents sir? Yeah, but one second, one second. I don't need to take loan. I need to buy one apartment, buy my bank or another bank. I need to buy this, okay, I have deposits from me, 10% cash, I can deposit, the balance, I need to take it from the bank. I don't need loan, personal loan. If you are taking from the bank, so you need mortgage only sir, that's called mortgage. So that's why I am saying for that, yeah, for that I need some details like, you know, are you residents in UAE or non-residents? Yes, yes, I have everything, all the papers, I can send these to you. What's your age sir? 56. 56, okay, that's good. Okay, and you know, are you a seller or you have your own company? No, no sir, I have 30,000 per month. 30,000 per month. Okay. And since how long you are working in the same company? Same company, 8 years. 8 years. Yes. 8 years, okay, 8 years. Okay, and how many credit cards you have? I have now 2 credit cards. 2 credit cards, okay. Yes. Okay, do you have any personal loan, auto loan? Already I take one loan from Mushrik Bank. Mushrik Bank, one loan. One loan only. Bank loan. Okay, how much you are paying monthly to the bank? 4,900, taxes. 4,900. Yes. Okay, and may I know what's your nationality? Lebanese, I am from Lebanon. Lebanese. Lebanese. Okay, and looking for an off plan or are you looking for ready to move in? For ready to move. Ready to move. Ready to move in. Yes. Okay. And how much loan you are looking for? Like how much you want from the bank? How much money? I have my plan now. I saw one room, yes, 500. I need from the bank 450, near. 450, yes. Okay, and this is going to be your first property in UAE? First property need to buy, yes. Okay, and when you are planning to buy this property, as soon as possible? As soon as possible, yes. Everything is okay, no problem. Okay. 
So what I will do, I will just send this details to the bank and the market, the person, the loan person will give you a call and he will give you the information. Okay. Okay, okay, great. Thank you so much. Thank you. Thank you. Thank you. Thank you so much. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you.\n"
]
}
],
"source": [
"text_data = audio_processor.speech_to_text(audio_path=audio_files[2])\n",
"print(text_data['text'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Falcon Model "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os  # the token is read from the environment; the variable name below is an assumption\n",
"llm = llm_ops.get_model_from_hub(model_id='tiiuae/falcon-7b', api_key=os.environ['HUGGINGFACEHUB_API_TOKEN'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Document "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def process_documents(documents, data_chunk=1500, chunk_overlap=100):\n",
"    text_splitter = CharacterTextSplitter(chunk_size=data_chunk, chunk_overlap=chunk_overlap, separator='\\n')\n",
"    texts = text_splitter.split_documents(documents)\n",
"    return texts\n",
"\n",
"# The transcript in text_data came from audio_files[2], so record that file as the source.\n",
"metadata = {\"source\": audio_files[2], \"duration\": text_data['duration'], \"language\": text_data['language']}\n",
"document = [Document(page_content=text_data['text'], metadata=metadata)]\n",
"splited_doc = process_documents(documents=document)\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"def create_prompt():\n",
"    # Retrieved chunks fill {context}; the user's query fills {question}.\n",
"    prompt_template = \"\"\"You are a chatbot that answers questions about the conversation in the given context.\n",
"    Use the following context to answer in sentences and bullet points.\n",
"    If you don't know the answer, just say \"I don't know\".\n",
"\n",
"    {context}\n",
"\n",
"    Question: {question}\n",
"    Answer:\"\"\"\n",
"    prompt = PromptTemplate(\n",
"        template=prompt_template, input_variables=[\"context\", \"question\"]\n",
"    )\n",
"    return prompt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Embeddings "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
"embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large',model_kwargs={\"device\": device})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Index the transcript chunks in FAISS, then build a retrieval QA chain on top of the index.\n",
"vector_db = FAISS.from_documents(documents=splited_doc, embedding=embedding_model)\n",
"chain_type_kwargs = {\"prompt\": create_prompt()}\n",
"\n",
"qa = RetrievalQA.from_chain_type(\n",
"    llm=llm,\n",
"    chain_type='stuff',\n",
"    retriever=vector_db.as_retriever(),\n",
"    chain_type_kwargs=chain_type_kwargs,\n",
"    return_source_documents=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Q&A"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"How much of a loan is the customer asking for?\"\n",
"result = qa({\"query\": question})\n",
"matching_docs_score = vector_db.similarity_search_with_score(question)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"matching_docs_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "pytf",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}