{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/samartht/miniconda3/envs/langchain/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
      "  def backtrace(trace: np.ndarray):\n"
     ]
    }
   ],
   "source": [
    "import numpy as np \n",
    "import whisper_app\n",
    "from glob import glob \n",
    "import llm_ops\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.docstore.document import Document\n",
    "from langchain.embeddings import SentenceTransformerEmbeddings\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.chains import RetrievalQA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total number of audio files : 150\n"
     ]
    }
   ],
   "source": [
    "audio_processor = whisper_app.WHISPERModel(device='cuda')\n",
    "path = \"../../Conversational_GPT/data/Audio/*.mp3\"\n",
    "\n",
    "audio_files = glob(path)\n",
    "print(\"total number of audio files :\",len(audio_files))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Hello. Hello, good morning sir. Good morning. My name is Sarvinder and I'm calling from PRIP for services sir, how are you? Okay. Sir, I believe you're in an inquiry regarding mortgage. Are you looking for a mortgage services sir? For service? Service? Like mortgage, are you looking for a mortgage? Marked mortgage, home loan, home loan. Home loan. Yes. No, no, no. I am talking with one, no, no, yesterday one lady called me from Arab Bank. Okay. I am making, I need the loan for home loan but by bank, Arab Bank. Yes. We can give you the bank sir, we can give you only from the bank only but we can give you the less interest rate if you want sir. I don't know, this is your company what? We are a PRIPCO company sir, we are tired with the multiple banks. Okay. We can tell you from which bank you can, what interest rate you get, you can get sir, what interest rate they are offering you sir. But I need to ask you, who give you my number, that's all in the beginning. This lady, I have the name lady yesterday she called, what is the name of the lady sir? I have the name. Can you tell me the name? Stay with me, stay with me. Yeah, yeah, yeah. Hello, you are staying with me? Yes sir, it's Awan, the name, name Madame Sara. Yeah, she's working in agents which Awan real estate broker, I got from there, your number is from the broker. Okay, now I have a plan to buy one house, the budget near 500,000 dirhams. Okay. And she told me we need to go to take it from Arab Bank because Arab Bank is better for the payment of tax. It's not like that, what interest they are giving you sir, that's what I want to know sir, how much they are offering you. I will tell you something, this is your number now? Yes. This is your number, what's up? I can send you high on WhatsApp sir. Send me my WhatsApp, I will send you which apartment I need, okay. And after we are selling the apartment, we give on the mortgage. No, no, no. Like loan? Yeah, loan, correct loan. So I need some few details, okay. So like, you know, are you residents in UAE or non-residents sir? Yeah, but one second, one second. I don't need to take loan. I need to buy one apartment, buy my bank or another bank. I need to buy this, okay, I have deposits from me, 10% cash, I can deposit, the balance, I need to take it from the bank. I don't need loan, personal loan. If you are taking from the bank, so you need mortgage only sir, that's called mortgage. So that's why I am saying for that, yeah, for that I need some details like, you know, are you residents in UAE or non-residents? Yes, yes, I have everything, all the papers, I can send these to you. What's your age sir? 56. 56, okay, that's good. Okay, and you know, are you a seller or you have your own company? No, no sir, I have 30,000 per month. 30,000 per month. Okay. And since how long you are working in the same company? Same company, 8 years. 8 years. Yes. 8 years, okay, 8 years. Okay, and how many credit cards you have? I have now 2 credit cards. 2 credit cards, okay. Yes. Okay, do you have any personal loan, auto loan? Already I take one loan from Mushrik Bank. Mushrik Bank, one loan. One loan only. Bank loan. Okay, how much you are paying monthly to the bank? 4,900, taxes. 4,900. Yes. Okay, and may I know what's your nationality? Lebanese, I am from Lebanon. Lebanese. Lebanese. Okay, and looking for an off plan or are you looking for ready to move in? For ready to move. Ready to move. Ready to move in. Yes. Okay. And how much loan you are looking for? Like how much you want from the bank? 
How much money? I have my plan now. I saw one room, yes, 500. I need from the bank 450, near. 450, yes. Okay, and this is going to be your first property in UAE? First property need to buy, yes. Okay, and when you are planning to buy this property, as soon as possible? As soon as possible, yes. Everything is okay, no problem. Okay. So what I will do, I will just send this details to the bank and the market, the person, the loan person will give you a call and he will give you the information. Okay. Okay, okay, great. Thank you so much. Thank you. Thank you. Thank you. Thank you so much. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you. Thank you.\n"
     ]
    }
   ],
   "source": [
    "text_data = audio_processor.speech_to_text(audio_path=audio_files[2])\n",
    "print(text_data['text'])"
   ]
  },
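  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (not part of the original pipeline) of batching the same call over a few files; it assumes `speech_to_text` returns a dict with `text`, `duration`, and `language` keys, as the cells below rely on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: transcribe a few files with the same helper; the returned keys\n",
    "# ('text', 'duration', 'language') are assumed from how text_data is used below\n",
    "transcripts = []\n",
    "for audio_file in audio_files[:3]:\n",
    "    out = audio_processor.speech_to_text(audio_path=audio_file)\n",
    "    transcripts.append({\n",
    "        \"source\": audio_file,\n",
    "        \"duration\": out[\"duration\"],\n",
    "        \"language\": out[\"language\"],\n",
    "        \"text\": out[\"text\"],\n",
    "    })\n",
    "print(\"transcribed files:\", len(transcripts))"
   ]
  },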
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Laama Model "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = llm_ops.get_model_from_hub(model_id='tiiuae/falcon-7b',api_key='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')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create Document "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def process_documents(documents,data_chunk=1500,chunk_overlap=100):\n",
    "    text_splitter = CharacterTextSplitter(chunk_size=data_chunk, chunk_overlap=chunk_overlap,separator='\\n')\n",
    "    texts = text_splitter.split_documents(documents)\n",
    "    return texts\n",
    "\n",
    "metadata = {\"source\": f\"{audio_files[1]}\",\"duration\":text_data['duration'],\"language\":text_data['language']}\n",
    "document = [Document(page_content=text_data['text'], metadata=metadata)]\n",
    "splited_doc = process_documents(documents=document)\n"
   ]
  },
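  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick look at what the splitter produced (chunk count, chunk lengths, and the attached metadata); this only reuses `split_docs` from the cell above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the chunks produced by CharacterTextSplitter\n",
    "print(\"number of chunks:\", len(split_docs))\n",
    "print(\"chunk lengths:\", [len(doc.page_content) for doc in split_docs])\n",
    "print(\"metadata of first chunk:\", split_docs[0].metadata)"
   ]
  },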
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "def create_prompt():\n",
    "    prompt_template = \"\"\"You are a chatbot that answers questions regarding the conversation in given context . \n",
    "    Use the following context to answer in sentences and points. \n",
    "    If you don't know the answer, just say I don't know. \n",
    "\n",
    "    {context}\n",
    "\n",
    "    Question: {question}\n",
    "    Answer :\"\"\"\n",
    "    prompt = PromptTemplate(\n",
    "        template=prompt_template, input_variables=[\"context\", \"question\"]\n",
    "    )\n",
    "    return prompt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Embeddings "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
    "embedding_model = SentenceTransformerEmbeddings(model_name='thenlper/gte-large',model_kwargs={\"device\": device})"
   ]
  },
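  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sanity check (sketch): `embed_query` is the standard LangChain embeddings call for a single string, so printing the length of the returned vector shows the embedding dimensionality of `thenlper/gte-large`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sanity check: embed a short query and report the vector dimensionality\n",
    "sample_vector = embedding_model.embed_query(\"mortgage enquiry\")\n",
    "print(\"embedding dimension:\", len(sample_vector))"
   ]
  },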
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_db = FAISS.from_documents(documents=splited_doc, embedding= embedding_model)\n",
    "chain_type_kwargs = {\"prompt\": create_prompt()}\n",
    "\n",
    "qa = RetrievalQA.from_chain_type(llm=llm,\n",
    "                                chain_type='stuff',\n",
    "                                retriever=vector_db.as_retriever(),\n",
    "                                chain_type_kwargs=chain_type_kwargs,\n",
    "                                return_source_documents=True\n",
    "                            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q&A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "question = \"How much loan customer is asking for\"\n",
    "result = qa({\"query\": question})\n",
    "matching_docs_score = vector_db.similarity_search_with_score(question)"
   ]
  },
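  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the chain was built with `return_source_documents=True`, `result` is a dict with the generated answer under `result` and the retrieved chunks under `source_documents`; the cell below prints both."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show the generated answer and which chunks were retrieved to support it\n",
    "print(\"Answer:\", result['result'])\n",
    "for doc in result['source_documents']:\n",
    "    print(\"source:\", doc.metadata.get('source'), \"| chars:\", len(doc.page_content))"
   ]
  },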
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "matching_docs_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pytf",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}