Change Liao committed
Commit e37f8aa · Parent(s): 48178af

first push everything from an existing project
Browse files
- .aws/config +2 -0
- .aws/credentials +3 -0
- .gitattributes +1 -34
- .gitignore +5 -0
- README.md +94 -9
- __pycache__/azure_utils.cpython-310.pyc +0 -0
- __pycache__/polly_utils.cpython-310.pyc +0 -0
- app.py +765 -0
- azure_utils.py +155 -0
- cache.sqlite3 +0 -0
- data/audios/tempfile.mp3 +0 -0
- data/ks_source/.gitattributes +1 -0
- data/ks_source/110年07月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/110年08月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/110年09月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/110年10月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/110年11月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/110年12月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年01月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年02月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年03月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年04月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年05月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年06月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年07月MaaS交易資料-(例行).csv +3 -0
- data/ks_source/111年08月MaaS交易資料-(例行).csv +3 -0
- data/vector.png +0 -0
- data/videos/Masahiro.mp4 +0 -0
- data/videos/Masahiro1.mp4 +0 -0
- data/videos/tempfile.mp4 +0 -0
- poc_langchain.spec +50 -0
- polly_utils.py +635 -0
- requirements.txt +208 -0
- run_local_server.bat +2 -0
.aws/config
ADDED
@@ -0,0 +1,2 @@
+[default]
+region=us-east-1
.aws/credentials
ADDED
@@ -0,0 +1,3 @@
+[default]
+aws_access_key_id = AKIAV7Q7AAGW54RBR6FZ
+aws_secret_access_key = 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
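These two files follow the standard AWS shared config/credentials layout, so boto3/botocore clients in this repo can authenticate without hard-coded keys. A minimal sketch of pointing the SDK at these repo-local files (AWS_CONFIG_FILE and AWS_SHARED_CREDENTIALS_FILE are standard botocore environment variables; the rest just mirrors the files above):

```
import os
import boto3

# Point botocore at the repo-local .aws files instead of ~/.aws
os.environ["AWS_CONFIG_FILE"] = ".aws/config"
os.environ["AWS_SHARED_CREDENTIALS_FILE"] = ".aws/credentials"

session = boto3.Session(profile_name="default")
print(session.region_name)  # "us-east-1", read from .aws/config
```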
.gitattributes
CHANGED
@@ -1,34 +1 @@
-*.
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.csv filter=lfs diff=lfs merge=lfs -text
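This change drops the stock LFS template and keeps a single rule routing CSVs through Git LFS. For reference, a sketch of what `git lfs track "*.csv"` effectively does to .gitattributes (a hypothetical helper, not part of this commit):

```
from pathlib import Path

# Append the LFS tracking rule if it is not already present,
# mimicking the effect of `git lfs track "*.csv"`.
rule = "*.csv filter=lfs diff=lfs merge=lfs -text\n"
attrs = Path(".gitattributes")
existing = attrs.read_text() if attrs.exists() else ""
if rule.strip() not in existing:
    attrs.write_text(existing + rule)
```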
.gitignore
ADDED
@@ -0,0 +1,5 @@
+# Default ignored files
+/.idea/
+venv/*
+data_source/*
+.git_foxconn/*
README.md
CHANGED
@@ -1,13 +1,98 @@
 ---
-title:
-emoji: 📉
-colorFrom: yellow
-colorTo: yellow
-sdk: gradio
-sdk_version: 3.35.2
+title: azure_openai_poc
 app_file: app.py
+sdk: gradio
+sdk_version: 3.34.0
-
-
 ---
+# azure_openai_poc
+
+
+
+## Getting started
+
+To make it easy for you to get started with GitLab, here's a list of recommended next steps.
+
+Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
+
+## Add your files
+
+- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
+- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
+
+```
+cd existing_repo
+git remote add origin https://devops.foxconn.com/16408/azure_openai_poc.git
+git branch -M main
+git push -uf origin main
+```
+
+## Integrate with your tools
+
+- [ ] [Set up project integrations](https://devops.foxconn.com/16408/azure_openai_poc/-/settings/integrations)
+
+## Collaborate with your team
+
+- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
+- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
+- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
+- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
+- [ ] [Automatically merge when pipeline succeeds](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
+
+## Test and Deploy
+
+Use the built-in continuous integration in GitLab.
+
+- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
+- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
+- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
+- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
+- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
+
+***
+
+# Editing this README
+
+When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
+
+## Suggestions for a good README
+Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
+
+## Name
+Choose a self-explaining name for your project.
+
+## Description
+Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
+
+## Badges
+On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
+
+## Visuals
+Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
+
+## Installation
+Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
+
+## Usage
+Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
+
+## Support
+Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
+
+## Roadmap
+If you have ideas for releases in the future, it is a good idea to list them in the README.
+
+## Contributing
+State if you are open to contributions and what your requirements are for accepting them.
+
+For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
+
+You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
+
+## Authors and acknowledgment
+Show your appreciation to those who have contributed to the project.
+
+## License
+For open source projects, say how it is licensed.
 
-
+## Project status
+If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.
__pycache__/azure_utils.cpython-310.pyc
ADDED
Binary file (3.07 kB)

__pycache__/polly_utils.cpython-310.pyc
ADDED
Binary file (6.95 kB)
app.py
ADDED
@@ -0,0 +1,765 @@
+import os
+import datetime
+import glob
+import shutil
+import requests
+import io
+import sys
+import re
+import boto3
+from os import listdir
+from os.path import isfile, join
+
+import gradio
+from sqlitedict import SqliteDict
+
+import gradio as gr
+
+from langchain.llms import AzureOpenAI
+from langchain.chat_models import AzureChatOpenAI
+
+from langchain.embeddings.openai import OpenAIEmbeddings
+from langchain.chains import ConversationalRetrievalChain
+
+from langchain.memory import ChatMessageHistory
+from langchain import PromptTemplate
+from langchain.vectorstores import Chroma
+
+from langchain.text_splitter import CharacterTextSplitter
+from langchain.memory import ConversationBufferMemory
+from langchain.document_loaders import DirectoryLoader
+
+from langchain.document_loaders import UnstructuredFileLoader
+from langchain.text_splitter import RecursiveCharacterTextSplitter
+from langchain.chains.summarize import load_summarize_chain
+
+import clickhouse_connect
+from pathlib import Path
+
+from langchain.document_loaders import YoutubeLoader
+
+from azure_utils import AzureVoiceData
+from polly_utils import PollyVoiceData, NEURAL_ENGINE
+from contextlib import closing
+
+#os env
+os.environ["OPENAI_API_TYPE"] = "azure"
+os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
+os.environ["OPENAI_API_BASE"] = "https://civet-project-001.openai.azure.com/"
+os.environ["OPENAI_API_KEY"] = "0e3e5b666818488fa1b5cb4e4238ffa7"
+global_deployment_id = "CivetGPT"
+global_model_name = "gpt-35-turbo"
+
+#chroma settings
+chroma_api_impl = "HH_Azure_Openai"
+#root_file_path = "C:\\Users\\catsk\\SourceCode\\azure_openai_poc\\data\\"
+root_file_path = "./data/" #其實是data 存放的位置
+hr_source_path = "hr_source"
+ks_source_path = "ks_source"
+
+sqlite_name = "cache.sqlite3"
+sqlite_key="stored_files"
+persist_db = "persist_db"
+hr_collection_name = "hr_db"
+chroma_db_impl="localdb+langchain"
+tmp_collection="tmp_collection"
+
+#global text setting
+inputText = "問題(按q 或Ctrl + c跳出): "
+refuse_string="服務被拒. 內容可能涉及敏感字詞,政治,煽動他人或是其他不當言詞, 請改以其他內容嚐試"
+
+#video
+LOOPING_TALKING_HEAD = "./data/videos/Masahiro.mp4"
+TALKING_HEAD_WIDTH = "192"
+AZURE_VOICE_DATA = AzureVoiceData()
+POLLY_VOICE_DATA = PollyVoiceData()
+
+
+def save_sqlite(key,value):
+    try:
+        with SqliteDict(sqlite_name) as mydict:
+            old_value = mydict[key]
+            mydict[key] = value+old_value  # Using dict[key] to store
+            mydict.commit()  # Need to commit() to actually flush the data
+    except Exception as ex:
+        print("Error during storing data (Possibly unsupported):", ex)
+
+def load_sqlite(key):
+    try:
+        with SqliteDict(sqlite_name) as mydict:
+            value = mydict[key]  # No need to use commit(), since we are only loading data!
+            return value
+    except Exception as ex:
+        print("Error during loading data:", ex)
+
+def delete_sql(key):
+    try:
+        with SqliteDict(sqlite_name) as mydict:
+            mydict[key] = []  # Using dict[key] to store
+            mydict.commit()  # Need to commit() to actually flush the data
+    except Exception as ex:
+        print("Error during storing data (Possibly unsupported):", ex)
+
+def ai_answer(answer):
+    print('AI 回答: \033[32m' + answer + '\033[0m')
+
+def get_openaiembeddings():
+    return OpenAIEmbeddings(
+        deployment="CivetGPT_embedding",
+        model="text-embedding-ada-002",
+        #embed_batch_size=1
+        chunk_size=1
+    )
+
+"""
+def get_chroma_client():
+    chroma_client = chromadb.Client(Settings(chroma_api_impl=chroma_api_impl,
+                                             chroma_server_host=chroma_db_ip,
+                                             chroma_server_http_port=chroma_db_port
+                                             ))
+    return chroma_client
+"""
+
+def multidocs_loader(files_path, file_ext):
+    full_files_pattern = "*." + file_ext
+    loader = DirectoryLoader(files_path, glob=full_files_pattern, show_progress=True)
+    data = loader.load()
+    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
+    documents = text_splitter.split_documents(data)
+    return documents
+
+def unstructure_file_loader(filename_path):
+    loader = UnstructuredFileLoader(filename_path)
+    data = loader.load()
+    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
+    documents = text_splitter.split_documents(data)
+    return documents
+
+def add_documents_into_cromadb(db_name, file_path, collection_name):
+    _db_name = db_name
+
+    documents = multidocs_loader(file_path,"*")
+    embeddings = get_openaiembeddings()
+
+    chroma_db = Chroma.from_documents(
+        documents,
+        embeddings,
+        collection_name=collection_name,
+        persist_directory=root_file_path+ persist_db,
+        chroma_db_impl=chroma_db_impl
+    )
+
+    chroma_db.persist()
+    print('adding documents done!')
+
+def initial_croma_db(db_name, files_path, file_ext, collection_name):
+    _db_name = db_name
+
+    documents = multidocs_loader(files_path, file_ext)
+    embeddings = get_openaiembeddings()
+
+    chroma_db = Chroma.from_documents(
+        documents,
+        embeddings,
+        collection_name = collection_name,
+        persist_directory= root_file_path+ persist_db,
+        chroma_db_impl=chroma_db_impl
+    )
+
+    chroma_db.persist()
+    print('vectorstore done!')
+
+def add_files_to_collection(input_file_path, collection_name):
+    file_path=root_file_path+input_file_path
+    add_documents_into_cromadb(persist_db, file_path, collection_name)
+
+def get_prompt_summary_string():
+    return """使用中文替下面內容做個精簡摘要:
+
+{text}
+
+精簡摘要:"""
+
+
+def get_prompt_template_string():
+    today = datetime.date.today().strftime("%Y年%m月%d日")
+    template_string = f"我是鴻海的員工, 你是一個超級助理. 今天是{today}".format(today=today)+"""
+請根據歷史對話,針對這次的問題, 形成獨立問題並以中文作回答. 請優先從提供的文件中尋找答案, 你被允許回答不知道, 但回答不知道時需要給中央人資的客服聯絡窗口資訊.
+不論什麼問題, 都以中文回答
+
+歷史對話: {chat_history}
+這次的問題: {question}
+超級助理:
+"""
+    return template_string
+
+def get_default_template_prompt():
+    template = "你是個知識廣泛的超級助手, 以下所有問題請用中文回答, 並請在500個中文字以內來解釋 {concept} 概念"
+    prompt = PromptTemplate(
+        input_variables = ["concept"],
+        template = template
+    )
+
+    return prompt
+
+def fine_tuning_model_chat(my_deployment_id, my_model_name):
+    _prompt = get_default_template_prompt()
+    llm = AzureOpenAI(model_name=my_model_name, deployment_name = my_deployment_id)
+    while 1:
+        text = input(inputText)
+        if text == 'q':
+            break
+        response = llm(_prompt.format(concept = text))
+        ai_answer(response)
+
+def chat_conversation():
+    print("resource: " + global_deployment_id + " / " + global_model_name)
+    chat = AzureChatOpenAI(
+        deployment_name = global_deployment_id,
+        model_name = global_model_name,
+    )
+
+    history = ChatMessageHistory()
+    history.add_ai_message("你是一個超級助理, 以下問題都用中文回答")
+    while 1:
+        text = input(inputText)
+        if text == 'q':
+            break
+        history.add_user_message(text)
+        ai_response = chat(history.messages)
+        ai_answer(ai_response.content)
+
+def local_vector_search(question_str,chat_history, collection_name = hr_collection_name):
+    embedding = get_openaiembeddings()
+    vectorstore = Chroma(embedding_function=embedding,
+                         collection_name=collection_name,
+                         persist_directory=root_file_path+persist_db,
+                         )
+
+    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, ai_prefix = "AI超級助理")
+
+    llm = AzureChatOpenAI(
+        deployment_name = global_deployment_id,
+        model_name= global_model_name,
+        temperature = 0.2)
+
+    prompt = PromptTemplate(
+        template=get_prompt_template_string(),
+        input_variables=["question","chat_history"]
+    )
+    prompt.format(question=question_str,chat_history=chat_history)
+    chain = ConversationalRetrievalChain.from_llm(
+        llm=llm,
+        retriever=vectorstore.as_retriever(),
+        memory=memory,
+        condense_question_prompt=prompt,
+    )
+    result = chain({"question": question_str, "chat_history":chat_history})
+    return result["answer"]
+
+def make_markdown_table(array):
+    nl = "\n"
+    markdown = ""
+    for entry in array:
+        markdown += f"{entry} {nl}"
+    return markdown
+
+def get_hr_files():
+    files = load_sqlite(sqlite_key)
+    if files == None:
+        return
+    else:
+        return make_markdown_table(files)
+
+def update_hr_km(files):
+    file_paths = [file.name for file in files]
+    dest_file_path=root_file_path+hr_source_path
+    if not os.path.exists(dest_file_path):
+        os.makedirs(dest_file_path)
+
+    for file in file_paths:
+        shutil.copy(file, dest_file_path)
+    add_files_to_collection(hr_source_path, hr_collection_name)
+
+    save_sqlite(sqlite_key, [Path(file_path).name for file_path in file_paths])
+    return get_hr_files()
+
+def clear_all_collection(collection_name):
+    pass
+
+def all_files_under_diretory(path):
+    files = glob.glob(path + '/*')
+    for f in files:
+        os.remove(f)
+
+def clear_hr_datas():
+    #remove hr collection
+    client = get_chroma_client(hr_collection_name)
+    client.delete_collection(name=hr_collection_name)
+    print("Collection removed completely!")
+
+    #remove files
+    all_files_under_diretory(root_file_path+hr_source_path)
+    delete_sql(sqlite_key)
+    return get_hr_files()
+
+def num_of_collection(collection_name):
+    client = get_chroma_client(collection_name)
+    number = client.get_collection(collection_name).count()
+    return f"目前知識卷裡有{number}卷項目"
+
+
+def clear_tmp_collection():
+    client = get_chroma_client(tmp_collection)
+    client.delete_collection(name=tmp_collection)
+    all_files_under_diretory(root_file_path+ks_source_path)
+    return num_of_collection(tmp_collection)
+
+def content_summary(split_documents):
+    llm = AzureChatOpenAI(
+        deployment_name=global_deployment_id,
+        model_name=global_model_name,
+        temperature=0.2)
+    map_prompt = get_prompt_summary_string()
+    map_prompt_template = PromptTemplate(template=map_prompt, input_variables=["text"])
+    chain = load_summarize_chain(
+        llm=llm,
+        chain_type="map_reduce",
+        verbose=True,
+        map_prompt=map_prompt_template,
+        combine_prompt=map_prompt_template
+    )
+    try:
+        output = chain({"input_documents": split_documents}, return_only_outputs=True)
+        return output
+    except Exception as e:
+        print(e)
+        return {'output_text':refuse_string}
+
+def pdf_summary(file_name):
+    print("file_name: "+file_name)
+    loader = UnstructuredFileLoader(file_name)
+    document = loader.load()
+    text_splitter = RecursiveCharacterTextSplitter(
+        chunk_size=1000,
+        chunk_overlap=20
+    )
+    split_documents = text_splitter.split_documents(document)
+    return content_summary(split_documents)
+
+def youtube_summary(youtube_url):
+    loader=YoutubeLoader.from_youtube_url(youtube_url, add_video_info=True, language=['en','zh-TW'], translation='zh-TW')
+    document=loader.load()
+    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
+    split_documents=text_splitter.split_documents(document)
+    result = content_summary(split_documents)
+    return result['output_text']
+
+def summary_large_file(files):
+    file_paths = [file.name for file in files]
+    print(file_paths[0])
+    result = pdf_summary(file_paths[0])
+    return result["output_text"]
+
+def upload_large_file(files):
+    file_paths = [file.name for file in files]
+    return Path(file_paths[0]).stem
+
+def set_allow_lightweight_delete():
+    client = clickhouse_connect.get_client(host='127.0.0.1',port=8123)
+    command = "SET allow_experimental_lightweight_delete = true;"
+    #command = "show databases;"
+    res=client.command(command)
+    print(res)
+
+def get_chroma_client(collection_name):
+    vectorstore = Chroma(
+        embedding_function=get_openaiembeddings(),
+        collection_name=collection_name,
+        persist_directory= root_file_path+persist_db,
+    )
+    return vectorstore._client
+
+def create_db():
+    files_path = root_file_path+hr_source_path
+    file_ext = "pdf"
+    initial_croma_db(persist_db, files_path, file_ext, hr_collection_name)
+
+def generate_iframe_for_youtube(youtube_link):
+    regex = r"(?:https:\/\/)?(?:www\.)?(?:youtube\.com|youtu\.be)\/(?:watch\?v=)?(.+)"
+    _url=re.sub(regex, r"https://www.youtube.com/embed/\1", youtube_link)
+    embed_html = f'<iframe width="650" height="365" src="{_url}" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>'
+    print(embed_html)
+    return embed_html
+
+def create_html_video(file_name, width, temp_file_url):
+    html_video = f'<video width={width} height={width} autoplay muted loop><source src={temp_file_url} type="video/mp4" poster="Masahiro.png"></video>'
+    return html_video
+
+def do_html_audio_speak(words_to_speak):
+    polly_client = boto3.Session(
+        aws_access_key_id="AKIAV7Q7AAGW54RBR6FZ",
+        aws_secret_access_key="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",
+        region_name='us-west-2'
+    ).client('polly')
+
+    language_code="cmn-CN"
+    engine = NEURAL_ENGINE
+    voice_id = "Zhiyu"
+
+    print("voice_id: "+voice_id+"\nlanguage_code="+language_code)
+    response = polly_client.synthesize_speech(
+        Text=words_to_speak,
+        OutputFormat='mp3',
+        VoiceId=voice_id,
+        LanguageCode=language_code,
+        Engine=engine
+    )
+
+    html_audio = '<pre>no audio</pre>'
+
+    # Save the audio stream returned by Amazon Polly on Lambda's temp directory
+    if "AudioStream" in response:
+        with closing(response["AudioStream"]) as stream:
+            try:
+                with open('./data/audios/tempfile.mp3', 'wb') as f:
+                    f.write(stream.read())
+                temp_aud_file = gr.File("./data/audios/tempfile.mp3")
+                temp_aud_file_url = "/file=" + temp_aud_file.value['name']
+                html_audio = f'<audio autoplay><source src={temp_aud_file_url} type="audio/mp3"></audio>'
+            except IOError as error:
+                # Could not write to file, exit gracefully
+                print(error)
+                return None, None
+    else:
+        # The response didn't contain audio data, exit gracefully
+        print("Could not stream audio")
+        return None, None
+
+    return html_audio, "./data/audios/tempfile.mp3"
+
+def do_html_video_speak():
+
+    key = "eyJhbGciOiJIUzUxMiJ9.eyJ1c2VybmFtZSI6ImNhdHNreXR3QGdtYWlsLmNvbSJ9.OypOUZF-xv4-b8i9F4_aaMQiJpxv0mXRT5kyuJwTMXVd4awV-O-Obntp--AqGghNNowzQ9oG7zArSnQjz2vQgg"
+    url = "https://api.exh.ai/animations/v2/generate_lipsync_from_audio"
+    files = {"audio_file": ("./data/audios/tempfile.mp3", open("./data/audios/tempfile.mp3", "rb"), "audio/mpeg")}
+    payload = {
+        "animation_pipeline": "high_quality",
+        "idle_url": "https://ugc-idle.s3-us-west-2.amazonaws.com/5fd9ba1b1607b39a4d559300c1e35bee.mp4"
+    }
+    headers = {
+        "accept": "application/json",
+        "authorization": f"Bearer {key}"
+    }
+
+    res = requests.post(url, data=payload, files=files, headers=headers)
+
+    print("res.status_code: ", res.status_code)
+
+    html_video = '<pre>no video</pre>'
+    if isinstance(res.content, bytes):
+        response_stream = io.BytesIO(res.content)
+        print("len(res.content)): ", len(res.content))
+
+        with open('./data/videos/tempfile.mp4', 'wb') as f:
+            f.write(response_stream.read())
+        temp_file = gr.File("./data/videos/tempfile.mp4")
+        temp_file_url = "/file=" + temp_file.value['name']
+        html_video = f'<video width={TALKING_HEAD_WIDTH} height={TALKING_HEAD_WIDTH} autoplay><source src={temp_file_url} type="video/mp4" poster="Masahiro.png"></video>'
+    else:
+        print('video url unknown')
+    return res, html_video, "./data/videos/tempfile.mp4"
+
+def kh_update_km(files):
+    file_paths = [file.name for file in files]
+    dest_file_path = root_file_path + ks_source_path
+
+    if not os.path.exists(dest_file_path):
+        os.makedirs(dest_file_path)
+
+    for file in file_paths:
+        shutil.copy(file, dest_file_path)
+    add_files_to_collection(ks_source_path, tmp_collection)
+
+    return num_of_collection(tmp_collection)
+
+
+class Logger:
+    def __init__(self, filename):
+        self.terminal = sys.stdout
+        self.log = open(filename, "w", encoding='UTF-8')
+
+    def write(self, message):
+        self.terminal.write(message)
+        self.log.write(message)
+
+    def flush(self):
+        self.terminal.flush()
+        self.log.flush()
+
+    def isatty(self):
+        return False
+
+
+def read_logs():
+    sys.stdout.flush()
+    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
+
+    with open("output.log", "r", encoding='UTF-8') as f:
+        return ansi_escape.sub('', f.read())
+
+
+def lunch_style(demo, logs=gr.Text()):
+    sys.stdout = Logger("output.log")
+    demo.load(read_logs, None, logs, every=1)
+
+    if len(sys.argv)==1:
+        print("running server as default value")
+        demo.launch(allowed_paths=[root_file_path, root_file_path+hr_source_path])
+    elif len(sys.argv)==2 and sys.argv[1] == "server":
+        local_ip = "10.40.23.232"
+        local_port = 7788
+        print(f"running server on http://{local_ip}:{local_port}")
+        demo.launch(allowed_paths=[root_file_path, root_file_path+hr_source_path],auth=("Foxconn", "Foxconn123!"),server_name=local_ip, server_port=local_port)
+    elif len(sys.argv)==4:
+        local_ip = sys.argv[2]
+        local_port = sys.argv[3]
+        print(f"running server on http://{local_ip}:{local_port}")
+        demo.launch(allowed_paths=[root_file_path, root_file_path+hr_source_path],auth=("Foxconn", "Foxconn123!"),server_name=local_ip, server_port=local_port)
+    else:
+        print("syntax: python <your_app>.py [server {ip_address, port}] ")
+
+def gradio_run():
+    print("User Login")
+    with gr.Blocks(theme='bethecloud/storj_theme') as demo:
+
+        with gr.Row():
+            gr.Markdown("# HH Azure Openai Demo")
+        #Header section
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("""
+### 這是一個基於各場景製造的Azure Openai Demo, 目前預計會包含場景有:
+
+- 超長文本的摘要 ☑
+- HR 智能客服小幫手 ☑
+- 上傳過去歷史資料, 預測未來發展
+- 上傳初步構想後, AI生成方案
+- 網路上搜尋各式資料(包含google, wikipedia, youtube) 等, 綜合分析給結論
+
+### 基礎的技術架構:
+* 給予資料, 持續累加
+* 存入vector(向量化) database, 依不同的collection 存放
+* 問題以相似度(Similarity search), 結果再丟給gpt 做綜合回應
+
+### 已知bug:
+* N/A
+
+如有任何Bug 歡迎隨時回饋
+""")
+            with gr.Column(scale=1):
+                gr.Image(type="pil", value=root_file_path+"vector.png", label="技術概念圖")
+                gr.Markdown("""
+> 中央資訊 Change Liao(廖晨志)
+> teams/email: change.cc.liao@foxconn.com
+> 分機: 5010108
+""")
+        with gr.Row():
+            gr.Markdown("""
+------
+## Playground
+請切換下方Tab 鍵試驗各項功能
+
+""")
+        #First PoC Section
+        with gr.Tab("文本摘要"):
+            with gr.Row():
+                with gr.Column(scale=1):
+                    gr.Markdown(f"""
+## 第一項實驗: 超長文本摘要
+請上傳任何文檔(.pdf, .doc, .csv, text 格式),上傳完成後稍等一會, AI 會在右側TextField 提供文本摘要
+
+* 使用方式:
+    * 請在右邊按下 `請上傳超長文本(可接受text, pdf, doc, csv 格式)` 上傳你的文本
+    * AI 會開始解析內容, 檔案愈大解析愈久
+    * 上傳完後可以按同個按鍵, 再次上傳
+    * 後續會支援video 以及 audio格式
+
+""")
+
+                with gr.Column(scale=1):
+                    gr.Markdown("1.")
+                    file_name_field = gr.Textbox(max_lines=1, label="上傳檔案",placeholder="目前沒有上傳檔案")
+                    upload_button = gr.UploadButton("請上傳超長文本(可接受text, pdf, doc, csv 格式)",
+                                                    file_types=["text", ".pdf", ".doc", ".csv"], file_count="multiple")
+                    gr.Markdown("2.")
+                    summary_text = gr.Textbox()
+                    summary_text.label = "AI 摘要:"
+                    summary_text.change = False
+                    summary_text.lines = 12
+                    upload_button.upload(upload_large_file, upload_button, file_name_field).then(summary_large_file,upload_button,summary_text)
+        #2nd Hr Section
+        with gr.Tab("HR 客服助手"):
+            with gr.Row():
+                with gr.Column(scale=1):
+                    gr.Markdown(
+                        """
+## 第二項實驗: HR 資料庫智能客服助手 AI 試驗
+"""
+                    )
+                    gr.Markdown("""
+### 使用方法
+* 測試人員可在下方加入任何HR 相關資料, 亦可全部刪除後上傳.
+* 系統會將資料向量化後,納入右方人資客服機器人資料庫
+* 測試人員可在右方與客服機器人對話
+
+(溫馨提醒: 儘可能所有檔案全部清掉, 再一次上傳所有想納入的檔案;且次數不要太多,以節省經費)
+""")
+                    file_list=gr.Textbox(get_hr_files, label="已存在知識庫的檔案(text,pdf,doc,csv)", placeholder="沒有任何檔案存在", max_lines=16, lines=16)
+                    with gr.Row():
+                        with gr.Column(scale=1):
+                            upload_button = gr.UploadButton("上傳HR知識庫檔案",
+                                                            file_types=["text", ".pdf", ".doc", ".csv"], file_count="multiple")
+                            upload_button.upload(update_hr_km, inputs=upload_button, outputs=file_list)
+                        with gr.Column(scale=1):
+                            cleanDataBtn = gr.Button(value="刪除所有知識以及檔案")
+                            cleanDataBtn.click(clear_hr_datas,outputs=file_list)
+
+                with gr.Column(scale=1):
+                    with gr.Row():
+                        with gr.Column():
+                            tmp_file = gr.File(LOOPING_TALKING_HEAD, visible=False)
+                            tmp_file_url = "/file=" + tmp_file.value['name']
+                            htm_video = create_html_video(LOOPING_TALKING_HEAD, TALKING_HEAD_WIDTH, tmp_file_url)
+                            video_html = gr.HTML(htm_video)
+
+                            # my_aud_file = gr.File(label="Audio file", type="file", visible=True)
+                            tmp_aud_file = gr.File("./data/audios/tempfile.mp3", visible=False)
+                            tmp_aud_file_url = "/file=" + tmp_aud_file.value['name']
+                            htm_audio = f'<audio><source src={tmp_aud_file_url} type="audio/mp3"></audio>'
+                            audio_html = gr.HTML(htm_audio, visible=False)
+
+                            def respond(message, chat_history):
+                                vector_search_message = local_vector_search(message, chat_history)
+                                chat_history.append((message, vector_search_message))
+
+                                html_audio, audio_file_path = do_html_audio_speak(vector_search_message)
+                                res, new_html_video, video_file_path = do_html_video_speak()
+
+                                if res.status_code == 200:
+                                    return '', chat_history, new_html_video, ''
+                                else:
+                                    return '', chat_history, htm_video, html_audio
+                        with gr.Column():
+                            gr.Markdown("""
+### AI 虛擬客服:
+* 這是一個實驗性質的AI 客服
+* 講話超過15秒就不會產生,正在要求放寬限制
+* 想要放誰的頭像都可以, 要放董事長也可以.
+* 訂閱 (有效時間 6/13~7/13)
+""")
+                    with gr.Row():
+                        chatbot = gr.Chatbot(value=[], elem_id="chatbot").style(height=400)
+                    with gr.Row():
+                        with gr.Column(scale=5):
+                            msg = gr.Textbox(
+                                show_label=False,
+                                placeholder="輸入你的問題",
+                            )
+                        with gr.Column(scale=1):
+                            clear = gr.Button("清除")
+                    msg.submit(respond, [msg, chatbot], [msg, chatbot, video_html, audio_html], queue=True)
+                    clear.click(lambda: None, None, chatbot, queue=False)
+        #3rd youtube
+        with gr.Tab("Youtube 影片摘要"):
+            with gr.Row():
+                with gr.Column(scale=1):
+                    youtube_gr = gr.HTML(generate_iframe_for_youtube("https://www.youtube.com/embed/"))
+                    youtube_link=gr.Textbox(interactive=True, label="在此貼上Youtube link:", placeholder="e.g. https://www.youtube.com/watch?v=xxxxxxxxx")
+                    youtube_link.change(generate_iframe_for_youtube,youtube_link,youtube_gr)
+                    youtube_analysis_btn=gr.Button("送出解析")
+                with gr.Column(scale=1):
+                    youtube_summary_textbox=gr.Textbox(interactive=False, label="AI 解析", lines=20)
+                    youtube_analysis_btn.click(youtube_summary,youtube_link,youtube_summary_textbox)
+        with gr.Tab("高雄市政府票証"):
+            from langchain.agents import create_pandas_dataframe_agent
+            import pandas as pd
+            mypath = root_file_path+ks_source_path
+            onlyfiles = os.listdir(mypath)
+            df = pd.concat((pd.read_csv(os.path.join(mypath, filename)) for filename in onlyfiles))
+            with gr.Row():
+                gr.Markdown("""
+### 使用方式
+這是一個使用高雄公車票證資料, 運用AI協助決策的工具.
+如果有出現error, 請重新刷新頁面. 有error 就代表運算到最後token 數量超出azure openai 上限了, 這部份還在想辦法調整中.
+""")
+                invField = gr.Textbox(visible=False)
+                gr.Examples(onlyfiles, label="資料庫檔案", inputs=invField, examples_per_page=4)
+            with gr.Row():
+                with gr.Column():
+                    davinci="text-davinci-003"
+                    llm = AzureOpenAI(
+                        deployment_name=davinci,
+                        model_name=davinci,
+                        max_tokens=2000,
+                        temperature=0,
+                    )
+
+                    agent=create_pandas_dataframe_agent(
+                        llm,
+                        df,
+                        max_iterations=30,
+                        return_intermediate_steps=False,
+                        verbose=True
+                    )
+
+                    def tmp_respond(prompt_str,message, chat_history):
+                        try:
+                            new_str=prompt_str.format(message=message, chat_history=chat_history)
+                            answer=agent.run(new_str)
+                            chat_history.append((message, answer))
+                        except Exception as e:
+                            response = str(e)
+                            print(f"Got error!{response}")
+                            if not response.startswith("Could not parse LLM output: `"):
+                                raise e
+                            answer = response.removeprefix("Could not parse LLM output: `").removesuffix("`")
+                            chat_history.append((message, answer))
+                        return '', chat_history
+
+                    tmp_chatbot = gr.Chatbot(value=[], elem_id="tmp_chatbot").style(height=500)
+                    with gr.Row():
+                        with gr.Column(scale=5):
+                            tmp_msg = gr.Textbox(
+                                show_label=False,
+                                placeholder="輸入你的問題",
+                            )
+                        with gr.Column(scale=1):
+                            tmp_clear = gr.Button("清除對話")
+                with gr.Column():
+                    prompt_textbox=gr.Textbox("""
+你是一位專業的資料科學家,有下列定義:
+1.每個票卡序號代表一名乘客
+2.原始票價視為花費或是消費
+3.轉乘次數: 一名乘客在同一天有任意兩筆紀錄,其中一筆出下車站的資料等於另一筆進上車站的資料,其出下車站代表的車站的轉乘次數就要增加1.
+歷史訊息是 {chat_history}
+請以中文回答我下面的問題:{message}
+""", lines=10, label="Prompt:有{chat_history}及{message}, 請至少保留{message}變數", interactive=True, max_lines=10)
+                    console=gr.Textbox(lines=11, label="Console",max_lines=11)
+                tmp_msg.submit(tmp_respond, [prompt_textbox, tmp_msg, tmp_chatbot], [tmp_msg, tmp_chatbot], queue=True)
+                tmp_clear.click(lambda: None, None, tmp_chatbot, queue=False)
+            with gr.Row():
+                gr.Examples([
+                    '你有哪些業者?',
+                    '0001站轉乘旅客所佔比例',
+                    '高雄捷運的2022年7月份運輸量與2022年6月份相比, 增減如何?',
+                    '請給我2022年6月至2022年7月之間, 轉乘數量最高排名前五名的車站?',
+                    '0001站 在2022年9月份轉乘數量是未知. 請依2022年7月份到2022年8月份的趨勢, 請以月份做為時間單位, 做出一個數學模型. 用此數學模型來預測 0001站 在2022年9月份的轉乘數量會多少, 增減如何?'
+                ], label="訊息範例",inputs=tmp_msg)
+    demo.queue(concurrency_count=10)
+    lunch_style(demo,console)
+
+def test():
+    mypath = "C:\\Users\\catsk\\SourceCode\\azure_openai_poc\\data\\ks_source_files"
+    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
+    print(onlyfiles)
+
+gradio_run()
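The HR-assistant core of app.py is plain retrieval-augmented chat: documents are embedded with the Azure `text-embedding-ada-002` deployment into a persisted Chroma collection, and questions go through LangChain's ConversationalRetrievalChain. A condensed, self-contained sketch of that flow, assuming the same legacy langchain 0.0.x APIs imported above and the deployment names hard-coded in the file:

```
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain

# Same embedding/LLM deployments that app.py configures above
embeddings = OpenAIEmbeddings(deployment="CivetGPT_embedding",
                              model="text-embedding-ada-002", chunk_size=1)
store = Chroma(embedding_function=embeddings,
               collection_name="hr_db",
               persist_directory="./data/persist_db")
llm = AzureChatOpenAI(deployment_name="CivetGPT",
                      model_name="gpt-35-turbo", temperature=0.2)

# Similarity search over the collection, then the LLM composes the answer
chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=store.as_retriever())
result = chain({"question": "特休怎麼申請?", "chat_history": []})
print(result["answer"])
```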
azure_utils.py
ADDED
@@ -0,0 +1,155 @@
+# This class stores Azure voice data. Specifically, the class stores several records containing
+# language, lang_code, gender, voice_id and engine. The class also has a method to return the
+# voice_id, lang_code and engine given a language and gender.
+
+NEURAL_ENGINE = "neural"
+STANDARD_ENGINE = "standard"
+
+
+class AzureVoiceData:
+    def get_voice(self, language, gender):
+        for voice in self.voice_data:
+            if voice['language'] == language and voice['gender'] == gender:
+                return voice['azure_voice']
+        return None
+
+    def __init__(self):
+        self.voice_data = [
+            {'language': 'Arabic',
+             'azure_voice': 'ar-EG-ShakirNeural',
+             'gender': 'Male'},
+            {'language': 'Arabic (Gulf)',
+             'azure_voice': 'ar-KW-FahedNeural',
+             'gender': 'Male'},
+            {'language': 'Catalan',
+             'azure_voice': 'ca-ES-EnricNeural',
+             'gender': 'Male'},
+            {'language': 'Chinese (Cantonese)',
+             'azure_voice': 'yue-CN-YunSongNeural',
+             'gender': 'Male'},
+            {'language': 'Chinese (Mandarin)',
+             'azure_voice': 'zh-CN-YunxiNeural',
+             'gender': 'Male'},
+            {'language': 'Danish',
+             'azure_voice': 'da-DK-JeppeNeural',
+             'gender': 'Male'},
+            {'language': 'Dutch',
+             'azure_voice': 'nl-NL-MaartenNeural',
+             'gender': 'Male'},
+            {'language': 'English (Australian)',
+             'azure_voice': 'en-AU-KenNeural',
+             'gender': 'Male'},
+            {'language': 'English (British)',
+             'azure_voice': 'en-GB-RyanNeural',
+             'gender': 'Male'},
+            {'language': 'English (Indian)',
+             'azure_voice': 'en-IN-PrabhatNeural',
+             'gender': 'Male'},
+            {'language': 'English (New Zealand)',
+             'azure_voice': 'en-NZ-MitchellNeural',
+             'gender': 'Male'},
+            {'language': 'English (South African)',
+             'azure_voice': 'en-ZA-LukeNeural',
+             'gender': 'Male'},
+            {'language': 'English (US)',
+             'azure_voice': 'en-US-ChristopherNeural',
+             'gender': 'Male'},
+            {'language': 'English (Welsh)',
+             'azure_voice': 'cy-GB-AledNeural',
+             'gender': 'Male'},
+            {'language': 'Finnish',
+             'azure_voice': 'fi-FI-HarriNeural',
+             'gender': 'Male'},
+            {'language': 'French',
+             'azure_voice': 'fr-FR-HenriNeural',
+             'gender': 'Male'},
+            {'language': 'French (Canadian)',
+             'azure_voice': 'fr-CA-AntoineNeural',
+             'gender': 'Male'},
+            {'language': 'German',
+             'azure_voice': 'de-DE-KlausNeural',
+             'gender': 'Male'},
+            {'language': 'German (Austrian)',
+             'azure_voice': 'de-AT-JonasNeural',
+             'gender': 'Male'},
+            {'language': 'Hindi',
+             'azure_voice': 'hi-IN-MadhurNeural',
+             'gender': 'Male'},
+            {'language': 'Icelandic',
+             'azure_voice': 'is-IS-GunnarNeural',
+             'gender': 'Male'},
+            {'language': 'Italian',
+             'azure_voice': 'it-IT-GianniNeural',
+             'gender': 'Male'},
+            {'language': 'Japanese',
+             'azure_voice': 'ja-JP-KeitaNeural',
+             'gender': 'Male'},
+            {'language': 'Korean',
+             'azure_voice': 'ko-KR-GookMinNeural',
+             'gender': 'Male'},
+            {'language': 'Norwegian',
+             'azure_voice': 'nb-NO-FinnNeural',
+             'gender': 'Male'},
+            {'language': 'Polish',
+             'azure_voice': 'pl-PL-MarekNeural',
+             'gender': 'Male'},
+            {'language': 'Portuguese (Brazilian)',
+             'azure_voice': 'pt-BR-NicolauNeural',
+             'gender': 'Male'},
+            {'language': 'Portuguese (European)',
+             'azure_voice': 'pt-PT-DuarteNeural',
+             'gender': 'Male'},
+            {'language': 'Romanian',
+             'azure_voice': 'ro-RO-EmilNeural',
+             'gender': 'Male'},
+            {'language': 'Russian',
+             'azure_voice': 'ru-RU-DmitryNeural',
+             'gender': 'Male'},
+            {'language': 'Spanish (European)',
+             'azure_voice': 'es-ES-TeoNeural',
+             'gender': 'Male'},
+            {'language': 'Spanish (Mexican)',
+             'azure_voice': 'es-MX-LibertoNeural',
+             'gender': 'Male'},
+            {'language': 'Spanish (US)',
+             'azure_voice': 'es-US-AlonsoNeural',
+             'gender': 'Male'},
+            {'language': 'Swedish',
+             'azure_voice': 'sv-SE-MattiasNeural',
+             'gender': 'Male'},
+            {'language': 'Turkish',
+             'azure_voice': 'tr-TR-AhmetNeural',
+             'gender': 'Male'},
+            {'language': 'Welsh',
+             'azure_voice': 'cy-GB-AledNeural',
+             'gender': 'Male'},
+        ]
+
+
+# Run from the command-line
+if __name__ == '__main__':
+    azure_voice_data = AzureVoiceData()
+
+    azure_voice = azure_voice_data.get_voice('English (US)', 'Male')
+    print('English (US)', 'Male', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('English (US)', 'Female')
+    print('English (US)', 'Female', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('French', 'Female')
+    print('French', 'Female', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('French', 'Male')
+    print('French', 'Male', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('Japanese', 'Female')
+    print('Japanese', 'Female', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('Japanese', 'Male')
+    print('Japanese', 'Male', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('Hindi', 'Female')
+    print('Hindi', 'Female', azure_voice)
+
+    azure_voice = azure_voice_data.get_voice('Hindi', 'Male')
+    print('Hindi', 'Male', azure_voice)
cache.sqlite3
ADDED
Binary file (12.3 kB)
data/audios/tempfile.mp3
ADDED
Binary file (18.9 kB)
data/ks_source/.gitattributes
ADDED
@@ -0,0 +1 @@
+*.csv filter=lfs diff=lfs merge=lfs -text
data/ks_source/110年07月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9ad310e9f1f660b48e44aa32150d3223f6706476cec0307aa7dd7d687ab945e9
+size 18657193

data/ks_source/110年08月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0dbc4041053d8ce0c36c90beaa75926c7e5cbe840db484898a44585c738fa10d
+size 33381982

data/ks_source/110年09月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:115c785bbba83a5c57ce32c0947489b26008900ff7782a9b0506011e79b35971
+size 85816562

data/ks_source/110年10月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bed4234dd0e4ad1d568ed3036b3a26fa6186ccaeca3eb88bfa47d1d09d3d2f1
+size 108897445

data/ks_source/110年11月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a6512839dbcee4308536315888a2007b96974d5a69fea59fd24bbd5318b0410c
+size 118424302

data/ks_source/110年12月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4245d9f05f2fa10993a1a715e5457102236023116f454009c50486465142a1a3
+size 119017044

data/ks_source/111年01月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:57acd362db6c7b86e8be0a18608a055215bc8cab83453263b529aaa8d82a96cc
+size 83616693

data/ks_source/111年02月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:932b5f84a88db8d3241762af24fd798782703c40e623414f6cade97bb3dfacc9
+size 64076063

data/ks_source/111年03月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cbf0fe2933fff2af651ced11aa7d8a730534c596c86c3277211874ccd5807892
+size 116143953

data/ks_source/111年04月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc59a8c8b91a0b64ee722f8356a5ff0e4e222205c1675a7d650dadbf4da5abd0
+size 96744856

data/ks_source/111年05月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9e24f683e54b094e766365b4b07c4b0d6271393cde8f7b007fbf1a22a3526947
+size 74946870

data/ks_source/111年06月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ceeefd6c9709b7eae07e14861f98d9891037127226fd890dfb1ec17abb4496b4
+size 46411695

data/ks_source/111年07月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:14723aaf372a8571ab146af1ca470cfa86e69f95dda07a694edb28941f80a004
+size 84705392

data/ks_source/111年08月MaaS交易資料-(例行).csv
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:22f76a80c0e89354316ce8a9d461eef243b93f6dae4baa466fbd0679ff8cf94f
+size 94058824
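Each CSV above is stored as a Git LFS pointer stub (version, oid, size) rather than the real data. A sketch of reading one of these stubs back into a dict, e.g. to check the advertised size before fetching the real file with `git lfs pull`:

```
# Parse a git-lfs pointer file ("key value" per line) into a dict.
def parse_lfs_pointer(path):
    fields = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields

info = parse_lfs_pointer("data/ks_source/110年07月MaaS交易資料-(例行).csv")
print(info["oid"], info["size"])  # sha256:9ad310e9... 18657193
```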
data/vector.png
ADDED
Binary image file
data/videos/Masahiro.mp4
ADDED
Binary file (228 kB)

data/videos/Masahiro1.mp4
ADDED
Binary file (228 kB)

data/videos/tempfile.mp4
ADDED
Binary file (88.3 kB)
poc_langchain.spec
ADDED
@@ -0,0 +1,50 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+# -*- mode: python ; coding: utf-8 -*-
+
+
+block_cipher = None
+
+
+a = Analysis(
+    ['poc_langchain.py'],
+    pathex=[],
+    binaries=[],
+    datas=[],
+    hiddenimports=[],
+    hookspath=[],
+    hooksconfig={},
+    runtime_hooks=[],
+    excludes=[],
+    win_no_prefer_redirects=False,
+    win_private_assemblies=False,
+    cipher=block_cipher,
+    noarchive=False,
+)
+pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
+
+exe = EXE(
+    pyz,
+    a.scripts,
+    [],
+    exclude_binaries=True,
+    name='poc_langchain',
+    debug=False,
+    bootloader_ignore_signals=False,
+    strip=False,
+    upx=True,
+    console=True,
+    disable_windowed_traceback=False,
+    argv_emulation=False,
+    target_arch=None,
+    codesign_identity=None,
+    entitlements_file=None,
+)
+coll = COLLECT(
+    exe,
+    a.binaries,
+    a.zipfiles,
+    a.datas,
+    strip=False,
+    upx=True,
+    upx_exclude=[],
+    name='poc_langchain',
+)
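
This is a stock PyInstaller spec. Because the EXE step sets `exclude_binaries=True` and is followed by a COLLECT step, running `pyinstaller poc_langchain.spec` would produce a one-folder bundle under `dist/poc_langchain/` rather than a single-file executable; it assumes a `poc_langchain.py` entry script sitting next to the spec.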
polly_utils.py
ADDED
@@ -0,0 +1,635 @@
+# This class stores Polly voice data. Specifically, the class stores several records containing
+# language, lang_code, gender, voice_id and engine. The class also has a method to return the
+# voice_id, lang_code and engine given a language and gender.
+
+NEURAL_ENGINE = "neural"
+STANDARD_ENGINE = "standard"
+
+
+class PollyVoiceData:
+    def get_voice(self, language, gender):
+        for voice in self.voice_data:
+            if voice['language'] == language and voice['gender'] == gender:
+                if voice['neural'] == 'Yes':
+                    return voice['voice_id'], voice['lang_code'], NEURAL_ENGINE
+        for voice in self.voice_data:
+            if voice['language'] == language and voice['gender'] == gender:
+                if voice['standard'] == 'Yes':
+                    return voice['voice_id'], voice['lang_code'], STANDARD_ENGINE
+        return None, None, None
+
+    def get_whisper_lang_code(self, language):
+        for voice in self.voice_data:
+            if voice['language'] == language:
+                return voice['whisper_lang_code']
+        return "en"
+
+    def __init__(self):
+        self.voice_data = [
+            {'language': 'Arabic',
+             'lang_code': 'arb',
+             'whisper_lang_code': 'ar',
+             'voice_id': 'Zeina',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Arabic (Gulf)',
+             'lang_code': 'ar-AE',
+             'whisper_lang_code': 'ar',
+             'voice_id': 'Hala',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Catalan',
+             'lang_code': 'ca-ES',
+             'whisper_lang_code': 'ca',
+             'voice_id': 'Arlet',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Chinese (Cantonese)',
+             'lang_code': 'yue-CN',
+             'whisper_lang_code': 'zh',
+             'voice_id': 'Hiujin',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Chinese (Mandarin)',
+             'lang_code': 'cmn-CN',
+             'whisper_lang_code': 'zh',
+             'voice_id': 'Zhiyu',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Danish',
+             'lang_code': 'da-DK',
+             'whisper_lang_code': 'da',
+             'voice_id': 'Naja',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Danish',
+             'lang_code': 'da-DK',
+             'whisper_lang_code': 'da',
+             'voice_id': 'Mads',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Dutch',
+             'lang_code': 'nl-NL',
+             'whisper_lang_code': 'nl',
+             'voice_id': 'Laura',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Dutch',
+             'lang_code': 'nl-NL',
+             'whisper_lang_code': 'nl',
+             'voice_id': 'Lotte',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Dutch',
+             'lang_code': 'nl-NL',
+             'whisper_lang_code': 'nl',
+             'voice_id': 'Ruben',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'English (Australian)',
+             'lang_code': 'en-AU',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Nicole',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'English (Australian)',
+             'lang_code': 'en-AU',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Olivia',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (Australian)',
+             'lang_code': 'en-AU',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Russell',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'English (British)',
+             'lang_code': 'en-GB',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Amy',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (British)',
+             'lang_code': 'en-GB',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Emma',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (British)',
+             'lang_code': 'en-GB',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Brian',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (British)',
+             'lang_code': 'en-GB',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Arthur',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (Indian)',
+             'lang_code': 'en-IN',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Aditi',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'English (Indian)',
+             'lang_code': 'en-IN',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Raveena',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'English (Indian)',
+             'lang_code': 'en-IN',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Kajal',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (New Zealand)',
+             'lang_code': 'en-NZ',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Aria',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (South African)',
+             'lang_code': 'en-ZA',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Ayanda',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Ivy',
+             'gender': 'Female (child)',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Joanna',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Kendra',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Kimberly',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Salli',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Joey',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Justin',
+             'gender': 'Male (child)',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Kevin',
+             'gender': 'Male (child)',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'English (US)',
+             'lang_code': 'en-US',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Matthew',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'English (Welsh)',
+             'lang_code': 'en-GB-WLS',
+             'whisper_lang_code': 'en',
+             'voice_id': 'Geraint',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Finnish',
+             'lang_code': 'fi-FI',
+             'whisper_lang_code': 'fi',
+             'voice_id': 'Suvi',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'French',
+             'lang_code': 'fr-FR',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Celine',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'French',
+             'lang_code': 'fr-FR',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Lea',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'French',
+             'lang_code': 'fr-FR',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Mathieu',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'French (Canadian)',
+             'lang_code': 'fr-CA',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Chantal',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'French (Canadian)',
+             'lang_code': 'fr-CA',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Gabrielle',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'French (Canadian)',
+             'lang_code': 'fr-CA',
+             'whisper_lang_code': 'fr',
+             'voice_id': 'Liam',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'German',
+             'lang_code': 'de-DE',
+             'whisper_lang_code': 'de',
+             'voice_id': 'Marlene',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'German',
+             'lang_code': 'de-DE',
+             'whisper_lang_code': 'de',
+             'voice_id': 'Vicki',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'German',
+             'lang_code': 'de-DE',
+             'whisper_lang_code': 'de',
+             'voice_id': 'Hans',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'German',
+             'lang_code': 'de-DE',
+             'whisper_lang_code': 'de',
+             'voice_id': 'Daniel',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'German (Austrian)',
+             'lang_code': 'de-AT',
+             'whisper_lang_code': 'de',
+             'voice_id': 'Hannah',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Hindi',
+             'lang_code': 'hi-IN',
+             'whisper_lang_code': 'hi',
+             'voice_id': 'Aditi',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Hindi',
+             'lang_code': 'hi-IN',
+             'whisper_lang_code': 'hi',
+             'voice_id': 'Kajal',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Icelandic',
+             'lang_code': 'is-IS',
+             'whisper_lang_code': 'is',
+             'voice_id': 'Dora',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Icelandic',
+             'lang_code': 'is-IS',
+             'whisper_lang_code': 'is',
+             'voice_id': 'Karl',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Italian',
+             'lang_code': 'it-IT',
+             'whisper_lang_code': 'it',
+             'voice_id': 'Carla',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Italian',
+             'lang_code': 'it-IT',
+             'whisper_lang_code': 'it',
+             'voice_id': 'Bianca',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Japanese',
+             'lang_code': 'ja-JP',
+             'whisper_lang_code': 'ja',
+             'voice_id': 'Mizuki',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Japanese',
+             'lang_code': 'ja-JP',
+             'whisper_lang_code': 'ja',
+             'voice_id': 'Takumi',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Korean',
+             'lang_code': 'ko-KR',
+             'whisper_lang_code': 'ko',
+             'voice_id': 'Seoyeon',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Norwegian',
+             'lang_code': 'nb-NO',
+             'whisper_lang_code': 'no',
+             'voice_id': 'Liv',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Norwegian',
+             'lang_code': 'nb-NO',
+             'whisper_lang_code': 'no',
+             'voice_id': 'Ida',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Polish',
+             'lang_code': 'pl-PL',
+             'whisper_lang_code': 'pl',
+             'voice_id': 'Ewa',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Polish',
+             'lang_code': 'pl-PL',
+             'whisper_lang_code': 'pl',
+             'voice_id': 'Maja',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Polish',
+             'lang_code': 'pl-PL',
+             'whisper_lang_code': 'pl',
+             'voice_id': 'Jacek',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Polish',
+             'lang_code': 'pl-PL',
+             'whisper_lang_code': 'pl',
+             'voice_id': 'Jan',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Polish',
+             'lang_code': 'pl-PL',
+             'whisper_lang_code': 'pl',
+             'voice_id': 'Ola',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Portuguese (Brazilian)',
+             'lang_code': 'pt-BR',
+             'whisper_lang_code': 'pt',
+             'voice_id': 'Camila',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Portuguese (Brazilian)',
+             'lang_code': 'pt-BR',
+             'whisper_lang_code': 'pt',
+             'voice_id': 'Vitoria',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Portuguese (Brazilian)',
+             'lang_code': 'pt-BR',
+             'whisper_lang_code': 'pt',
+             'voice_id': 'Ricardo',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Portuguese (European)',
+             'lang_code': 'pt-PT',
+             'whisper_lang_code': 'pt',
+             'voice_id': 'Ines',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Portuguese (European)',
+             'lang_code': 'pt-PT',
+             'whisper_lang_code': 'pt',
+             'voice_id': 'Cristiano',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Romanian',
+             'lang_code': 'ro-RO',
+             'whisper_lang_code': 'ro',
+             'voice_id': 'Carmen',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Russian',
+             'lang_code': 'ru-RU',
+             'whisper_lang_code': 'ru',
+             'voice_id': 'Tatyana',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Russian',
+             'lang_code': 'ru-RU',
+             'whisper_lang_code': 'ru',
+             'voice_id': 'Maxim',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Spanish (European)',
+             'lang_code': 'es-ES',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Conchita',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Spanish (European)',
+             'lang_code': 'es-ES',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Lucia',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Spanish (European)',
+             'lang_code': 'es-ES',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Enrique',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Spanish (Mexican)',
+             'lang_code': 'es-MX',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Mia',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Spanish (US)',
+             'lang_code': 'es-US',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Lupe',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'Yes'},
+            {'language': 'Spanish (US)',
+             'lang_code': 'es-US',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Penelope',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Spanish (US)',
+             'lang_code': 'es-US',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Miguel',
+             'gender': 'Male',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Spanish (US)',
+             'lang_code': 'es-US',
+             'whisper_lang_code': 'es',
+             'voice_id': 'Pedro',
+             'gender': 'Male',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Swedish',
+             'lang_code': 'sv-SE',
+             'whisper_lang_code': 'sv',
+             'voice_id': 'Astrid',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Swedish',
+             'lang_code': 'sv-SE',
+             'whisper_lang_code': 'sv',
+             'voice_id': 'Elin',
+             'gender': 'Female',
+             'neural': 'Yes',
+             'standard': 'No'},
+            {'language': 'Turkish',
+             'lang_code': 'tr-TR',
+             'whisper_lang_code': 'tr',
+             'voice_id': 'Filiz',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'},
+            {'language': 'Welsh',
+             'lang_code': 'cy-GB',
+             'whisper_lang_code': 'cy',
+             'voice_id': 'Gwyneth',
+             'gender': 'Female',
+             'neural': 'No',
+             'standard': 'Yes'}
+        ]
+
+
+# Run from the command-line
+if __name__ == '__main__':
+    polly_voice_data = PollyVoiceData()
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('English (US)', 'Male')
+    print('English (US)', 'Male', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('English (US)', 'Female')
+    print('English (US)', 'Female', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('French', 'Female')
+    print('French', 'Female', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('French', 'Male')
+    print('French', 'Male', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('Japanese', 'Female')
+    print('Japanese', 'Female', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('Japanese', 'Male')
+    print('Japanese', 'Male', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('Hindi', 'Female')
+    print('Hindi', 'Female', voice_id, language_code, engine)
+
+    voice_id, language_code, engine = polly_voice_data.get_voice('Hindi', 'Male')
+    print('Hindi', 'Male', voice_id, language_code, engine)
+
+    whisper_lang_code = polly_voice_data.get_whisper_lang_code('English (US)')
+    print('English (US) whisper_lang_code:', whisper_lang_code)
+
+    whisper_lang_code = polly_voice_data.get_whisper_lang_code('Chinese (Mandarin)')
+    print('Chinese (Mandarin) whisper_lang_code:', whisper_lang_code)
+
+    whisper_lang_code = polly_voice_data.get_whisper_lang_code('Norwegian')
+    print('Norwegian whisper_lang_code:', whisper_lang_code)
+
+    whisper_lang_code = polly_voice_data.get_whisper_lang_code('Dutch')
+    print('Dutch whisper_lang_code:', whisper_lang_code)
+
+    whisper_lang_code = polly_voice_data.get_whisper_lang_code('Foo')
+    print('Foo whisper_lang_code:', whisper_lang_code)
+
+
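
polly_utils.py only resolves voice metadata; the synthesis itself happens elsewhere in the app. The two-pass loop in get_voice deliberately prefers a neural voice and only falls back to a standard one, so the returned engine string can be passed straight to Polly. As a rough sketch of how the lookup is meant to be consumed (the boto3 call and output filename are illustrative assumptions, not code from this commit):

# Hypothetical usage sketch (not part of this commit): resolve a voice
# with PollyVoiceData, then synthesize speech through boto3's Polly client.
# Assumes AWS credentials and a default region are already configured in
# the environment, and that this script sits next to polly_utils.py.
import boto3

from polly_utils import PollyVoiceData

polly_voice_data = PollyVoiceData()
voice_id, lang_code, engine = polly_voice_data.get_voice('English (US)', 'Female')

if voice_id is not None:
    polly = boto3.client('polly')
    response = polly.synthesize_speech(
        Text='Hello from Polly!',
        OutputFormat='mp3',
        VoiceId=voice_id,        # 'Joanna' for this language/gender pair
        LanguageCode=lang_code,  # 'en-US'
        Engine=engine,           # 'neural' if available, else 'standard'
    )
    # AudioStream is a streaming body; write it out as an .mp3 file.
    with open('tempfile.mp3', 'wb') as f:
        f.write(response['AudioStream'].read())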
requirements.txt
ADDED
@@ -0,0 +1,208 @@
+aiofiles==23.1.0
+aiohttp==3.8.4
+aiosignal==1.3.1
+altair==5.0.1
+anyio==3.7.0
+argilla==1.9.0
+argon2-cffi==21.3.0
+argon2-cffi-bindings==21.2.0
+arrow==1.2.3
+asttokens==2.2.1
+async-timeout==4.0.2
+attrs==23.1.0
+backcall==0.2.0
+backoff==2.2.1
+beautifulsoup4==4.12.2
+bleach==6.0.0
+boto3==1.26.152
+botocore==1.29.152
+bs4==0.0.1
+certifi==2023.5.7
+cffi==1.15.1
+chardet==5.1.0
+charset-normalizer==3.1.0
+chromadb==0.3.26
+click==8.1.3
+clickhouse-connect==0.6.2
+colorama==0.4.6
+coloredlogs==15.0.1
+comm==0.1.3
+commonmark==0.9.1
+contourpy==1.0.7
+cryptography==41.0.1
+cycler==0.11.0
+dataclasses-json==0.5.7
+debugpy==1.6.7
+decorator==5.1.1
+defusedxml==0.7.1
+Deprecated==1.2.14
+distlib==0.3.6
+duckdb==0.8.0
+et-xmlfile==1.1.0
+exceptiongroup==1.1.1
+executing==1.2.0
+fastapi==0.96.1
+fastjsonschema==2.17.1
+ffmpy==0.3.0
+filelock==3.12.0
+flatbuffers==23.5.26
+fonttools==4.39.4
+fqdn==1.5.1
+frozenlist==1.3.3
+fsspec==2023.6.0
+gradio==3.34.0
+gradio_client==0.2.6
+greenlet==2.0.2
+h11==0.14.0
+hnswlib==0.7.0
+httpcore==0.16.3
+httptools==0.5.0
+httpx==0.23.3
+huggingface-hub==0.15.1
+humanfriendly==10.0
+idna==3.4
+ipykernel==6.23.2
+ipython==8.14.0
+ipython-genutils==0.2.0
+isoduration==20.11.0
+jedi==0.18.2
+Jinja2==3.1.2
+jmespath==1.0.1
+joblib==1.2.0
+jsonpointer==2.3
+jsonschema==4.17.3
+jupyter-events==0.6.3
+jupyter_client==8.2.0
+jupyter_core==5.3.1
+jupyter_server==2.6.0
+jupyter_server_terminals==0.4.4
+jupyterlab-pygments==0.2.2
+kiwisolver==1.4.4
+langchain==0.0.200
+langchainplus-sdk==0.0.10
+linkify-it-py==2.0.2
+lxml==4.9.2
+lz4==4.3.2
+Markdown==3.4.3
+markdown-it-py==2.2.0
+MarkupSafe==2.1.3
+marshmallow==3.19.0
+marshmallow-enum==1.5.1
+matplotlib==3.7.1
+matplotlib-inline==0.1.6
+mdit-py-plugins==0.3.3
+mdurl==0.1.2
+mistune==2.0.5
+monotonic==1.6
+mpmath==1.3.0
+msg-parser==1.2.0
+multidict==6.0.4
+mypy-extensions==1.0.0
+nbclassic==1.0.0
+nbclient==0.8.0
+nbconvert==7.5.0
+nbformat==5.9.0
+nest-asyncio==1.5.6
+nltk==3.8.1
+notebook==6.5.4
+notebook_shim==0.2.3
+numexpr==2.8.4
+numpy==1.23.5
+olefile==0.46
+onnxruntime==1.15.0
+openai==0.27.8
+openapi-schema-pydantic==1.2.4
+openpyxl==3.1.2
+orjson==3.9.1
+overrides==7.3.1
+packaging==23.1
+pandas==1.5.3
+pandocfilters==1.5.0
+parso==0.8.3
+pdf2image==1.16.3
+pdfminer.six==20221105
+pickleshare==0.7.5
+Pillow==9.5.0
+pip-search==0.0.12
+platformdirs==3.5.1
+posthog==3.0.1
+prometheus-client==0.17.0
+prompt-toolkit==3.0.38
+protobuf==4.23.2
+psutil==5.9.5
+pulsar-client==3.2.0
+pure-eval==0.2.2
+pycparser==2.21
+pydantic==1.10.9
+pydub==0.25.1
+Pygments==2.15.1
+pypandoc==1.11
+pyparsing==3.0.9
+pyreadline3==3.4.1
+pyrsistent==0.19.3
+python-dateutil==2.8.2
+python-docx==0.8.11
+python-dotenv==1.0.0
+python-json-logger==2.0.7
+python-magic==0.4.27
+python-multipart==0.0.6
+python-pptx==0.6.21
+pytube==15.0.0
+pytz==2023.3
+pywin32==306
+pywinpty==2.0.10
+PyYAML==6.0
+pyzmq==25.1.0
+regex==2023.6.3
+requests==2.31.0
+rfc3339-validator==0.1.4
+rfc3986==1.5.0
+rfc3986-validator==0.1.1
+rich==13.0.1
+s3transfer==0.6.1
+scikit-learn==1.2.2
+scipy==1.10.1
+semantic-version==2.10.0
+Send2Trash==1.8.2
+six==1.16.0
+sklearn==0.0.post5
+sniffio==1.3.0
+soupsieve==2.4.1
+SQLAlchemy==2.0.16
+sqlitedict==2.1.0
+stack-data==0.6.2
+starlette==0.27.0
+sympy==1.12
+tabulate==0.9.0
+tenacity==8.2.2
+terminado==0.17.1
+threadpoolctl==3.1.0
+tiktoken==0.4.0
+tinycss2==1.2.1
+tokenizers==0.13.3
+toolz==0.12.0
+tornado==6.3.2
+tqdm==4.65.0
+traitlets==5.9.0
+typer==0.9.0
+typing-inspect==0.9.0
+typing_extensions==4.6.3
+tzdata==2023.3
+uc-micro-py==1.0.2
+unstructured==0.7.3
+uri-template==1.2.0
+urllib3==2.0.3
+uvicorn==0.22.0
+virtualenv==20.23.0
+watchfiles==0.19.0
+wcwidth==0.2.6
+webcolors==1.13
+webencodings==0.5.1
+websocket-client==1.5.3
+websockets==11.0.3
+wrapt==1.14.1
+xlrd==2.0.1
+XlsxWriter==3.1.2
+yarl==1.9.2
+youtube-transcript-api==0.6.0
+zstandard==0.21.0
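
These pins capture the original Windows environment exactly (note `pywin32` and `pywinpty`), so `pip install -r requirements.txt` will only resolve cleanly on Windows; on other platforms those two lines would need to be dropped.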
run_local_server.bat
ADDED
@@ -0,0 +1,2 @@
+C:\Users\catsk\SourceCode\azure_openai_poc\venv\Scripts\activate
+python C:\Users\catsk\SourceCode\azure_openai_poc\app.py server
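
Both lines hard-code the author's local checkout path, and the first invokes `activate` (a batch script) without `call`, which on Windows hands control to the activation script and never returns to run the second line. A portable variant would be `call venv\Scripts\activate` followed by `python app.py server` from the repository root; the `server` argument is presumably a command-line mode handled inside app.py.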