Spaces: Runtime error
pseudotensor committed
Commit • 54f4f91
1 Parent(s): 3f42f2e
Update with h2oGPT hash e4482a4c59016517cd0d5513bc15b78b46f4598a
Browse files
- LICENSE +0 -201
- client_test.py +22 -14
- enums.py +16 -1
- finetune.py +0 -676
- generate.py +0 -0
- gpt4all_llm.py +9 -0
- gpt_langchain.py +154 -51
- gradio_runner.py +137 -39
- gradio_utils/__pycache__/css.cpython-310.pyc +0 -0
- gradio_utils/__pycache__/grclient.cpython-310.pyc +0 -0
- gradio_utils/__pycache__/prompt_form.cpython-310.pyc +0 -0
- gradio_utils/css.py +0 -53
- gradio_utils/grclient.py +0 -82
- gradio_utils/prompt_form.py +0 -118
- h2o-logo.svg +0 -1
- h2oai_pipeline.py +1 -0
- iterators/__init__.py +0 -4
- iterators/__pycache__/__init__.cpython-310.pyc +0 -0
- iterators/__pycache__/iterator_pipe.cpython-310.pyc +0 -0
- iterators/__pycache__/timeout_iterator.cpython-310.pyc +0 -0
- iterators/iterator_pipe.py +0 -93
- iterators/timeout_iterator.py +0 -170
- prompter.py +4 -2
- requirements.txt +0 -153
LICENSE
DELETED
@@ -1,201 +0,0 @@
Removed the entire LICENSE file: the full text of the Apache License, Version 2.0
(January 2004, http://www.apache.org/licenses/), including all terms and conditions,
the appendix on how to apply the license, and the notice
"Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li".
client_test.py
CHANGED
@@ -12,13 +12,13 @@ Currently, this will force model to be on a single GPU.
 Then run this client as:
 
-    python client_test.py
+    python src/client_test.py
 
 For HF spaces:
 
-    HOST="https://h2oai-h2ogpt-chatbot.hf.space" python client_test.py
+    HOST="https://h2oai-h2ogpt-chatbot.hf.space" python src/client_test.py
 
 Result:
 
@@ -28,7 +28,7 @@ Loaded as API: https://h2oai-h2ogpt-chatbot.hf.space ✔
 
 For demo:
 
-    HOST="https://gpt.h2o.ai" python client_test.py
+    HOST="https://gpt.h2o.ai" python src/client_test.py
 
 Result:
 
@@ -48,7 +48,7 @@ import markdown  # pip install markdown
 import pytest
 from bs4 import BeautifulSoup  # pip install beautifulsoup4
 
-from enums import DocumentChoices
+from enums import DocumentChoices, LangChainAction
 
 debug = False
 
@@ -67,7 +67,9 @@ def get_client(serialize=True):
 def get_args(prompt, prompt_type, chat=False, stream_output=False,
              max_new_tokens=50,
              top_k_docs=3,
-             langchain_mode='Disabled'
+             langchain_mode='Disabled',
+             langchain_action=LangChainAction.QUERY.value,
+             prompt_dict=None):
     from collections import OrderedDict
     kwargs = OrderedDict(instruction=prompt if chat else '',  # only for chat=True
                          iinput='',  # only for chat=True
@@ -76,7 +78,7 @@ def get_args(prompt, prompt_type, chat=False, stream_output=False,
                          # but leave stream_output=False for simple input/output mode
                          stream_output=stream_output,
                          prompt_type=prompt_type,
-                         prompt_dict=
+                         prompt_dict=prompt_dict,
                          temperature=0.1,
                          top_p=0.75,
                          top_k=40,
@@ -92,12 +94,13 @@ def get_args(prompt, prompt_type, chat=False, stream_output=False,
                          instruction_nochat=prompt if not chat else '',
                          iinput_nochat='',  # only for chat=False
                          langchain_mode=langchain_mode,
+                         langchain_action=langchain_action,
                          top_k_docs=top_k_docs,
                          chunk=True,
                          chunk_size=512,
                          document_choice=[DocumentChoices.All_Relevant.name],
                          )
-    from
+    from src.gen import eval_func_param_names
     assert len(set(eval_func_param_names).difference(set(list(kwargs.keys())))) == 0
     if chat:
         # add chatbot output on end. Assumes serialize=False
@@ -198,6 +201,7 @@ def run_client_nochat_api_lean_morestuff(prompt, prompt_type='human_bot', max_ne
         instruction_nochat=prompt,
         iinput_nochat='',
         langchain_mode='Disabled',
+        langchain_action=LangChainAction.QUERY.value,
         top_k_docs=4,
         document_choice=['All'],
     )
@@ -219,21 +223,24 @@ def run_client_nochat_api_lean_morestuff(prompt, prompt_type='human_bot', max_ne
 @pytest.mark.skip(reason="For manual use against some server, no server launched")
 def test_client_chat(prompt_type='human_bot'):
     return run_client_chat(prompt='Who are you?', prompt_type=prompt_type, stream_output=False, max_new_tokens=50,
-                           langchain_mode='Disabled')
+                           langchain_mode='Disabled', langchain_action=LangChainAction.QUERY.value)
 
 
 @pytest.mark.skip(reason="For manual use against some server, no server launched")
 def test_client_chat_stream(prompt_type='human_bot'):
     return run_client_chat(prompt="Tell a very long kid's story about birds.", prompt_type=prompt_type,
                            stream_output=True, max_new_tokens=512,
-                           langchain_mode='Disabled')
+                           langchain_mode='Disabled', langchain_action=LangChainAction.QUERY.value)
 
 
-def run_client_chat(prompt, prompt_type, stream_output, max_new_tokens, langchain_mode
+def run_client_chat(prompt, prompt_type, stream_output, max_new_tokens, langchain_mode, langchain_action,
+                    prompt_dict=None):
     client = get_client(serialize=False)
 
     kwargs, args = get_args(prompt, prompt_type, chat=True, stream_output=stream_output,
-                            max_new_tokens=max_new_tokens, langchain_mode=langchain_mode
+                            max_new_tokens=max_new_tokens, langchain_mode=langchain_mode,
+                            langchain_action=langchain_action,
+                            prompt_dict=prompt_dict)
     return run_client(client, prompt, args, kwargs)
 
 
@@ -276,14 +283,15 @@ def run_client(client, prompt, args, kwargs, do_md_to_text=True, verbose=False):
 def test_client_nochat_stream(prompt_type='human_bot'):
     return run_client_nochat_gen(prompt="Tell a very long kid's story about birds.", prompt_type=prompt_type,
                                  stream_output=True, max_new_tokens=512,
-                                 langchain_mode='Disabled')
+                                 langchain_mode='Disabled', langchain_action=LangChainAction.QUERY.value)
 
 
-def run_client_nochat_gen(prompt, prompt_type, stream_output, max_new_tokens, langchain_mode):
+def run_client_nochat_gen(prompt, prompt_type, stream_output, max_new_tokens, langchain_mode, langchain_action):
     client = get_client(serialize=False)
 
     kwargs, args = get_args(prompt, prompt_type, chat=False, stream_output=stream_output,
-                            max_new_tokens=max_new_tokens, langchain_mode=langchain_mode
+                            max_new_tokens=max_new_tokens, langchain_mode=langchain_mode,
+                            langchain_action=langchain_action)
     return run_client_gen(client, prompt, args, kwargs)
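To make the new client_test.py signature concrete, the sketch below drives the updated helpers the same way test_client_chat does after this commit. It is a minimal example, not part of the commit: the localhost HOST value is a placeholder, and it assumes a running h2oGPT server plus the dependencies client_test.py already requires.

# Minimal, hypothetical usage sketch of the updated client helpers (not part of the commit).
# Assumes a running h2oGPT server; the localhost URL below is a placeholder.
import os
os.environ.setdefault("HOST", "http://localhost:7860")

from client_test import run_client_chat
from enums import LangChainAction

# Same call shape as test_client_chat in the diff above, with the new langchain_action argument.
response = run_client_chat(prompt='Who are you?',
                           prompt_type='human_bot',
                           stream_output=False,
                           max_new_tokens=50,
                           langchain_mode='Disabled',
                           langchain_action=LangChainAction.QUERY.value)
print(response)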
enums.py
CHANGED
@@ -37,6 +37,9 @@ class DocumentChoices(Enum):
     Just_LLM = 3
 
 
+non_query_commands = [DocumentChoices.All_Relevant_Only_Sources.name, DocumentChoices.Only_All_Sources.name]
+
+
 class LangChainMode(Enum):
     """LangChain mode"""
 
@@ -52,10 +55,22 @@ class LangChainMode(Enum):
     H2O_DAI_DOCS = "DriverlessAI docs"
 
 
+class LangChainAction(Enum):
+    """LangChain action"""
+
+    QUERY = "Query"
+    # WIP:
+    #SUMMARIZE_MAP = "Summarize_map_reduce"
+    SUMMARIZE_MAP = "Summarize"
+    SUMMARIZE_ALL = "Summarize_all"
+    SUMMARIZE_REFINE = "Summarize_refine"
+
+
 no_server_str = no_lora_str = no_model_str = '[None/Remove]'
 
 
-# from site-packages/langchain/llms/openai.py
+# from site-packages/langchain/llms/openai.py
+# but needed since ChatOpenAI doesn't have this information
 model_token_mapping = {
     "gpt-4": 8192,
     "gpt-4-0314": 8192,
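For readers skimming the diff, the new LangChainAction enum is a plain Python Enum whose string values are what callers actually send (client_test.py above passes LangChainAction.QUERY.value). A self-contained sketch of the added definition, shown standalone for clarity:

# Mirror of the LangChainAction enum added above (values taken from the diff).
from enum import Enum

class LangChainAction(Enum):
    """LangChain action"""

    QUERY = "Query"
    # WIP: the map-reduce naming is still commented out in the commit
    SUMMARIZE_MAP = "Summarize"
    SUMMARIZE_ALL = "Summarize_all"
    SUMMARIZE_REFINE = "Summarize_refine"

# Callers pass the string value, not the enum member:
assert LangChainAction.QUERY.value == "Query"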
finetune.py
DELETED
@@ -1,676 +0,0 @@
Removed the standalone fine-tuning script (676 lines). The deleted finetune.py contained:
- train(...): the fire-driven entrypoint handling base model and tokenizer loading via loaders.get_loaders,
  optional 8-bit/4-bit (k-bit) preparation, LoRA configuration or LoRA checkpoint continuation,
  an optional LLaMA flash-attention monkey patch, dataset loading with an optional OIG mix-in,
  validation metrics (bleu, rouge, sacrebleu, meteor), Neptune or TensorBoard callbacks,
  and a transformers.Trainer run supporting DDP or model-parallel multi-GPU training.
- tokenize(...), prune_long_sequences(...), generate_and_tokenize_prompt(...): prompt tokenization
  helpers with cutoff_len handling and input masking when train_on_inputs=False.
- test_debug() and entrypoint_main(): CLI entry points, including example torchrun commands for
  single-node and multi-node runs.
generate.py
DELETED
The diff for this file is too large to render.
See raw diff
|
|
gpt4all_llm.py
CHANGED
@@ -19,6 +19,15 @@ def get_model_tokenizer_gpt4all(base_model, **kwargs):
                          n_ctx=2048 - 256)
     env_gpt4all_file = ".env_gpt4all"
     model_kwargs.update(dotenv_values(env_gpt4all_file))
+    # make int or float if can to satisfy types for class
+    for k, v in model_kwargs.items():
+        try:
+            if float(v) == int(v):
+                model_kwargs[k] = int(v)
+            else:
+                model_kwargs[k] = float(v)
+        except:
+            pass

     if base_model == "llama":
         if 'model_path_llama' not in model_kwargs:
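Note: dotenv_values() returns every value as a string, which is why the added loop coerces them before passing model_kwargs to the gpt4all class. A small self-contained sketch of the same idea, slightly reordered, against a hypothetical .env_gpt4all file:

    from dotenv import dotenv_values  # python-dotenv

    # hypothetical .env_gpt4all contents:
    #   n_ctx=1792
    #   temp=0.9
    #   model_path_llama=ggml-model.bin
    model_kwargs = dict(dotenv_values(".env_gpt4all"))  # all values arrive as str
    for k, v in model_kwargs.items():
        try:
            f = float(v)                                  # "1792" -> 1792.0, "0.9" -> 0.9; raises on paths/names
            model_kwargs[k] = int(f) if f == int(f) else f
        except (TypeError, ValueError):
            pass                                          # keep non-numeric values as strings
    print(model_kwargs)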
gpt_langchain.py
CHANGED
@@ -23,8 +23,9 @@ from langchain.callbacks import streaming_stdout
 from langchain.embeddings import HuggingFaceInstructEmbeddings
 from tqdm import tqdm

-from enums import DocumentChoices, no_lora_str, model_token_mapping, source_prefix, source_postfix
-    (removed line not rendered in this view)
+from enums import DocumentChoices, no_lora_str, model_token_mapping, source_prefix, source_postfix, non_query_commands, \
+    LangChainAction, LangChainMode
+from src.gen import gen_hyper, get_model, SEED
 from prompter import non_hf_types, PromptType, Prompter
 from utils import wrapped_partial, EThread, import_matplotlib, sanitize_filename, makedirs, get_url, flatten_list, \
     get_device, ProgressParallel, remove, hash_file, clear_torch_cache, NullContext, get_hf_server, FakeTokenizer
@@ -43,7 +44,8 @@ from langchain.chains.qa_with_sources import load_qa_with_sources_chain
 from langchain.document_loaders import PyPDFLoader, TextLoader, CSVLoader, PythonLoader, TomlLoader, \
     UnstructuredURLLoader, UnstructuredHTMLLoader, UnstructuredWordDocumentLoader, UnstructuredMarkdownLoader, \
     EverNoteLoader, UnstructuredEmailLoader, UnstructuredODTLoader, UnstructuredPowerPointLoader, \
-    UnstructuredEPubLoader, UnstructuredImageLoader, UnstructuredRTFLoader, ArxivLoader, UnstructuredPDFLoader
+    UnstructuredEPubLoader, UnstructuredImageLoader, UnstructuredRTFLoader, ArxivLoader, UnstructuredPDFLoader, \
+    UnstructuredExcelLoader
 from langchain.text_splitter import RecursiveCharacterTextSplitter, Language
 from langchain.chains.question_answering import load_qa_chain
 from langchain.docstore.document import Document
@@ -351,6 +353,7 @@ class GradioInference(LLM):
         stream_output = self.stream
         gr_client = self.client
         client_langchain_mode = 'Disabled'
+        client_langchain_action = LangChainAction.QUERY.value
         top_k_docs = 1
         chunk = True
         chunk_size = 512
@@ -379,6 +382,7 @@ class GradioInference(LLM):
             instruction_nochat=prompt if not self.chat_client else '',
             iinput_nochat='',  # only for chat=False
             langchain_mode=client_langchain_mode,
+            langchain_action=client_langchain_action,
             top_k_docs=top_k_docs,
             chunk=chunk,
             chunk_size=chunk_size,
@@ -637,6 +641,7 @@ def get_llm(use_openai_model=False,
         callbacks = [StreamingGradioCallbackHandler()]
         assert prompter is not None
         stop_sequences = list(set(prompter.terminate_response + [prompter.PreResponse]))
+        stop_sequences = [x for x in stop_sequences if x]

         if gr_client:
             chat_client = False
@@ -744,7 +749,7 @@ def get_llm(use_openai_model=False,

     if stream_output:
         skip_prompt = False
-        from    (removed line truncated in this view)
+        from src.gen import H2OTextIteratorStreamer
         decoder_kwargs = {}
         streamer = H2OTextIteratorStreamer(tokenizer, skip_prompt=skip_prompt, block=False, **decoder_kwargs)
         gen_kwargs.update(dict(streamer=streamer))
@@ -944,14 +949,16 @@ have_playwright = False

 image_types = ["png", "jpg", "jpeg"]
 non_image_types = ["pdf", "txt", "csv", "toml", "py", "rst", "rtf",
-                   "md",
+                   "md",
+                   "html", "mhtml",
                    "enex", "eml", "epub", "odt", "pptx", "ppt",
                    "zip", "urls",
+
                    ]
 # "msg", GPL3

 if have_libreoffice:
-    non_image_types.extend(["docx", "doc"])
+    non_image_types.extend(["docx", "doc", "xls", "xlsx"])

 file_types = non_image_types + image_types

@@ -961,7 +968,7 @@ def add_meta(docs1, file):
     hashid = hash_file(file)
     if not isinstance(docs1, (list, tuple, types.GeneratorType)):
         docs1 = [docs1]
-    [x.metadata.update(dict(input_type=file_extension, date=str(datetime.now), hashid=hashid)) for x in docs1]
+    [x.metadata.update(dict(input_type=file_extension, date=str(datetime.now()), hashid=hashid)) for x in docs1]


 def file_to_doc(file, base_path=None, verbose=False, fail_any_exception=False,
@@ -1038,6 +1045,10 @@ def file_to_doc(file, base_path=None, verbose=False, fail_any_exception=False,
         docs1 = UnstructuredWordDocumentLoader(file_path=file).load()
         add_meta(docs1, file)
         doc1 = chunk_sources(docs1, chunk=chunk, chunk_size=chunk_size)
+    elif (file.lower().endswith('.xlsx') or file.lower().endswith('.xls')) and have_libreoffice:
+        docs1 = UnstructuredExcelLoader(file_path=file).load()
+        add_meta(docs1, file)
+        doc1 = chunk_sources(docs1, chunk=chunk, chunk_size=chunk_size)
     elif file.lower().endswith('.odt'):
         docs1 = UnstructuredODTLoader(file_path=file).load()
         add_meta(docs1, file)
@@ -1758,6 +1769,8 @@ def run_qa_db(**kwargs):


 def _run_qa_db(query=None,
+               iinput=None,
+               context=None,
                use_openai_model=False, use_openai_embedding=False,
                first_para=False, text_limit=None, top_k_docs=4, chunk=True, chunk_size=512,
                user_path=None,
@@ -1787,6 +1800,7 @@ def _run_qa_db(query=None,
                repetition_penalty=1.0,
                num_return_sequences=1,
                langchain_mode=None,
+               langchain_action=None,
                document_choice=[DocumentChoices.All_Relevant.name],
                n_jobs=-1,
                verbose=False,
@@ -1803,7 +1817,7 @@ def _run_qa_db(query=None,
     :param use_openai_embedding:
     :param first_para:
     :param text_limit:
-    :param    (removed line truncated in this view)
+    :param top_k_docs:
     :param chunk:
     :param chunk_size:
     :param user_path: user path to glob recursively from
@@ -1869,12 +1883,28 @@ def _run_qa_db(query=None,
     sim_kwargs = {k: v for k, v in locals().items() if k in func_names}
     missing_kwargs = [x for x in func_names if x not in sim_kwargs]
     assert not missing_kwargs, "Missing: %s" % missing_kwargs
-    docs, chain, scores, use_context = get_similarity_chain(**sim_kwargs)
-    if cmd in    (removed line truncated in this view)
+    docs, chain, scores, use_context, have_any_docs = get_similarity_chain(**sim_kwargs)
+    if cmd in non_query_commands:
         formatted_doc_chunks = '\n\n'.join([get_url(x) + '\n\n' + x.page_content for x in docs])
         yield formatted_doc_chunks, ''
         return
+    if not docs and langchain_action in [LangChainAction.SUMMARIZE_MAP.value,
+                                         LangChainAction.SUMMARIZE_ALL.value,
+                                         LangChainAction.SUMMARIZE_REFINE.value]:
+        ret = 'No relevant documents to summarize.' if have_any_docs else 'No documents to summarize.'
+        extra = ''
+        yield ret, extra
+        return
+    if not docs and langchain_mode not in [LangChainMode.DISABLED.value,
+                                           LangChainMode.CHAT_LLM.value,
+                                           LangChainMode.LLM.value]:
+        ret = 'No relevant documents to query.' if have_any_docs else 'No documents to query.'
+        extra = ''
+        yield ret, extra
+        return
+
     if chain is None and model_name not in non_hf_types:
+        # here if no docs at all and not HF type
         # can only return if HF type
         return

@@ -1933,6 +1963,7 @@ def _run_qa_db(query=None,


 def get_similarity_chain(query=None,
+                         iinput=None,
                          use_openai_model=False, use_openai_embedding=False,
                          first_para=False, text_limit=None, top_k_docs=4, chunk=True, chunk_size=512,
                          user_path=None,
@@ -1947,6 +1978,7 @@ def get_similarity_chain(query=None,
                          load_db_if_exists=False,
                          db=None,
                          langchain_mode=None,
+                         langchain_action=None,
                          document_choice=[DocumentChoices.All_Relevant.name],
                          n_jobs=-1,
                          # beyond run_db_query:
@@ -1997,25 +2029,56 @@ def get_similarity_chain(query=None,
                      db=db,
                      n_jobs=n_jobs,
                      verbose=verbose)
-    (removed: previous hard-coded query prompt-template block, old lines 2000-2018; only the fragments "if", "{context}" and "%s" are rendered in this view)
+    have_any_docs = db is not None
+    if langchain_action == LangChainAction.QUERY.value:
+        if iinput:
+            query = "%s\n%s" % (query, iinput)
+
+        if 'falcon' in model_name:
+            extra = "According to only the information in the document sources provided within the context above, "
+            prefix = "Pay attention and remember information below, which will help to answer the question or imperative after the context ends."
+        elif inference_server in ['openai', 'openai_chat']:
+            extra = "According to (primarily) the information in the document sources provided within context above, "
+            prefix = "Pay attention and remember information below, which will help to answer the question or imperative after the context ends. If the answer cannot be primarily obtained from information within the context, then respond that the answer does not appear in the context of the documents."
+        else:
+            extra = ""
+            prefix = ""
+        if langchain_mode in ['Disabled', 'ChatLLM', 'LLM'] or not use_context:
+            template_if_no_docs = template = """%s{context}{question}""" % prefix
+        else:
+            template = """%s
+\"\"\"
+{context}
+\"\"\"
+%s{question}""" % (prefix, extra)
+            template_if_no_docs = """%s{context}%s{question}""" % (prefix, extra)
+    elif langchain_action in [LangChainAction.SUMMARIZE_ALL.value, LangChainAction.SUMMARIZE_MAP.value]:
+        none = ['', '\n', None]
+        if query in none and iinput in none:
+            prompt_summary = "Using only the text above, write a condensed and concise summary:\n"
+        elif query not in none:
+            prompt_summary = "Focusing on %s, write a condensed and concise Summary:\n" % query
+        elif iinput not in None:
+            prompt_summary = iinput
+        else:
+            prompt_summary = "Focusing on %s, %s:\n" % (query, iinput)
+        # don't auto reduce
+        auto_reduce_chunks = False
+        if langchain_action == LangChainAction.SUMMARIZE_MAP.value:
+            fstring = '{text}'
+        else:
+            fstring = '{input_documents}'
+        template = """In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text:
\"\"\"
+%s
+\"\"\"\n%s""" % (fstring, prompt_summary)
+        template_if_no_docs = "Exactly only say: There are no documents to summarize."
+    elif langchain_action in [LangChainAction.SUMMARIZE_REFINE]:
+        template = ''  # unused
+        template_if_no_docs = ''  # unused
+    else:
+        raise RuntimeError("No such langchain_action=%s" % langchain_action)
+
     if not use_openai_model and prompt_type not in ['plain'] or model_name in non_hf_types:
         use_template = True
     else:
@@ -2040,14 +2103,26 @@ def get_similarity_chain(query=None,
     if cmd == DocumentChoices.Just_LLM.name:
         docs = []
         scores = []
-    elif cmd == DocumentChoices.Only_All_Sources.name:
-        for result in zip(db_documents, db_metadatas)]
+    elif cmd == DocumentChoices.Only_All_Sources.name or query in [None, '', '\n']:
         db_documents, db_metadatas = get_docs_and_meta(db, top_k_docs, filter_kwargs=filter_kwargs)
         # similar to langchain's chroma's _results_to_docs_and_scores
         docs_with_score = [(Document(page_content=result[0], metadata=result[1] or {}), 0)
+                           for result in zip(db_documents, db_metadatas)]
+
+        # order documents
+        doc_hashes = [x['doc_hash'] for x in db_metadatas]
+        doc_chunk_ids = [x['chunk_id'] for x in db_metadatas]
+        docs_with_score = [x for _, _, x in
+                           sorted(zip(doc_hashes, doc_chunk_ids, docs_with_score), key=lambda x: (x[0], x[1]))
+                           ]
+
+        docs_with_score = docs_with_score[:top_k_docs]
         docs = [x[0] for x in docs_with_score]
         scores = [x[1] for x in docs_with_score]
+        have_any_docs |= len(docs) > 0
     else:
+        # FIXME: if langchain_action == LangChainAction.SUMMARIZE_MAP.value
+        # if map_reduce, then no need to auto reduce chunks
         if top_k_docs == -1 or auto_reduce_chunks:
             # docs_with_score = db.similarity_search_with_score(query, k=k_db, **filter_kwargs)[:top_k_docs]
             top_k_docs_tokenize = 100
@@ -2120,6 +2195,7 @@ def get_similarity_chain(query=None,
         if reverse_docs:
             docs_with_score.reverse()
         # cut off so no high distance docs/sources considered
+        have_any_docs |= len(docs_with_score) > 0  # before cut
        docs = [x[0] for x in docs_with_score if x[1] < cut_distanct]
        scores = [x[1] for x in docs_with_score if x[1] < cut_distanct]
        if len(scores) > 0 and verbose:
@@ -2131,14 +2207,14 @@ def get_similarity_chain(query=None,

    if not docs and use_context and model_name not in non_hf_types:
        # if HF type and have no docs, can bail out
-       return docs, None, [], False
+       return docs, None, [], False, have_any_docs

-   if cmd in    (removed line truncated in this view)
+   if cmd in non_query_commands:
        # no LLM use
-       return docs, None, [], False
+       return docs, None, [], False, have_any_docs

    common_words_file = "data/NGSL_1.2_stats.csv.zip"
-   if os.path.isfile(common_words_file):
+   if os.path.isfile(common_words_file) and langchain_mode == LangChainAction.QUERY.value:
        df = pd.read_csv("data/NGSL_1.2_stats.csv.zip")
        import string
        reduced_query = query.translate(str.maketrans(string.punctuation, ' ' * len(string.punctuation))).strip()
@@ -2155,25 +2231,47 @@ def get_similarity_chain(query=None,
        use_context = False
        template = template_if_no_docs

-   (removed: previous chain setup block, old lines 2158-2175; only the fragments "if", "target" and the return line below are rendered in this view)
-   return docs, target, scores, use_context
+   if langchain_action == LangChainAction.QUERY.value:
+       if use_template:
+           # instruct-like, rather than few-shot prompt_type='plain' as default
+           # but then sources confuse the model with how inserted among rest of text, so avoid
+           prompt = PromptTemplate(
+               # input_variables=["summaries", "question"],
+               input_variables=["context", "question"],
+               template=template,
+           )
+           chain = load_qa_chain(llm, prompt=prompt)
+       else:
+           # only if use_openai_model = True, unused normally except in testing
+           chain = load_qa_with_sources_chain(llm)
+       if not use_context:
+           chain_kwargs = dict(input_documents=[], question=query)
+       else:
+           chain_kwargs = dict(input_documents=docs, question=query)
+       target = wrapped_partial(chain, chain_kwargs)
+   elif langchain_action in [LangChainAction.SUMMARIZE_MAP.value,
+                             LangChainAction.SUMMARIZE_REFINE,
+                             LangChainAction.SUMMARIZE_ALL.value]:
+       from langchain.chains.summarize import load_summarize_chain
+       if langchain_action == LangChainAction.SUMMARIZE_MAP.value:
+           prompt = PromptTemplate(input_variables=["text"], template=template)
+           chain = load_summarize_chain(llm, chain_type="map_reduce",
+                                        map_prompt=prompt, combine_prompt=prompt, return_intermediate_steps=True)
+           target = wrapped_partial(chain, {"input_documents": docs})  # , return_only_outputs=True)
+       elif langchain_action == LangChainAction.SUMMARIZE_ALL.value:
+           assert use_template
+           prompt = PromptTemplate(input_variables=["text"], template=template)
+           chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt, return_intermediate_steps=True)
+           target = wrapped_partial(chain)
+       elif langchain_action == LangChainAction.SUMMARIZE_REFINE.value:
+           chain = load_summarize_chain(llm, chain_type="refine", return_intermediate_steps=True)
+           target = wrapped_partial(chain)
+       else:
+           raise RuntimeError("No such langchain_action=%s" % langchain_action)
+   else:
+       raise RuntimeError("No such langchain_action=%s" % langchain_action)

+   return docs, target, scores, use_context, have_any_docs


 def get_sources_answer(query, answer, scores, show_rank, answer_with_sources, verbose=False):
@@ -2243,6 +2341,11 @@ def chunk_sources(sources, chunk=True, chunk_size=512, language=None):
        splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0, keep_separator=keep_separator,
                                                  separators=separators)
        source_chunks = splitter.split_documents(sources)
+
+   # currently in order, but when pull from db won't be, so mark order and document by hash
+   doc_hash = str(uuid.uuid4())[:10]
+   [x.metadata.update(dict(doc_hash=doc_hash, chunk_id=chunk_id)) for chunk_id, x in enumerate(source_chunks)]
+
    return source_chunks

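Note: the chunk_sources() change above tags every chunk with a per-document doc_hash and a chunk_id, and the Only_All_Sources path then re-sorts whatever the vector DB returns by that pair before truncating to top_k_docs. A minimal self-contained sketch of that ordering step (toy data standing in for the DB results):

    # pretend these came back from the DB in arbitrary order
    db_metadatas = [dict(doc_hash='a1b2c3', chunk_id=2),
                    dict(doc_hash='a1b2c3', chunk_id=0),
                    dict(doc_hash='ffee99', chunk_id=1)]
    docs_with_score = [('chunk A2', 0), ('chunk A0', 0), ('chunk F1', 0)]

    doc_hashes = [x['doc_hash'] for x in db_metadatas]
    doc_chunk_ids = [x['chunk_id'] for x in db_metadatas]
    # group chunks by document hash, then restore their original order within each document
    docs_with_score = [x for _, _, x in
                       sorted(zip(doc_hashes, doc_chunk_ids, docs_with_score), key=lambda x: (x[0], x[1]))]
    print(docs_with_score)  # [('chunk A0', 0), ('chunk A2', 0), ('chunk F1', 0)]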
gradio_runner.py
CHANGED
@@ -1,3 +1,4 @@
+import ast
 import copy
 import functools
 import inspect
@@ -49,16 +50,16 @@ def fix_pydantic_duplicate_validators_error():

 fix_pydantic_duplicate_validators_error()

-from enums import DocumentChoices, no_model_str, no_lora_str, no_server_str, LangChainMode
+from enums import DocumentChoices, no_model_str, no_lora_str, no_server_str, LangChainAction, LangChainMode
 from gradio_themes import H2oTheme, SoftTheme, get_h2o_title, get_simple_title, get_dark_js, spacing_xsm, radius_xsm, \
     text_xsm
 from prompter import prompt_type_to_model_name, prompt_types_strings, inv_prompt_type_to_model_lower, non_hf_types, \
     get_prompt
 from utils import get_githash, flatten_list, zip_data, s3up, clear_torch_cache, get_torch_allocated, system_info_print, \
     ping, get_short_name, get_url, makedirs, get_kwargs, remove, system_info, ping_gpu
-from    (removed line truncated in this view)
-    inputs_kwargs_list, scratch_base_dir,
-    eval_func_param_names_defaults, get_max_max_new_tokens, get_minmax_top_k_docs, history_to_context
+from src.gen import get_model, languages_covered, evaluate, eval_func_param_names, score_qa, langchain_modes, \
+    inputs_kwargs_list, scratch_base_dir, no_default_param_names, \
+    eval_func_param_names_defaults, get_max_max_new_tokens, get_minmax_top_k_docs, history_to_context, langchain_actions

 from apscheduler.schedulers.background import BackgroundScheduler

@@ -99,6 +100,7 @@ def go_gradio(**kwargs):
    dbs = kwargs['dbs']
    db_type = kwargs['db_type']
    visible_langchain_modes = kwargs['visible_langchain_modes']
+   visible_langchain_actions = kwargs['visible_langchain_actions']
    allow_upload_to_user_data = kwargs['allow_upload_to_user_data']
    allow_upload_to_my_data = kwargs['allow_upload_to_my_data']
    enable_sources_list = kwargs['enable_sources_list']
@@ -213,7 +215,28 @@ def go_gradio(**kwargs):
        'base_model') else no_model_msg
    output_label0_model2 = no_model_msg

+   def update_prompt(prompt_type1, prompt_dict1, model_state1, which_model=0):
+       if not prompt_type1 or which_model != 0:
+           # keep prompt_type and prompt_dict in sync if possible
+           prompt_type1 = kwargs.get('prompt_type', prompt_type1)
+           prompt_dict1 = kwargs.get('prompt_dict', prompt_dict1)
+           # prefer model specific prompt type instead of global one
+           if not prompt_type1 or which_model != 0:
+               prompt_type1 = model_state1.get('prompt_type', prompt_type1)
+               prompt_dict1 = model_state1.get('prompt_dict', prompt_dict1)
+
+       if not prompt_dict1 or which_model != 0:
+           # if still not defined, try to get
+           prompt_dict1 = kwargs.get('prompt_dict', prompt_dict1)
+           if not prompt_dict1 or which_model != 0:
+               prompt_dict1 = model_state1.get('prompt_dict', prompt_dict1)
+       return prompt_type1, prompt_dict1
+
    default_kwargs = {k: kwargs[k] for k in eval_func_param_names_defaults}
+   # ensure prompt_type consistent with prep_bot(), so nochat API works same way
+   default_kwargs['prompt_type'], default_kwargs['prompt_dict'] = \
+       update_prompt(default_kwargs['prompt_type'], default_kwargs['prompt_dict'],
+                     model_state1=model_state0, which_model=0)
    for k in no_default_param_names:
        default_kwargs[k] = ''

@@ -239,7 +262,8 @@ def go_gradio(**kwargs):
        model_options_state = gr.State([model_options])
        lora_options_state = gr.State([lora_options])
        server_options_state = gr.State([server_options])
-       (removed line not rendered in this view)
+       # uuid in db is used as user ID
+       my_db_state = gr.State([None, str(uuid.uuid4())])
        chat_state = gr.State({})
        # make user default first and default choice, dedup
        docs_state00 = kwargs['document_choice'] + [x.name for x in list(DocumentChoices)]
@@ -332,6 +356,12 @@ def go_gradio(**kwargs):
                value=kwargs['langchain_mode'],
                label="Data Collection of Sources",
                visible=kwargs['langchain_mode'] != 'Disabled')
+           allowed_actions = [x for x in langchain_actions if x in visible_langchain_actions]
+           langchain_action = gr.Radio(
+               allowed_actions,
+               value=allowed_actions[0] if len(allowed_actions) > 0 else None,
+               label="Data Action",
+               visible=True)
            data_row2 = gr.Row(visible=kwargs['langchain_mode'] != 'Disabled')
            with data_row2:
                with gr.Column(scale=50):
@@ -920,19 +950,59 @@ def go_gradio(**kwargs):
        for k in inputs_kwargs_list:
            assert k in kwargs_evaluate, "Missing %s" % k

-       (removed: previous nochat submit helper, old lines 923-930; its body is not rendered in this view)
-       fun2 = partial(
-       fun_with_dict_str = partial(
+       def evaluate_nochat(*args1, default_kwargs1=None, str_api=False, **kwargs1):
+           args_list = list(args1)
+           if str_api:
+               user_kwargs = args_list[2]
+               assert isinstance(user_kwargs, str)
+               user_kwargs = ast.literal_eval(user_kwargs)
+           else:
+               user_kwargs = {k: v for k, v in zip(eval_func_param_names, args_list[2:])}
+           # only used for submit_nochat_api
+           user_kwargs['chat'] = False
+           if 'stream_output' not in user_kwargs:
+               user_kwargs['stream_output'] = False
+           if 'langchain_mode' not in user_kwargs:
+               # if user doesn't specify, then assume disabled, not use default
+               user_kwargs['langchain_mode'] = 'Disabled'
+           if 'langchain_action' not in user_kwargs:
+               user_kwargs['langchain_action'] = LangChainAction.QUERY.value
+
+           set1 = set(list(default_kwargs1.keys()))
+           set2 = set(eval_func_param_names)
+           assert set1 == set2, "Set diff: %s %s: %s" % (set1, set2, set1.symmetric_difference(set2))
+           # correct ordering. Note some things may not be in default_kwargs, so can't be default of user_kwargs.get()
+           model_state1 = args_list[0]
+           my_db_state1 = args_list[1]
+           args_list = [user_kwargs[k] if k in user_kwargs and user_kwargs[k] is not None else default_kwargs1[k] for k
+                        in eval_func_param_names]
+           assert len(args_list) == len(eval_func_param_names)
+           args_list = [model_state1, my_db_state1] + args_list

+           try:
+               for res_dict in evaluate(*tuple(args_list), **kwargs1):
+                   if str_api:
+                       # full return of dict
+                       yield res_dict
+                   elif kwargs['langchain_mode'] == 'Disabled':
+                       yield fix_text_for_gradio(res_dict['response'])
+                   else:
+                       yield '<br>' + fix_text_for_gradio(res_dict['response'])
+           finally:
+               clear_torch_cache()
+               clear_embeddings(user_kwargs['langchain_mode'], my_db_state1)
+
+       fun = partial(evaluate_nochat,
+                     default_kwargs1=default_kwargs,
+                     str_api=False,
                      **kwargs_evaluate)
+       fun2 = partial(evaluate_nochat,
+                      default_kwargs1=default_kwargs,
+                      str_api=False,
                      **kwargs_evaluate)
+       fun_with_dict_str = partial(evaluate_nochat,
+                                   default_kwargs1=default_kwargs,
+                                   str_api=True,
                                    **kwargs_evaluate
                                    )

@@ -1072,14 +1142,17 @@ def go_gradio(**kwargs):
            User that fills history for bot
            :param args:
            :param undo:
+           :param retry:
            :param sanitize_user_prompt:
-           :param model2:
            :return:
            """
            args_list = list(args)
            user_message = args_list[eval_func_param_names.index('instruction')]  # chat only
            input1 = args_list[eval_func_param_names.index('iinput')]  # chat only
            prompt_type1 = args_list[eval_func_param_names.index('prompt_type')]
+           langchain_mode1 = args_list[eval_func_param_names.index('langchain_mode')]
+           langchain_action1 = args_list[eval_func_param_names.index('langchain_action')]
+           document_choice1 = args_list[eval_func_param_names.index('document_choice')]
            if not prompt_type1:
                # shouldn't have to specify if CLI launched model
                prompt_type1 = kwargs['prompt_type']
@@ -1110,8 +1183,12 @@ def go_gradio(**kwargs):
                history[-1][1] = None
                return history
            if user_message1 in ['', None, '\n']:
-               (removed: previous empty-message handling, two lines not rendered in this view)
+               if langchain_action1 in LangChainAction.QUERY.value and \
+                       DocumentChoices.Only_All_Sources.name not in document_choice1 \
+                       or \
+                       langchain_mode1 in [LangChainMode.CHAT_LLM.value, LangChainMode.LLM.value]:
+                   # reject non-retry submit/enter
+                   return history
            user_message1 = fix_text_for_gradio(user_message1)
            return history + [[user_message1, None]]

@@ -1147,11 +1224,13 @@ def go_gradio(**kwargs):
            else:
                return 2000

-       def prep_bot(*args, retry=False):
+       def prep_bot(*args, retry=False, which_model=0):
            """

            :param args:
            :param retry:
+           :param which_model: identifies which model if doing model_lock
+               API only called for which_model=0, default for inputs_list, but rest should ignore inputs_list
            :return: last element is True if should run bot, False if should just yield history
            """
            # don't deepcopy, can contain model itself
@@ -1159,12 +1238,16 @@ def go_gradio(**kwargs):
            model_state1 = args_list[-3]
            my_db_state1 = args_list[-2]
            history = args_list[-1]
+           prompt_type1 = args_list[eval_func_param_names.index('prompt_type')]
+           prompt_dict1 = args_list[eval_func_param_names.index('prompt_dict')]

            if model_state1['model'] is None or model_state1['model'] == no_model_str:
                return history, None, None, None

            args_list = args_list[:-3]  # only keep rest needed for evaluate()
+           langchain_mode1 = args_list[eval_func_param_names.index('langchain_mode')]
+           langchain_action1 = args_list[eval_func_param_names.index('langchain_action')]
+           document_choice1 = args_list[eval_func_param_names.index('document_choice')]
            if not history:
                print("No history", flush=True)
                history = []
@@ -1175,22 +1258,23 @@ def go_gradio(**kwargs):
                instruction1 = history[-1][0]
                history[-1][1] = None
            elif not instruction1:
-               (removed: previous empty-instruction handling, two lines not rendered in this view)
+               if langchain_action1 in LangChainAction.QUERY.value and \
+                       DocumentChoices.Only_All_Sources.name not in document_choice1 \
+                       or \
+                       langchain_mode1 in [LangChainMode.CHAT_LLM.value, LangChainMode.LLM.value]:
+                   # if not retrying, then reject empty query
+                   return history, None, None, None
            elif len(history) > 0 and history[-1][1] not in [None, '']:
                # reject submit button if already filled and not retrying
                # None when not filling with '' to keep client happy
                return history, None, None, None

            # shouldn't have to specify in API prompt_type if CLI launched model, so prefer global CLI one if have it
-           prompt_type1 =    (removed line truncated in this view)
-           prompt_dict1 = kwargs.get('prompt_dict', args_list[eval_func_param_names.index('prompt_dict')])
-           args_list[eval_func_param_names.index('prompt_dict')] = prompt_dict1 = \
-               model_state1.get('prompt_dict', prompt_dict1)
+           prompt_type1, prompt_dict1 = update_prompt(prompt_type1, prompt_dict1, model_state1,
+                                                      which_model=which_model)
+           # apply back to args_list for evaluate()
+           args_list[eval_func_param_names.index('prompt_type')] = prompt_type1
+           args_list[eval_func_param_names.index('prompt_dict')] = prompt_dict1

            chat1 = args_list[eval_func_param_names.index('chat')]
            model_max_length1 = get_model_max_length(model_state1)
@@ -1264,6 +1348,7 @@ def go_gradio(**kwargs):
                for res in get_response(fun1, history):
                    yield res
            finally:
+               clear_torch_cache()
                clear_embeddings(langchain_mode1, my_db_state1)

        def all_bot(*args, retry=False, model_states1=None):
@@ -1277,7 +1362,7 @@ def go_gradio(**kwargs):
            my_db_state1 = None  # will be filled below by some bot
            try:
                gen_list = []
-               for chatbot1, model_state1 in zip(chatbots, model_states1):
+               for chatboti, (chatbot1, model_state1) in enumerate(zip(chatbots, model_states1)):
                    args_list1 = args_list0.copy()
                    args_list1.insert(-1, model_state1)  # insert at -1 so is at -2
                    # if at start, have None in response still, replace with '' so client etc. acts like normal
@@ -1289,7 +1374,8 @@ def go_gradio(**kwargs):
                    # so consistent with prep_bot()
                    # with model_state1 at -3, my_db_state1 at -2, and history(chatbot) at -1
                    # langchain_mode1 and my_db_state1 should be same for every bot
-                   history, fun1, langchain_mode1, my_db_state1 = prep_bot(*tuple(args_list1), retry=retry
+                   history, fun1, langchain_mode1, my_db_state1 = prep_bot(*tuple(args_list1), retry=retry,
+                                                                           which_model=chatboti)
                    gen1 = get_response(fun1, history)
                    if stream_output1:
                        gen1 = TimeoutIterator(gen1, timeout=0.01, sentinel=None, raise_on_exception=False)
@@ -1301,6 +1387,7 @@ def go_gradio(**kwargs):
                tgen0 = time.time()
                for res1 in itertools.zip_longest(*gen_list):
                    if time.time() - tgen0 > max_time1:
+                       print("Took too long: %s" % max_time1, flush=True)
                        break

                bots = [x[0] if x is not None and not isinstance(x, BaseException) else y for x, y in
@@ -1735,6 +1822,9 @@ def go_gradio(**kwargs):

        def load_model(model_name, lora_weights, server_name, model_state_old, prompt_type_old, load_8bit,
                       infer_devices, gpu_id):
+           # ensure no API calls reach here
+           if is_public:
+               raise RuntimeError("Illegal access for %s" % model_name)
            # ensure old model removed from GPU memory
            if kwargs['debug']:
                print("Pre-switch pre-del GPU memory: %s" % get_torch_allocated(), flush=True)
@@ -2161,6 +2251,15 @@ def update_user_db(file, db1, x, y, *args, dbs=None, langchain_mode='UserData',
        clear_torch_cache()


+def get_lock_file(db1, langchain_mode):
+    assert len(db1) == 2 and db1[1] is not None and isinstance(db1[1], str)
+    user_id = db1[1]
+    base_path = 'locks'
+    makedirs(base_path)
+    lock_file = "db_%s_%s.lock" % (langchain_mode.replace(' ', '_'), user_id)
+    return lock_file
+
+
 def _update_user_db(file, db1, x, y, chunk, chunk_size, dbs=None, db_type=None, langchain_mode='UserData',
                     user_path=None,
                     use_openai_embedding=None,
@@ -2222,7 +2321,8 @@ def _update_user_db(file, db1, x, y, chunk, chunk_size, dbs=None, db_type=None,
    exceptions = [x for x in sources if x.metadata.get('exception')]
    sources = [x for x in sources if 'exception' not in x.metadata]

+   lock_file = get_lock_file(db1, langchain_mode)
+   with filelock.FileLock(lock_file):
        if langchain_mode == 'MyData':
            if db1[0] is not None:
                # then add
@@ -2235,18 +2335,14 @@ def _update_user_db(file, db1, x, y, chunk, chunk_size, dbs=None, db_type=None,
                # for production hit, when user gets clicky:
                assert len(db1) == 2, "Bad MyData db: %s" % db1
                # then create
-               # assign fresh hash for this user session, so not shared
                # if added has to original state and didn't change, then would be shared db for all users
-               db1[1] = str(uuid.uuid4())
                persist_directory = os.path.join(scratch_base_dir, 'db_dir_%s_%s' % (langchain_mode, db1[1]))
                db = get_db(sources, use_openai_embedding=use_openai_embedding,
                            db_type=db_type,
                            persist_directory=persist_directory,
                            langchain_mode=langchain_mode,
                            hf_embedding_model=hf_embedding_model)
-               if db is None:
-                   db1[1] = None
-               else:
+               if db is not None:
                    db1[0] = db
                source_files_added = get_source_files(db=db1[0], exceptions=exceptions)
                return None, langchain_mode, db1, x, y, source_files_added
@@ -2274,7 +2370,9 @@ def _update_user_db(file, db1, x, y, chunk, chunk_size, dbs=None, db_type=None,


 def get_db(db1, langchain_mode, dbs=None):
-    (removed line not rendered in this view)
+    lock_file = get_lock_file(db1, langchain_mode)
+
+    with filelock.FileLock(lock_file):
        if langchain_mode in ['wiki_full']:
            # NOTE: avoid showing full wiki. Takes about 30 seconds over about 90k entries, but not useful for now
            db = None
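Note: _update_user_db() and get_db() now serialize access to each user's scratch collection through a filelock file keyed on (collection name, per-session user id). A minimal sketch of that locking pattern (the paths and the body under the lock are placeholders, not the real update code):

    import os
    import uuid
    import filelock

    def get_lock_file(db1, langchain_mode):
        # db1 = [db_object_or_None, per-session user id], as stored in the my_db_state Gradio State above
        assert len(db1) == 2 and isinstance(db1[1], str)
        os.makedirs('locks', exist_ok=True)
        return "db_%s_%s.lock" % (langchain_mode.replace(' ', '_'), db1[1])

    db1 = [None, str(uuid.uuid4())]  # fresh per-session state
    with filelock.FileLock(get_lock_file(db1, 'MyData')):
        # only one request per user/collection mutates or reads the scratch db at a time
        pass  # the real code adds sources / builds the Chroma db here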
gradio_utils/__pycache__/css.cpython-310.pyc
DELETED
Binary file (1.53 kB)
|
|
gradio_utils/__pycache__/grclient.cpython-310.pyc
DELETED
Binary file (2.69 kB)
|
|
gradio_utils/__pycache__/prompt_form.cpython-310.pyc
DELETED
Binary file (3.59 kB)
|
|
gradio_utils/css.py
DELETED
@@ -1,53 +0,0 @@  (entire file removed)
def get_css(kwargs) -> str:
    if kwargs['h2ocolors']:
        css_code = """footer {visibility: hidden;}
body{background:linear-gradient(#f5f5f5,#e5e5e5);}
body.dark{background:linear-gradient(#000000,#0d0d0d);}
"""
    else:
        css_code = """footer {visibility: hidden}"""

    css_code += make_css_base()
    return css_code


def make_css_base() -> str:
    return """
@import url('https://fonts.googleapis.com/css2?family=Source+Sans+Pro:wght@400;600&display=swap');

body.dark{#warning {background-color: #555555};}

#small_btn {
    margin: 0.6em 0em 0.55em 0;
    max-width: 20em;
    min-width: 5em !important;
    height: 5em;
    font-size: 14px !important;
}

#prompt-form {
    border: 1px solid var(--primary-500) !important;
}

#prompt-form.block {
    border-radius: var(--block-radius) !important;
}

#prompt-form textarea {
    border: 1px solid rgb(209, 213, 219);
}

#prompt-form label > div {
    margin-top: 4px;
}

button.primary:hover {
    background-color: var(--primary-600) !important;
    transition: .2s;
}

#prompt-form-area {
    margin-bottom: 2.5rem;
}
.chatsmall chatbot {font-size: 10px !important}
"""
gradio_utils/grclient.py
DELETED
@@ -1,82 +0,0 @@
-import traceback
-from typing import Callable
-import os
-
-from gradio_client.client import Job
-
-os.environ['HF_HUB_DISABLE_TELEMETRY'] = '1'
-
-from gradio_client import Client
-
-
-class GradioClient(Client):
-    """
-    Parent class of gradio client
-    To handle automatically refreshing client if detect gradio server changed
-    """
-
-    def __init__(self, *args, **kwargs):
-        self.args = args
-        self.kwargs = kwargs
-        super().__init__(*args, **kwargs)
-        self.server_hash = self.get_server_hash()
-
-    def get_server_hash(self):
-        """
-        Get server hash using super without any refresh action triggered
-        Returns: git hash of gradio server
-        """
-        return super().submit(api_name='/system_hash').result()
-
-    def refresh_client_if_should(self):
-        # get current hash in order to update api_name -> fn_index map in case gradio server changed
-        # FIXME: Could add cli api as hash
-        server_hash = self.get_server_hash()
-        if self.server_hash != server_hash:
-            self.refresh_client()
-            self.server_hash = server_hash
-        else:
-            self.reset_session()
-
-    def refresh_client(self):
-        """
-        Ensure every client call is independent
-        Also ensure map between api_name and fn_index is updated in case server changed (e.g. restarted with new code)
-        Returns:
-        """
-        # need session hash to be new every time, to avoid "generator already executing"
-        self.reset_session()
-
-        client = Client(*self.args, **self.kwargs)
-        for k, v in client.__dict__.items():
-            setattr(self, k, v)
-
-    def submit(
-            self,
-            *args,
-            api_name: str | None = None,
-            fn_index: int | None = None,
-            result_callbacks: Callable | list[Callable] | None = None,
-    ) -> Job:
-        # Note predict calls submit
-        try:
-            self.refresh_client_if_should()
-            job = super().submit(*args, api_name=api_name, fn_index=fn_index)
-        except Exception as e:
-            print("Hit e=%s" % str(e), flush=True)
-            # force reconfig in case only that
-            self.refresh_client()
-            job = super().submit(*args, api_name=api_name, fn_index=fn_index)
-
-        # see if immediately failed
-        e = job.future._exception
-        if e is not None:
-            print("GR job failed: %s %s" % (str(e), ''.join(traceback.format_tb(e.__traceback__))), flush=True)
-            # force reconfig in case only that
-            self.refresh_client()
-            job = super().submit(*args, api_name=api_name, fn_index=fn_index)
-            e2 = job.future._exception
-            if e2 is not None:
-                print("GR job failed again: %s\n%s" % (str(e2), ''.join(traceback.format_tb(e2.__traceback__))), flush=True)
-
-        return job
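For reference only (not part of this commit): a minimal sketch of how the removed GradioClient wrapper could be exercised against a running h2oGPT Gradio server. The server URL and the '/submit_nochat' endpoint name are assumptions for illustration; the point of the wrapper is that submit() transparently refreshes the session and api_name map if the server was restarted.

from gradio_utils.grclient import GradioClient  # module path as it existed before this commit

client = GradioClient("http://localhost:7860")  # URL is an assumption
# api_name is illustrative; actual endpoint names are defined by the h2oGPT server.
job = client.submit("Who are you?", api_name='/submit_nochat')
print(job.result())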
gradio_utils/prompt_form.py
DELETED
@@ -1,118 +0,0 @@
-import os
-import math
-
-import gradio as gr
-
-
-def make_chatbots(output_label0, output_label0_model2, **kwargs):
-    text_outputs = []
-    chat_kwargs = []
-    for model_state_lock in kwargs['model_states']:
-        if os.environ.get('DEBUG_MODEL_LOCK'):
-            model_name = model_state_lock["base_model"] + " : " + model_state_lock["inference_server"]
-        else:
-            model_name = model_state_lock["base_model"]
-        output_label = f'h2oGPT [{model_name}]'
-        min_width = 250 if kwargs['gradio_size'] in ['small', 'large', 'medium'] else 160
-        chat_kwargs.append(dict(label=output_label, visible=kwargs['model_lock'], elem_classes='chatsmall',
-                                height=kwargs['height'] or 400, min_width=min_width))
-
-    if kwargs['model_lock_columns'] == -1:
-        kwargs['model_lock_columns'] = len(kwargs['model_states'])
-    if kwargs['model_lock_columns'] is None:
-        kwargs['model_lock_columns'] = 3
-
-    ncols = kwargs['model_lock_columns']
-    if kwargs['model_states'] == 0:
-        nrows = 0
-    else:
-        nrows = math.ceil(len(kwargs['model_states']) / kwargs['model_lock_columns'])
-
-    if kwargs['model_lock_columns'] == 0:
-        # not using model_lock
-        pass
-    elif nrows <= 1:
-        with gr.Row():
-            for chat_kwargs1, model_state_lock in zip(chat_kwargs, kwargs['model_states']):
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-    elif nrows == kwargs['model_states']:
-        with gr.Row():
-            for chat_kwargs1, model_state_lock in zip(chat_kwargs, kwargs['model_states']):
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-    elif nrows == 2:
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii >= len(kwargs['model_states']) / 2:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < len(kwargs['model_states']) / 2:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-    elif nrows == 3:
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii >= 1 * len(kwargs['model_states']) / 3:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < 1 * len(kwargs['model_states']) / 3 or mii >= 2 * len(kwargs['model_states']) / 3:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < 2 * len(kwargs['model_states']) / 3:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-    elif nrows >= 4:
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii >= 1 * len(kwargs['model_states']) / 4:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < 1 * len(kwargs['model_states']) / 4 or mii >= 2 * len(kwargs['model_states']) / 4:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < 2 * len(kwargs['model_states']) / 4 or mii >= 3 * len(kwargs['model_states']) / 4:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-        with gr.Row():
-            for mii, (chat_kwargs1, model_state_lock) in enumerate(zip(chat_kwargs, kwargs['model_states'])):
-                if mii < 3 * len(kwargs['model_states']) / 4:
-                    continue
-                text_outputs.append(gr.Chatbot(**chat_kwargs1))
-
-    with gr.Row():
-        text_output = gr.Chatbot(label=output_label0, visible=not kwargs['model_lock'], height=kwargs['height'] or 400)
-        text_output2 = gr.Chatbot(label=output_label0_model2,
-                                  visible=False and not kwargs['model_lock'], height=kwargs['height'] or 400)
-    return text_output, text_output2, text_outputs
-
-
-def make_prompt_form(kwargs):
-    if kwargs['input_lines'] > 1:
-        instruction_label = "Shift-Enter to Submit, Enter for more lines"
-    else:
-        instruction_label = "Enter to Submit, Shift-Enter for more lines"
-
-    with gr.Row():#elem_id='prompt-form-area'):
-        with gr.Column(scale=50):
-            instruction = gr.Textbox(
-                lines=kwargs['input_lines'],
-                label='Ask anything',
-                placeholder=instruction_label,
-                info=None,
-                elem_id='prompt-form',
-                container=True,
-            )
-        with gr.Row():
-            submit = gr.Button(value='Submit', variant='primary', scale=0, size='sm')
-            stop_btn = gr.Button(value="Stop", variant='secondary', scale=0, size='sm')
-
-    return instruction, submit, stop_btn
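For reference only (not part of this commit): a rough sketch of calling the removed make_prompt_form helper inside a gr.Blocks layout. The kwargs dict below contains only the key the function reads ('input_lines'), and the value is illustrative.

import gradio as gr
from gradio_utils.prompt_form import make_prompt_form  # module path as it existed before this commit

kwargs = dict(input_lines=1)  # illustrative; single-line input means "Enter to Submit"
with gr.Blocks() as demo:
    instruction, submit, stop_btn = make_prompt_form(kwargs)
    # the caller is expected to wire up submit.click(...) and stop_btn.click(...) handlers
# demo.launch()  # uncomment to serve locally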
h2o-logo.svg
DELETED
h2oai_pipeline.py
CHANGED
@@ -136,6 +136,7 @@ class H2OTextGenerationPipeline(TextGenerationPipeline):
             else:
                 outputs = rec['generated_text']
             rec['generated_text'] = outputs
+            print("prompt: %s\noutputs: %s\n\n" % (self.prompt_text, outputs), flush=True)
         return records
 
     def _forward(self, model_inputs, **generate_kwargs):
iterators/__init__.py
DELETED
@@ -1,4 +0,0 @@
-from .timeout_iterator import TimeoutIterator, AsyncTimeoutIterator
-from .iterator_pipe import IteratorPipe, AsyncIteratorPipe
-
-__all__ = ["TimeoutIterator", "AsyncTimeoutIterator", "IteratorPipe", "AsyncIteratorPipe"]
iterators/__pycache__/__init__.cpython-310.pyc
DELETED
Binary file (337 Bytes)
iterators/__pycache__/iterator_pipe.cpython-310.pyc
DELETED
Binary file (2.71 kB)
iterators/__pycache__/timeout_iterator.cpython-310.pyc
DELETED
Binary file (5.63 kB)
iterators/iterator_pipe.py
DELETED
@@ -1,93 +0,0 @@
-import queue
-import asyncio
-
-
-class IteratorPipe:
-    """
-    Iterator Pipe creates an iterator that can be fed in data from another block of code or thread of execution
-    """
-
-    def __init__(self, sentinel=object()):
-        self._q = queue.Queue()
-        self._sentinel = sentinel
-        self._sentinel_pushed = False
-        self._closed = False
-
-    def __iter__(self):
-        return self
-
-    def __next__(self):
-        if self._closed:
-            raise StopIteration
-
-        data = self._q.get(block=True)
-        if data is self._sentinel:
-            self._closed = True
-            raise StopIteration
-
-        return data
-
-    def put(self, data) -> bool:
-        """
-        Pushes next item to Iterator and returns True
-        If iterator has been closed via close(), doesn't push anything and returns False
-        """
-        if self._sentinel_pushed:
-            return False
-
-        self._q.put(data)
-        return True
-
-    def close(self):
-        """
-        Close is idempotent. Calling close multiple times is safe
-        Iterator will raise StopIteration only after all elements pushed before close have been iterated
-        """
-        # make close idempotent
-        if not self._sentinel_pushed:
-            self._sentinel_pushed = True
-            self._q.put(self._sentinel)
-
-
-class AsyncIteratorPipe:
-
-    def __init__(self, sentinel=object()):
-        self._q = asyncio.Queue()
-        self._sentinel = sentinel
-        self._sentinel_pushed = False
-        self._closed = False
-
-    def __aiter__(self):
-        return self
-
-    async def __anext__(self):
-        if self._closed:
-            raise StopAsyncIteration
-
-        data = await self._q.get()
-        if data is self._sentinel:
-            self._closed = True
-            raise StopAsyncIteration
-
-        return data
-
-    async def put(self, data) -> bool:
-        """
-        Pushes next item to Iterator and returns True
-        If iterator has been closed via close(), doesn't push anything and returns False
-        """
-        if self._sentinel_pushed:
-            return False
-
-        await self._q.put(data)
-        return True
-
-    async def close(self):
-        """
-        Close is idempotent. Calling close multiple times is safe
-        Iterator will raise StopIteration only after all elements pushed before close have been iterated
-        """
-        # make close idempotent
-        if not self._sentinel_pushed:
-            self._sentinel_pushed = True
-            await self._q.put(self._sentinel)
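For reference only (not part of this commit): a minimal sketch of the producer/consumer pattern the removed IteratorPipe supports, with a background thread feeding items while the main thread iterates. Names and values are illustrative.

import threading
from iterators import IteratorPipe  # package as it existed before this commit

pipe = IteratorPipe()

def producer():
    # feed items from another thread of execution
    for token in ["Hello", " ", "world"]:
        pipe.put(token)
    pipe.close()  # consumer gets StopIteration after all queued items are drained

threading.Thread(target=producer).start()
for chunk in pipe:
    print(chunk, end="")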
iterators/timeout_iterator.py
DELETED
@@ -1,170 +0,0 @@
-import queue
-import asyncio
-import threading
-import traceback
-
-
-class TimeoutIterator:
-    """
-    Wrapper class to add timeout feature to synchronous iterators
-    - timeout: timeout for next(). Default=ZERO_TIMEOUT i.e. no timeout or blocking calls to next. Updated using set_timeout()
-    - sentinel: the object returned by iterator when timeout happens
-    - reset_on_next: if set to True, timeout is reset to the value of ZERO_TIMEOUT on each iteration
-
-    TimeoutIterator uses a thread internally.
-    The thread stops once the iterator exhausts or raises an exception during iteration.
-
-    Any exceptions raised within the wrapped iterator are propagated as it is.
-    Exception is raised when all elements generated by the actual iterator before exception have been consumed
-    Timeout can be set dynamically before going for iteration
-    """
-    ZERO_TIMEOUT = 0.0
-
-    def __init__(self, iterator, timeout=0.0, sentinel=object(), reset_on_next=False, raise_on_exception=True):
-        self._iterator = iterator
-        self._timeout = timeout
-        self._sentinel = sentinel
-        self._reset_on_next = reset_on_next
-        self._raise_on_exception = raise_on_exception
-
-        self._interrupt = False
-        self._done = False
-        self._buffer = queue.Queue()
-        self._thread = threading.Thread(target=self.__lookahead)
-        self._thread.start()
-
-    def get_sentinel(self):
-        return self._sentinel
-
-    def set_reset_on_next(self, reset_on_next):
-        self._reset_on_next = reset_on_next
-
-    def set_timeout(self, timeout: float):
-        """
-        Set timeout for next iteration
-        """
-        self._timeout = timeout
-
-    def interrupt(self):
-        """
-        interrupt and stop the underlying thread.
-        the thread acutally dies only after interrupt has been set and
-        the underlying iterator yields a value after that.
-        """
-        self._interrupt = True
-
-    def __iter__(self):
-        return self
-
-    def __next__(self):
-        """
-        yield the result from iterator
-        if timeout > 0:
-            yield data if available.
-            otherwise yield sentinal
-        """
-        if self._done:
-            raise StopIteration
-
-        data = self._sentinel
-        try:
-            if self._timeout > self.ZERO_TIMEOUT:
-                data = self._buffer.get(timeout=self._timeout)
-            else:
-                data = self._buffer.get()
-        except queue.Empty:
-            pass
-        finally:
-            # see if timeout needs to be reset
-            if self._reset_on_next:
-                self._timeout = self.ZERO_TIMEOUT
-
-        # propagate any exceptions including StopIteration
-        if isinstance(data, BaseException):
-            self._done = True
-            if isinstance(data, StopIteration):
-                raise data
-            ex = ''.join(traceback.format_tb(data.__traceback__))
-            print("Generation Failed: %s %s" % (str(data), str(ex)), flush=True)
-            if self._raise_on_exception:
-                raise data
-            else:
-                return data
-
-        return data
-
-    def __lookahead(self):
-        try:
-            while True:
-                self._buffer.put(next(self._iterator))
-                if self._interrupt:
-                    raise StopIteration()
-        except BaseException as e:
-            self._buffer.put(e)
-
-
-class AsyncTimeoutIterator:
-    """
-    Async version of TimeoutIterator. See method documentation of TimeoutIterator
-    """
-    ZERO_TIMEOUT = 0.0
-
-    def __init__(self, iterator, timeout=0.0, sentinel=object(), reset_on_next=False):
-        self._iterator = iterator
-        self._timeout = timeout
-        self._sentinel = sentinel
-        self._reset_on_next = reset_on_next
-
-        self._interrupt = False
-        self._done = False
-        self._buffer = asyncio.Queue()
-        self._task = asyncio.get_event_loop().create_task(self.__lookahead())
-
-    def get_sentinel(self):
-        return self._sentinel
-
-    def set_reset_on_next(self, reset_on_next):
-        self._reset_on_next = reset_on_next
-
-    def set_timeout(self, timeout: float):
-        self._timeout = timeout
-
-    def interrupt(self):
-        self._interrupt = True
-
-    def __aiter__(self):
-        return self
-
-    async def __anext__(self):
-        if self._done:
-            raise StopAsyncIteration
-
-        data = self._sentinel
-        try:
-            if self._timeout > self.ZERO_TIMEOUT:
-                data = await asyncio.wait_for(self._buffer.get(), self._timeout)
-            else:
-                data = await self._buffer.get()
-        except asyncio.TimeoutError:
-            pass
-        finally:
-            # see if timeout needs to be reset
-            if self._reset_on_next:
-                self._timeout = self.ZERO_TIMEOUT
-
-        # propagate any exceptions including StopIteration
-        if isinstance(data, BaseException):
-            self._done = True
-            raise data
-
-        return data
-
-    async def __lookahead(self):
-        try:
-            while True:
-                data = await self._iterator.__anext__()
-                await self._buffer.put(data)
-                if self._interrupt:
-                    raise StopAsyncIteration()
-        except BaseException as e:
-            await self._buffer.put(e)
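For reference only (not part of this commit): a minimal sketch of wrapping a slow generator with the removed TimeoutIterator so the consumer can do periodic housekeeping instead of blocking on next(). The generator, sleep times, and timeout value are illustrative.

import time
from iterators import TimeoutIterator  # package as it existed before this commit

def slow_tokens():
    for t in ["a", "b", "c"]:
        time.sleep(1.0)  # simulate a slow token stream
        yield t

it = TimeoutIterator(slow_tokens(), timeout=0.2)
for item in it:
    if item is it.get_sentinel():
        # next() timed out: do housekeeping (e.g. check for a user "stop" request), then keep iterating
        continue
    print(item)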
prompter.py
CHANGED
@@ -120,7 +120,7 @@ def get_prompt(prompt_type, prompt_dict, chat, context, reduced, making_context,
     elif prompt_type in [PromptType.custom.value, str(PromptType.custom.value),
                          PromptType.custom.name]:
         promptA = prompt_dict.get('promptA', '')
-        promptB = prompt_dict('promptB', '')
+        promptB = prompt_dict.get('promptB', '')
         PreInstruct = prompt_dict.get('PreInstruct', '')
         PreInput = prompt_dict.get('PreInput', '')
         PreResponse = prompt_dict.get('PreResponse', '')
@@ -693,7 +693,9 @@ class Prompter(object):
             output = clean_response(output)
         elif prompt is None:
             # then use most basic parsing like pipeline
-            if self.botstr in output:
+            if not self.botstr:
+                pass
+            elif self.botstr in output:
                 if self.humanstr:
                     output = clean_response(output.split(self.botstr)[1].split(self.humanstr)[0])
                 else:
requirements.txt
CHANGED
@@ -1,153 +0,0 @@
-# for generate (gradio server) and finetune
-datasets==2.13.0
-sentencepiece==0.1.99
-gradio==3.35.2
-huggingface_hub==0.15.1
-appdirs==1.4.4
-fire==0.5.0
-docutils==0.20.1
-torch==2.0.1
-evaluate==0.4.0
-rouge_score==0.1.2
-sacrebleu==2.3.1
-scikit-learn==1.2.2
-alt-profanity-check==1.2.2
-better-profanity==0.7.0
-numpy==1.24.3
-pandas==2.0.2
-matplotlib==3.7.1
-loralib==0.1.1
-bitsandbytes==0.39.0
-accelerate==0.20.3
-git+https://github.com/huggingface/peft.git@0b62b4378b4ce9367932c73540349da9a41bdea8
-transformers==4.30.2
-tokenizers==0.13.3
-APScheduler==3.10.1
-
-# optional for generate
-pynvml==11.5.0
-psutil==5.9.5
-boto3==1.26.101
-botocore==1.29.101
-
-# optional for finetune
-tensorboard==2.13.0
-neptune==1.2.0
-
-# for gradio client
-gradio_client==0.2.7
-beautifulsoup4==4.12.2
-markdown==3.4.3
-
-# data and testing
-pytest==7.2.2
-pytest-xdist==3.2.1
-nltk==3.8.1
-textstat==0.7.3
-# pandoc==2.3
-#pypandoc==1.11
-pypandoc_binary==1.11
-openpyxl==3.1.2
-lm_dataformat==0.0.20
-bioc==2.0
-
-# falcon
-einops==0.6.1
-instructorembedding==1.0.1
-
-# for gpt4all .env file, but avoid worrying about imports
-python-dotenv==1.0.0
-
-text-generation==0.6.0
-# for tokenization when don't have HF tokenizer
-tiktoken==0.4.0
-# optional: for OpenAI endpoint or embeddings (requires key)
-openai==0.27.8
-# optional for chat with PDF
-langchain==0.0.202
-pypdf==3.9.1
-# avoid textract, requires old six
-#textract==1.6.5
-
-# for HF embeddings
-sentence_transformers==2.2.2
-
-# local vector db
-chromadb==0.3.25
-# server vector db
-#pymilvus==2.2.8
-
-# weak url support, if can't install opencv etc. If comment-in this one, then comment-out unstructured[local-inference]==0.6.6
-# unstructured==0.6.6
-
-# strong support for images
-# Requires on Ubuntu: sudo apt-get install libmagic-dev poppler-utils tesseract-ocr libreoffice
-unstructured[local-inference]==0.7.4
-#pdf2image==1.16.3
-#pytesseract==0.3.10
-pillow
-
-pdfminer.six==20221105
-urllib3
-requests_file
-
-#pdf2image==1.16.3
-#pytesseract==0.3.10
-tabulate==0.9.0
-# FYI pandoc already part of requirements.txt
-
-# JSONLoader, but makes some trouble for some users
-# jq==1.4.1
-
-# to check licenses
-# Run: pip-licenses|grep -v 'BSD\|Apache\|MIT'
-pip-licenses==4.3.0
-
-# weaviate vector db
-weaviate-client==3.20.0
-# optional for chat with PDF
-langchain==0.0.202
-pypdf==3.9.1
-# avoid textract, requires old six
-#textract==1.6.5
-
-# for HF embeddings
-sentence_transformers==2.2.2
-
-# local vector db
-chromadb==0.3.25
-# server vector db
-#pymilvus==2.2.8
-
-# weak url support, if can't install opencv etc. If comment-in this one, then comment-out unstructured[local-inference]==0.6.6
-# unstructured==0.6.6
-
-# strong support for images
-# Requires on Ubuntu: sudo apt-get install libmagic-dev poppler-utils tesseract-ocr libreoffice
-unstructured[local-inference]==0.7.4
-#pdf2image==1.16.3
-#pytesseract==0.3.10
-pillow
-
-pdfminer.six==20221105
-urllib3
-requests_file
-
-#pdf2image==1.16.3
-#pytesseract==0.3.10
-tabulate==0.9.0
-# FYI pandoc already part of requirements.txt
-
-# JSONLoader, but makes some trouble for some users
-# jq==1.4.1
-
-# to check licenses
-# Run: pip-licenses|grep -v 'BSD\|Apache\|MIT'
-pip-licenses==4.3.0
-
-# weaviate vector db
-weaviate-client==3.20.0
-faiss-gpu==1.7.2
-arxiv==1.4.7
-pymupdf==1.22.3 # AGPL license
-# extract-msg==0.41.1 # GPL3