hqsiswiliam committed on
Commit
8359bb1
1 Parent(s): 824dd1d

Upload 43 files

Files changed (43)
  1. Figures/Exp.png +0 -0
  2. Figures/SelectivePromptTuning-SPT.png +0 -0
  3. README.md +112 -0
  4. config/all_values.yml +39 -0
  5. config/convai2/llama2-7b-selective-linear-both-prompt-causal-convai2-adding-target-noise.yml +34 -0
  6. config/convai2/llama2-7b-selective-linear-both-prompt-causal-convai2.yml +33 -0
  7. config/convai2/opt-1.3b-selective-linear-both-prompt-causal-convai2.yml +33 -0
  8. config/convai2/opt-125m-selective-linear-both-prompt-causal-convai2.yml +33 -0
  9. config/convai2/opt-2.7b-selective-linear-both-prompt-causal-convai2.yml +33 -0
  10. config/default.yml +16 -0
  11. dataset/__pycache__/dataset.cpython-310.pyc +0 -0
  12. dataset/__pycache__/dataset_helper.cpython-310.pyc +0 -0
  13. dataset/dataset.py +189 -0
  14. dataset/dataset_helper.py +117 -0
  15. ds_config.json +28 -0
  16. env.yml +257 -0
  17. evaluate_runs_results.py +150 -0
  18. evaluation.py +92 -0
  19. interactive_test.py +205 -0
  20. models/__pycache__/llm_chat.cpython-310.pyc +0 -0
  21. models/__pycache__/selective_llm_chat.cpython-310.pyc +0 -0
  22. models/llm_chat.py +227 -0
  23. models/selective_llm_chat.py +390 -0
  24. test.py +204 -0
  25. train.py +129 -0
  26. trainer/__init__.py +1 -0
  27. trainer/__pycache__/__init__.cpython-310.pyc +0 -0
  28. trainer/__pycache__/peft_trainer.cpython-310.pyc +0 -0
  29. trainer/peft_trainer.py +187 -0
  30. utils/__pycache__/config.cpython-310.pyc +0 -0
  31. utils/__pycache__/configure_optimizers.cpython-310.pyc +0 -0
  32. utils/__pycache__/dist_helper.cpython-310.pyc +0 -0
  33. utils/__pycache__/format_inputs.cpython-310.pyc +0 -0
  34. utils/__pycache__/model_helpers.cpython-310.pyc +0 -0
  35. utils/__pycache__/parser_helper.cpython-310.pyc +0 -0
  36. utils/__pycache__/seed_everything.cpython-310.pyc +0 -0
  37. utils/config.py +50 -0
  38. utils/configure_optimizers.py +6 -0
  39. utils/dist_helper.py +5 -0
  40. utils/format_inputs.py +173 -0
  41. utils/model_helpers.py +31 -0
  42. utils/parser_helper.py +17 -0
  43. utils/seed_everything.py +44 -0
Figures/Exp.png ADDED
Figures/SelectivePromptTuning-SPT.png ADDED
README.md ADDED
@@ -0,0 +1,112 @@
1
+ # SPT: Selective Prompting Tuning for Personalized Conversations with LLMs
2
 + Repository for `Selective Prompting Tuning for Personalized Conversations with LLMs`. The paper is available at [Selective Prompting Tuning for Personalized Conversations with LLMs](https://openreview.net/pdf?id=Royo7My_EJ).
3
+ ## Introduction
4
+
5
 + In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Although large language models (LLMs) have improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observe that textual prompting often struggles to yield responses similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate these issues, we propose **S**elective **P**rompt **T**uning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for the LLM according to the input context, where the prompt retriever is dynamically updated through feedback from the LLM. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage SPT to enhance the diversity of personalized conversations. Experiments on the ConvAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. These results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code is publicly available for further exploration.
6
+
7
+ ## Architecture
8
+ ![spt-arch](Figures/SelectivePromptTuning-SPT.png)
9
+
10
+ ## Experimental Results
11
+ ![exp](Figures/Exp.png)
12
+
13
+ ## Repo Details
14
+ ### Basic Project Structure
15
 + - `config`: contains all the configuration YAML files, from OPT-125M to Llama2-13B
16
 + - `data_file`: contains the ConvAI2 dataset files; the dataset can be downloaded from this [Huggingface Repo](https://huggingface.co/hqsiswiliam/SPT)
17
 + - `dataset`: contains the dataloader class and pre-processing methods
18
+ - `models`: contains SPT model classes
19
+ - `trainer`: contains trainer classes, responsible for model training & updating
20
+ - `utils`: provides helper classes and functions
21
 + - `test.py`: the entry-point script for model decoding
22
 + - `train.py`: the entry-point script for model training
23
 + ### Checkpoint Downloading
24
 + - The trained checkpoints are located in the `public_ckpt` folder of the [Huggingface Repo](https://huggingface.co/hqsiswiliam/SPT)
25
+
26
+ ### Environment Initialization
27
+ #### Modifying `env.yml`
28
 + Since DeepSpeed requires cuDNN and CUDA, and the NVIDIA-related tools are installed through Anaconda, it is essential to modify the environment variables in the last two lines of `env.yml`:
29
+ ```yml
30
+ variables:
31
+ LD_LIBRARY_PATH: <CONDA_PATH>/envs/SPT/lib
32
+ LIBRARY_PATH: <CONDA_PATH>/envs/SPT/lib
33
+ ```
34
 + Please replace `<CONDA_PATH>` with your actual conda installation path before creating the environment from `env.yml`.
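 + For example, a quick way to apply this substitution (a minimal sketch; it assumes `conda` is on your `PATH` and that an in-place `sed` edit is acceptable):
 + ```bash
 + # Find the conda base directory and substitute it into env.yml in place.
 + CONDA_PATH=$(conda info --base)
 + sed -i "s|<CONDA_PATH>|${CONDA_PATH}|g" env.yml
 + ```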
35
+ #### Environment Creation
36
 + The SPT environment can be built using Anaconda (which we recommend); we provide `env.yml` for environment creation:
37
+ ```bash
38
+ conda env create -f env.yml
39
+ ```
40
 + Then activate the environment:
41
+ ```bash
42
+ conda activate SPT
43
+ ```
44
+ ## Model Training
45
 + Use the following command to start training:
46
+ ```bash
47
+ deepspeed --num_nodes=1 train.py \
48
+ --config=config/convai2/opt-125m-selective-linear-both-prompt-causal-convai2.yml \
49
+ --batch=2 \
50
+ --lr=0.0001 \
51
+ --epoch=1 \
52
+ --save_model=yes \
53
+ --num_workers=0 \
54
+ --training_ratio=1.0 \
55
+ --log_dir=runs_ds_dev \
56
+ --deepspeed \
57
+ --deepspeed_config ds_config.json
58
+ ```
59
 + You can adjust the DeepSpeed launcher arguments (e.g., `--num_nodes`, or `--num_gpus` for multiple GPUs in one node) to match your hardware.
60
+ ### Main Arguments
61
+ - `config`: the training configuration file
62
+ - `batch`: the batch size per GPU
63
+ - `lr`: learning rate
64
+ - `epoch`: epoch number
65
+ - `save_model`: whether to save model
66
+ - `training_ratio`: the percentage of data used for training, 1.0 means 100%
67
+ - `log_dir`: the log and model save directory
68
 + - `--deepspeed` & `--deepspeed_config`: the arguments required to initialize DeepSpeed
69
+ - `selective_loss_weight`: weight for selection loss
70
+ - `contrastive_weight`: weight for contrastive loss
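 + For example, the loss weights above can be overridden on the command line. This is a hypothetical invocation; it assumes `train.py` exposes `--selective_loss_weight` and `--contrastive_weight` as flags, as the list above suggests (values here match the OPT configs):
 + ```bash
 + deepspeed --num_nodes=1 train.py \
 + --config=config/convai2/opt-125m-selective-linear-both-prompt-causal-convai2.yml \
 + --batch=2 \
 + --lr=0.0001 \
 + --epoch=1 \
 + --selective_loss_weight=0.4 \
 + --contrastive_weight=0.4 \
 + --log_dir=runs_ds_dev \
 + --deepspeed \
 + --deepspeed_config ds_config.json
 + ```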
71
+ ## Model Inference
72
 + Model inference can be invoked with the following command:
73
+ ```bash
74
+ deepspeed test.py \
75
+ --model_path=public_ckpt/OPT-125M-SPT \
76
+ --batch_size=16 \
77
+ --skip_exists=no \
78
+ --deepspeed \
79
+ --deepspeed_config ds_config.json
80
+ ```
81
+ ### Main Arguments
82
+ - `model_path`: the path to the checkpoint, containing the `ds_ckpt` folder
83
+ - `skip_exists`: whether to skip decoding if `evaluation_result.txt` exists
84
+
85
+ ## Computing Metrics for Generation Results
86
 + To compute the metrics for the evaluation results, simply run:
87
+
88
+ `python evaluate_runs_results.py`
89
+
90
+ The input path can be changed in the script via:
91
+ ```python
92
+ _main_path = 'public_ckpt'
93
+ ```
94
+
95
+ ## Interactive Testing
96
 + We also support interactive testing via:
97
+ ```bash
98
+ deepspeed interactive_test.py \
99
+ --model_path=public_ckpt/Llama2-7B-SPT \
100
+ --batch_size=1 \
101
+ --deepspeed \
102
+ --deepspeed_config ds_config.json
103
+ ```
104
 + An interactive interface will then be launched, for example:
105
+
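 + The loop looks roughly like this (an illustrative transcript based on the print format in `interactive_test.py`; the persona and responses below are made up):
 + ```text
 + Please enter your input:
 + hi! what do you do for a living?
 + 
 + Persona: i am a nurse . i have two dogs .
 + Dialogue:
 + USER: hi! what do you do for a living?
 + SPT: i am a nurse , i work at a hospital . [SPT Index: 2]
 + ```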
106
+ Some shortcut keys:
107
 + - `exit`: exit the interactive shell
108
 + - `clear`: clear the current dialog history and load a new random persona
109
 + - `r`: remove the last SPT response from the history (so it can be regenerated)
 + - `save`: save the current dialogue to the `interactive_dialog` folder
110
+
111
+ ## Citation
112
+ Will be available soon.
config/all_values.yml ADDED
@@ -0,0 +1,39 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "facebook/opt-125m"
4
+ load_bit: 32
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 8
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+ retriever:
12
+ retriever_on: ['extra', 'lm']
13
+ retriever_type: transformer_encoder
14
+ n_head: 4
15
+ num_layers: 2
16
+
17
+ training:
18
+ learning_rate: 1e-5
19
+ batch_size: 32
20
+ num_epochs: 1
21
+ mode: causal
22
+ only_longest: True
23
+ task_type: generate_response
24
+ log_dir: runs_prompt_selective_linear
25
+ contrastive: true
26
+ ensemble: true
27
+ selective_loss_weight: 0.4
28
+ contrastive_metric: bleu
29
+ contrastive_threshold: 20.0
30
+ contrastive_weight: 0.4
31
+ freeze_persona: yes
32
+ freeze_context: yes
33
+
34
+
35
+ dataset:
36
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
37
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
38
+ max_context_turns: -1
39
+ max_token_length: 512
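As a rough illustration of how a config like the one above can be consumed (a minimal sketch; it assumes PyYAML and `dotmap`, both pinned in `env.yml` — the repo's own `utils/config.py` may merge `config/default.yml` and differ in details):

```python
import yaml                # pyyaml==6.0 in env.yml
from dotmap import DotMap  # dotmap==1.3.30 in env.yml

# Load one of the YAML configs into an attribute-accessible object.
with open("config/all_values.yml") as f:
    cfg = DotMap(yaml.safe_load(f))

print(cfg.model.model_name)        # "facebook/opt-125m"
print(cfg.training.learning_rate)  # note: PyYAML loads "1e-5" as a string; the repo's loader may cast it
```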
config/convai2/llama2-7b-selective-linear-both-prompt-causal-convai2-adding-target-noise.yml ADDED
@@ -0,0 +1,34 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "Llama-2-7b-chat-hf"
4
+ load_bit: 16
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 1
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+
12
+ training:
13
+ learning_rate: 1e-5
14
+ batch_size: 32
15
+ num_epochs: 1
16
+ mode: causal
17
+ adding_noise: 0.1
18
+ only_longest: False
19
+ task_type: generate_response
20
+ log_dir: runs_prompt_convai2_selective_linear
21
+ contrastive: true
22
+ ensemble: true
23
+ selective_loss_weight: 1.0
24
+ contrastive_metric: bleu
25
+ contrastive_threshold: 20.0
26
+ contrastive_weight: 1.0
27
+ freeze_persona: yes
28
+ freeze_context: yes
29
+
30
+ dataset:
31
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
32
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
33
+ max_context_turns: -1
34
+ max_token_length: 512
config/convai2/llama2-7b-selective-linear-both-prompt-causal-convai2.yml ADDED
@@ -0,0 +1,33 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "Llama-2-7b-chat-hf"
4
+ load_bit: 16
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 1
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+
12
+ training:
13
+ learning_rate: 1e-5
14
+ batch_size: 32
15
+ num_epochs: 1
16
+ mode: causal
17
+ only_longest: False
18
+ task_type: generate_response
19
+ log_dir: runs_prompt_convai2_selective_linear
20
+ contrastive: true
21
+ ensemble: true
22
+ selective_loss_weight: 1.0
23
+ contrastive_metric: bleu
24
+ contrastive_threshold: 20.0
25
+ contrastive_weight: 1.0
26
+ freeze_persona: yes
27
+ freeze_context: yes
28
+
29
+ dataset:
30
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
31
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
32
+ max_context_turns: -1
33
+ max_token_length: 512
config/convai2/opt-1.3b-selective-linear-both-prompt-causal-convai2.yml ADDED
@@ -0,0 +1,33 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "facebook/opt-1.3b"
4
+ load_bit: 32
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 8
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+
12
+ training:
13
+ learning_rate: 1e-5
14
+ batch_size: 32
15
+ num_epochs: 1
16
+ mode: causal
17
+ only_longest: False
18
+ task_type: generate_response
19
+ log_dir: runs_prompt_convai2_selective_linear
20
+ contrastive: true
21
+ ensemble: true
22
+ selective_loss_weight: 0.4
23
+ contrastive_metric: bleu
24
+ contrastive_threshold: 20.0
25
+ contrastive_weight: 0.4
26
+ freeze_persona: yes
27
+ freeze_context: yes
28
+
29
+ dataset:
30
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
31
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
32
+ max_context_turns: -1
33
+ max_token_length: 512
config/convai2/opt-125m-selective-linear-both-prompt-causal-convai2.yml ADDED
@@ -0,0 +1,33 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "facebook/opt-125m"
4
+ load_bit: 32
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 8
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+
12
+ training:
13
+ learning_rate: 1e-5
14
+ batch_size: 32
15
+ num_epochs: 1
16
+ mode: causal
17
+ only_longest: False
18
+ task_type: generate_response
19
+ log_dir: runs_prompt_convai2_selective_linear
20
+ contrastive: true
21
+ ensemble: true
22
+ selective_loss_weight: 0.4
23
+ contrastive_metric: bleu
24
+ contrastive_threshold: 20.0
25
+ contrastive_weight: 0.4
26
+ freeze_persona: yes
27
+ freeze_context: yes
28
+
29
+ dataset:
30
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
31
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
32
+ max_context_turns: -1
33
+ max_token_length: 512
config/convai2/opt-2.7b-selective-linear-both-prompt-causal-convai2.yml ADDED
@@ -0,0 +1,33 @@
1
+ model:
2
+ model_type: 'selective_pt'
3
+ model_name: "facebook/opt-2.7b"
4
+ load_bit: 16
5
+ peft_type: "prompt_tuning"
6
+ K: 4
7
+ peft_config:
8
+ num_virtual_tokens: 8
9
+ normalizer: linear
10
+ normalizer_on: ['prompt', 'lm']
11
+
12
+ training:
13
+ learning_rate: 1e-5
14
+ batch_size: 32
15
+ num_epochs: 1
16
+ mode: causal
17
+ only_longest: False
18
+ task_type: generate_response
19
+ log_dir: runs_prompt_convai2_selective_linear
20
+ contrastive: true
21
+ ensemble: true
22
+ selective_loss_weight: 0.4
23
+ contrastive_metric: bleu
24
+ contrastive_threshold: 20.0
25
+ contrastive_weight: 0.4
26
+ freeze_persona: yes
27
+ freeze_context: yes
28
+
29
+ dataset:
30
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
31
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
32
+ max_context_turns: -1
33
+ max_token_length: 512
config/default.yml ADDED
@@ -0,0 +1,16 @@
1
+ dataset:
2
+ train: data_file/ConvAI2/train_self_original_no_cands.txt
3
+ valid: data_file/ConvAI2/valid_self_original_no_cands.txt
4
+ max_context_turns: -1
5
+ max_token_length: 512
6
+
7
+ model:
8
+ score_activation: 'softplus'
9
+
10
+ training:
11
+ mode: normal
12
+ only_longest: False
13
+ task_type: generate_response
14
+ ensemble: false
15
+ tau_gold: 1.0
16
+ tau_sim: 1.0
dataset/__pycache__/dataset.cpython-310.pyc ADDED
Binary file (5.37 kB). View file
 
dataset/__pycache__/dataset_helper.cpython-310.pyc ADDED
Binary file (3.46 kB). View file
 
dataset/dataset.py ADDED
@@ -0,0 +1,189 @@
1
+ import torch
2
+ from pytorch_lightning import LightningDataModule
3
+ from torch.utils.data import DataLoader
4
+
5
+ from dataset.dataset_helper import read_personachat_split
6
+ from utils.format_inputs import TASK_TYPE
7
+
8
+
9
+ class PersonaChatDataset(torch.utils.data.Dataset):
10
+ # longest first for batch finder
11
+ def __init__(self, data_path, max_context_turns=-1,
12
+ add_role_indicator=True, only_longest=False, training_ratio=1.0,
13
+ task_type=TASK_TYPE.GENERATE_RESPONSE):
14
+ self.path = data_path
15
+ self.add_role_indicator = add_role_indicator
16
+ self.max_context_turns = max_context_turns
17
+ self.turns_data = read_personachat_split(data_path, only_longest=only_longest)
18
+ self.only_longest = only_longest
19
+ self.training_ratio = training_ratio
20
+ if training_ratio < 1.0:
21
+ self.turns_data = self.turns_data[:int(len(self.turns_data) * training_ratio)]
22
+ self.task_type = task_type
23
+ # # For debug only
24
+ # os.makedirs("data_logs", exist_ok=True)
25
+ # random_num = random.randint(0, 100000)
26
+ # self.file = open(f"data_logs/{random_num}_{data_path.split(os.sep)[-1]}", 'w')
27
+ # # add id to turns_data
28
+ # self.turns_data = [{'id': idx, **turn} for idx, turn in enumerate(self.turns_data)]
29
+ # self.file.write(f"total_turns: {len(self.turns_data)}\n")
30
+
31
+ def sort_longest_first(self):
32
+ self.turns_data = sorted(self.turns_data, key=lambda x: len(
33
+ (' '.join(x['persona']) + ' '.join(x['context']) + x['response']).split(' ')), reverse=True)
34
+
35
+ def __getitem__(self, idx):
36
+ # self.file.write(str(idx) + "\n")
37
+ # self.file.flush()
38
+ input_data = self.turns_data[idx]
39
+ persona_list = input_data['persona']
40
+ target = input_data['response']
41
+ context_input = input_data['context']
42
+ if self.add_role_indicator:
43
+ roled_context_input = [['Q: ', 'R: '][c_idx % 2] + context for c_idx, context in enumerate(context_input)]
44
+ context_input = roled_context_input
45
+ if self.max_context_turns != -1:
46
+ truncated_context = context_input[-(self.max_context_turns * 2 - 1):]
47
+ context_input = truncated_context
48
+ if self.only_longest:
49
+ context_input = context_input[:-1]
50
+ return {
51
+ 'context_input': context_input,
52
+ 'persona_list': persona_list,
53
+ 'target': target
54
+ }
55
+
56
+ def __len__(self):
57
+ return len(self.turns_data)
58
+
59
+
60
+ # class HGPersonaChatDataset(PersonaChatDataset):
61
+ # def __init__(self, data_path, max_context_turns=-1,
62
+ # add_role_indicator=True, only_longest=False, tokenizer=None):
63
+ # super().__init__(data_path, max_context_turns, add_role_indicator, only_longest)
64
+ # self.tokenizer = tokenizer
65
+ #
66
+ # def __getitem__(self, idx):
67
+ # data = super().__getitem__(idx)
68
+ # input = "P: " + ' '.join(data['persona_list']) + " C: " + ' '.join(data['context_input']) + " R: " + data[
69
+ # 'target']
70
+ # tokenized = self.tokenizer(input)
71
+ # return {**data, **tokenized}
72
+
73
+
74
+ def collate_fn(sample_list):
75
+ dont_be_a_tensor = ['context_input', 'persona_list', 'target']
76
+ to_be_flattened = [*dont_be_a_tensor]
77
+ data = {}
78
+ for key in to_be_flattened:
79
+ if key not in sample_list[0].keys():
80
+ continue
81
+ if sample_list[0][key] is None:
82
+ continue
83
+ flatten_samples = [sample[key] for sample in sample_list]
84
+ if flatten_samples[-1].__class__ == str or key in dont_be_a_tensor:
85
+ data[key] = flatten_samples
86
+ else:
87
+ data[key] = torch.tensor(flatten_samples)
88
+ return data
89
+
90
+
91
+ def collate_fn_straight(sample_list):
92
+ sample_list = collate_fn(sample_list)
93
+ return sample_list
94
+
95
+
96
+ def collate_fn_straight_with_fn(fn):
97
+ def build_collate_fn(sample_list):
98
+ sample_list = collate_fn(sample_list)
99
+ sample_list_processed = fn(sample_list)
100
+ return {**sample_list, **sample_list_processed}
101
+
102
+ return build_collate_fn
103
+
104
+
105
+ def get_dataloader(dataset, batch_size, shuffle=False, num_workers=None, collate_fn=None, sampler=None):
106
+ if num_workers is None:
107
+ num_workers = batch_size // 4
108
+ # num_workers = min(num_workers, batch_size)
109
+ if collate_fn == None:
110
+ _collate_fn = collate_fn_straight
111
+ else:
112
+ _collate_fn = collate_fn_straight_with_fn(collate_fn)
113
+ return DataLoader(dataset, batch_size=batch_size,
114
+ collate_fn=_collate_fn,
115
+ shuffle=shuffle,
116
+ num_workers=num_workers,
117
+ sampler=sampler)
118
+
119
+
120
+ def get_lightening_dataloader(dataset, batch_size, shuffle=False, num_workers=None):
121
+ return LitDataModule(batch_size, dataset, shuffle, num_workers)
122
+
123
+
124
+ class LitDataModule(LightningDataModule):
125
+ def __init__(self, batch_size, dataset, shuffle, num_workers):
126
+ super().__init__()
127
+ self.save_hyperparameters(ignore=['dataset'])
128
+ # or
129
+ self.batch_size = batch_size
130
+ self.dataset = dataset
131
+
132
+ def train_dataloader(self):
133
+ return DataLoader(self.dataset, batch_size=self.batch_size,
134
+ collate_fn=collate_fn_straight,
135
+ shuffle=self.hparams.shuffle,
136
+ num_workers=self.hparams.num_workers)
137
+
138
+ if __name__ == '__main__':
139
+ import json
140
+ train_ds = PersonaChatDataset(data_path='data_file/ConvAI2/train_self_original_no_cands.txt',
141
+ )
142
+ from tqdm import tqdm
143
+
144
+ jsonfy_data = []
145
+
146
+ for data in tqdm(train_ds):
147
+ context_input = "\n".join(data['context_input'])
148
+ persona_input = '\n'.join(data['persona_list'])
149
+ jsonfy_data.append({
150
+ "instruction": f"""Given the dialog history between Q and R is:
151
+ {context_input}
152
+
153
+ Given the personality of the R as:
154
+ {persona_input}
155
+
156
+ Please response to Q according to both the dialog history and the R's personality.
157
+ Now, the R would say:""",
158
+ "input": "",
159
+ "output": data['target'],
160
+ "answer": "",
161
+ })
162
+ with open('data_file/train.json', 'w') as writer:
163
+ json.dump(jsonfy_data, writer)
164
+ jsonfy_data = []
165
+ del train_ds
166
+
167
+ train_ds = PersonaChatDataset(data_path='data_file/ConvAI2/valid_self_original_no_cands.txt',
168
+ )
169
+
170
+ for data in tqdm(train_ds):
171
+ context_input = "\n".join(data['context_input'])
172
+ persona_input = '\n'.join(data['persona_list'])
173
+ jsonfy_data.append({
174
+ "instruction": f"""Given the dialog history between Q and R is:
175
+ {context_input}
176
+
177
+ Given the personality of the R as:
178
+ {persona_input}
179
+
180
+ Please response to Q according to both the dialog history and the R's personality.
181
+ Now, the R would say:""",
182
+ "input": "",
183
+ "output": data['target'],
184
+ "answer": "",
185
+ })
186
+ with open('data_file/valid.json', 'w') as writer:
187
+ json.dump(jsonfy_data, writer)
188
+ with open('data_file/test.json', 'w') as writer:
189
+ json.dump(jsonfy_data, writer)
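A minimal usage sketch for the classes above (it assumes the ConvAI2 files from the Hugging Face repo are under `data_file/ConvAI2`; argument names follow the signatures shown in this file):

```python
from dataset.dataset import PersonaChatDataset, get_dataloader

train_ds = PersonaChatDataset(
    data_path="data_file/ConvAI2/train_self_original_no_cands.txt",
    max_context_turns=-1,
)
loader = get_dataloader(train_ds, batch_size=2, shuffle=True, num_workers=0)

batch = next(iter(loader))
# collate_fn_straight keeps these fields as lists of strings rather than tensors
print(batch.keys())  # dict_keys(['context_input', 'persona_list', 'target'])
```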
dataset/dataset_helper.py ADDED
@@ -0,0 +1,117 @@
1
+ import re
2
+
3
+ from tqdm import tqdm
4
+
5
+
6
+ def read_personachat_split(split_dir, only_longest=False):
7
+ results = []
8
+ their_per_group = None
9
+ try:
10
+ file = open(split_dir, 'r')
11
+ lines = file.readlines()
12
+ persona = []
13
+ context = []
14
+ response = None
15
+ candidates = []
16
+ is_longest = False
17
+ for line in tqdm(lines[:], desc='loading {}'.format(split_dir)):
18
+ if line.startswith('1 your persona:'):
19
+ is_longest = True
20
+ if is_longest and only_longest:
21
+ if response is not None:
22
+ results.append({'persona': persona.copy(), 'context': context.copy(), 'response': response,
23
+ 'candidates': candidates.copy()})
24
+ is_longest = False
25
+ persona = []
26
+ context = []
27
+ if 'persona:' in line:
28
+ persona.append(line.split(':')[1].strip())
29
+ if 'persona:' not in line:
30
+ context.append(re.sub(r"^\d+ ", "", line.split("\t")[0].strip()))
31
+ response = line.split("\t")[1].strip()
32
+ if len(line.split("\t\t"))==1:
33
+ candidates = []
34
+ else:
35
+ candidates = line.split("\t\t")[1].strip().split("|")
36
+ if not only_longest:
37
+ results.append({'persona': persona.copy(), 'context': context.copy(), 'response': response,
38
+ 'candidates': candidates.copy()})
39
+ context.append(response)
40
+ except FileNotFoundError:
41
+ print(f"Sorry! The file {split_dir} can't be found.")
42
+ return results
43
+
44
+
45
+ def combine_persona_query_response(persona, query, response, candidates):
46
+ assert ((len(persona) == len(query)) and (len(query) == len(response))), \
47
+ 'the length of persona, query, response must be equivalent'
48
+ data = {}
49
+ for index, psn in enumerate(persona):
50
+ split_persona = psn.strip().split("\t")
51
+ psn = psn.replace("\t", " ").strip()
52
+ if psn not in data.keys():
53
+ data[psn] = {'persona': psn, 'query': [], 'response': [], 'dialog': [], 'response_turns': 0,
54
+ 'persona_list': split_persona, 'candidates': []}
55
+ data[psn]['query'].append(query[index])
56
+ data[psn]['response'].append(response[index])
57
+ data[psn]['dialog'].append(query[index])
58
+ data[psn]['dialog'].append(response[index])
59
+ data[psn]['candidates'].append(candidates[index])
60
+ data[psn]['response_turns'] += 1
61
+ return data
62
+
63
+
64
+ def preprocess_text(text):
65
+ punctuations = '.,?'
66
+ for punc in punctuations:
67
+ text = text.replace(punc, ' {} '.format(punc))
68
+ text = re.sub(' +', ' ', text).strip()
69
+ return text
70
+
71
+
72
+ def preprocess_texts(text_array):
73
+ return [preprocess_text(t) for t in text_array]
74
+
75
+
76
+ # "turns" means we need at least how many turns
77
+ # "max_context_turns" means how many history turns should be kept
78
+ def get_chat_by_turns(combined_data, turns=1,
79
+ sep_token='[SEP]', add_role_indicator=True,
80
+ add_persona_indicator=True, max_context_turns=-1):
81
 + assert turns > 0, 'turns must be larger than 0'
82
+ all_persona = list(combined_data.keys())
83
+ filtered_persona = list(filter(lambda p: combined_data[p]['response_turns'] >= turns, all_persona))
84
+ data = []
85
+
86
+ for single_persona in filtered_persona:
87
+ single_persona_data = combined_data[single_persona]
88
+ persona_list = single_persona_data['persona_list']
89
+ context = []
90
+ for index, (query, response) in enumerate(
91
+ zip(single_persona_data['query'], single_persona_data['response'])
92
+ ):
93
+ if max_context_turns != -1 and \
94
+ index + 1 < single_persona_data['response_turns'] - max_context_turns:
95
+ continue
96
+ if add_role_indicator:
97
+ query = "Q: {}".format(query)
98
+ if not index + 1 >= turns:
99
+ response = "R: {}".format(response)
100
+ context += [query, response]
101
+ if index + 1 >= turns:
102
+ break
103
+
104
+ response = context[-1]
105
+ context = context[:-1]
106
+
107
+ input_x_str = " {} ".format(sep_token).join(context)
108
+ input_x_str = re.sub(" +", " ", input_x_str)
109
+ if add_persona_indicator:
110
+ single_persona = "P: {}".format(single_persona)
111
+ data.append({'input': preprocess_texts(context),
112
+ 'input_str': preprocess_text(input_x_str),
113
+ 'target': preprocess_text(response),
114
+ 'persona': preprocess_text(single_persona),
115
+ 'persona_list': preprocess_texts(persona_list),
116
+ 'candidates': preprocess_texts(single_persona_data['candidates'][-1])})
117
+ return data
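For reference, one element returned by `read_personachat_split` has roughly this shape (an illustrative example; the utterances below are invented, and `candidates` stays empty for the `*_no_cands.txt` splits):

```python
example_turn = {
    "persona": ["i like to ski .", "i have two dogs ."],
    # role prefixes ("Q: " / "R: ") are added later by PersonaChatDataset, not here
    "context": ["hello , what are you doing today ?"],
    "response": "i am relaxing at home with my dogs .",
    "candidates": [],
}
```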
ds_config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "train_micro_batch_size_per_gpu ": 1,
3
+ "gradient_accumulation_steps": 1,
4
+ "optimizer": {
5
+ "type": "Adam",
6
+ "params": {
7
+ "lr": 0.00015
8
+ }
9
+ },
10
+ "bf16": {
11
+ "enabled": false
12
+ },
13
+ "float16": {
14
+ "enabled": false
15
+ },
16
+
17
+ "zero_optimization": {
18
+ "stage": 2,
19
+ "offload_param": {
20
+ "device": "cpu",
21
+ "pin_memory": true,
22
+ "buffer_count": 5,
23
+ "buffer_size": 1e8,
24
+ "max_in_cpu": 1e9
25
+ }
26
+ }
27
+
28
+ }
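The test scripts adapt this file at runtime rather than using it verbatim; a condensed sketch of the logic visible in `interactive_test.py` (the batch size and precision switches here are placeholders):

```python
import json

with open("ds_config.json") as f:
    ds_config = json.load(f)

batch_size = 16
ds_config["train_micro_batch_size_per_gpu"] = batch_size

# 16-bit checkpoints run in fp16, except Llama models, which are switched to bf16.
ds_config["float16"]["enabled"] = True
# ds_config["bf16"]["enabled"] = True   # for Llama-2 checkpoints
```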
env.yml ADDED
@@ -0,0 +1,257 @@
1
+ name: SPT
2
+ channels:
3
+ - pytorch
4
+ - nvidia
5
+ - nvidia/label/cuda-11.8.0
6
+ - anaconda
7
+ - defaults
8
+ dependencies:
9
+ - _libgcc_mutex=0.1=main
10
+ - _openmp_mutex=5.1=1_gnu
11
+ - blas=1.0=mkl
12
+ - brotlipy=0.7.0=py310h7f8727e_1002
13
+ - bzip2=1.0.8=h7b6447c_0
14
+ - ca-certificates=2023.08.22=h06a4308_0
15
+ - certifi=2023.11.17=py310h06a4308_0
16
+ - cffi=1.15.1=py310h5eee18b_3
17
+ - charset-normalizer=2.0.4=pyhd3eb1b0_0
18
+ - cryptography=39.0.1=py310h9ce1e76_2
19
+ - cuda-cccl=11.8.89=0
20
+ - cuda-compiler=11.8.0=0
21
+ - cuda-cudart=11.8.89=0
22
+ - cuda-cudart-dev=11.8.89=0
23
+ - cuda-cuobjdump=11.8.86=0
24
+ - cuda-cupti=11.8.87=0
25
+ - cuda-cuxxfilt=11.8.86=0
26
+ - cuda-libraries=11.8.0=0
27
+ - cuda-nvcc=11.8.89=0
28
+ - cuda-nvprune=11.8.86=0
29
+ - cuda-nvrtc=11.8.89=0
30
+ - cuda-nvtx=11.8.86=0
31
+ - cuda-runtime=11.8.0=0
32
+ - cudatoolkit=11.8.0=h6a678d5_0
33
+ - ffmpeg=4.3=hf484d3e_0
34
+ - filelock=3.9.0=py310h06a4308_0
35
+ - freetype=2.12.1=h4a9f257_0
36
+ - giflib=5.2.1=h5eee18b_3
37
+ - gmp=6.2.1=h295c915_3
38
+ - gmpy2=2.1.2=py310heeb90bb_0
39
+ - gnutls=3.6.15=he1e5248_0
40
+ - idna=3.4=py310h06a4308_0
41
+ - intel-openmp=2023.1.0=hdb19cb5_46305
42
+ - jinja2=3.1.2=py310h06a4308_0
43
+ - jpeg=9e=h5eee18b_1
44
+ - lame=3.100=h7b6447c_0
45
+ - lcms2=2.12=h3be6417_0
46
+ - ld_impl_linux-64=2.38=h1181459_1
47
+ - lerc=3.0=h295c915_0
48
+ - libcublas=11.11.3.6=0
49
+ - libcublas-dev=11.11.3.6=0
50
+ - libcufft=10.9.0.58=0
51
+ - libcufile=1.6.1.9=0
52
+ - libcurand=10.3.2.106=0
53
+ - libcusolver=11.4.1.48=0
54
+ - libcusolver-dev=11.4.1.48=0
55
+ - libcusparse=11.7.5.86=0
56
+ - libcusparse-dev=11.7.5.86=0
57
+ - libdeflate=1.17=h5eee18b_0
58
+ - libffi=3.4.4=h6a678d5_0
59
+ - libgcc-ng=11.2.0=h1234567_1
60
+ - libgomp=11.2.0=h1234567_1
61
+ - libiconv=1.16=h7f8727e_2
62
+ - libidn2=2.3.4=h5eee18b_0
63
+ - libnpp=11.8.0.86=0
64
+ - libnvjpeg=11.9.0.86=0
65
+ - libpng=1.6.39=h5eee18b_0
66
+ - libstdcxx-ng=11.2.0=h1234567_1
67
+ - libtasn1=4.19.0=h5eee18b_0
68
+ - libtiff=4.5.0=h6a678d5_2
69
+ - libunistring=0.9.10=h27cfd23_0
70
+ - libuuid=1.41.5=h5eee18b_0
71
+ - libwebp=1.2.4=h11a3e52_1
72
+ - libwebp-base=1.2.4=h5eee18b_1
73
+ - lz4-c=1.9.4=h6a678d5_0
74
+ - markupsafe=2.1.1=py310h7f8727e_0
75
+ - mkl=2023.1.0=h6d00ec8_46342
76
+ - mkl-service=2.4.0=py310h5eee18b_1
77
+ - mkl_fft=1.3.6=py310h1128e8f_1
78
+ - mkl_random=1.2.2=py310h1128e8f_1
79
+ - mpc=1.1.0=h10f8cd9_1
80
+ - mpfr=4.0.2=hb69a4c5_1
81
+ - ncurses=6.4=h6a678d5_0
82
+ - nettle=3.7.3=hbbd107a_1
83
+ - networkx=2.8.4=py310h06a4308_1
84
+ - numpy=1.25.0=py310h5f9d8c6_0
85
+ - numpy-base=1.25.0=py310hb5e798b_0
86
+ - openh264=2.1.1=h4ff587b_0
87
+ - openssl=3.0.12=h7f8727e_0
88
+ - pillow=9.4.0=py310h6a678d5_0
89
+ - pip=23.1.2=py310h06a4308_0
90
+ - pycparser=2.21=pyhd3eb1b0_0
91
+ - pyopenssl=23.0.0=py310h06a4308_0
92
+ - pysocks=1.7.1=py310h06a4308_0
93
+ - python=3.10.11=h955ad1f_3
94
+ - pytorch-cuda=11.8=h7e8668a_5
95
+ - pytorch-mutex=1.0=cuda
96
+ - readline=8.2=h5eee18b_0
97
+ - requests=2.29.0=py310h06a4308_0
98
+ - setuptools=67.8.0=py310h06a4308_0
99
+ - sqlite=3.41.2=h5eee18b_0
100
+ - sympy=1.11.1=py310h06a4308_0
101
+ - tbb=2021.8.0=hdb19cb5_0
102
+ - tk=8.6.12=h1ccaba5_0
103
+ - torchtriton=2.0.0=py310
104
+ - typing_extensions=4.6.3=py310h06a4308_0
105
+ - urllib3=1.26.16=py310h06a4308_0
106
+ - wheel=0.38.4=py310h06a4308_0
107
+ - xz=5.4.2=h5eee18b_0
108
+ - zlib=1.2.13=h5eee18b_0
109
+ - zstd=1.5.5=hc292b87_0
110
+ - pip:
111
+ - absl-py==1.4.0
112
+ - accelerate==0.20.3
113
+ - aiohttp==3.8.4
114
+ - aiosignal==1.3.1
115
+ - annotated-types==0.6.0
116
+ - asttokens==2.2.1
117
+ - astunparse==1.6.3
118
+ - async-timeout==4.0.2
119
+ - attrs==23.1.0
120
+ - backcall==0.2.0
121
+ - bert-score==0.3.13
122
+ - bitsandbytes==0.41.0
123
+ - bleurt==0.0.2
124
+ - brotli==1.1.0
125
+ - cachetools==5.3.1
126
+ - click==8.1.7
127
+ - cmake==3.25.0
128
+ - colorama==0.4.6
129
+ - contourpy==1.1.0
130
+ - cycler==0.11.0
131
+ - datasets==2.13.1
132
+ - decorator==5.1.1
133
+ - deepspeed==0.12.6
134
+ - dill==0.3.6
135
+ - dotmap==1.3.30
136
+ - emoji==2.10.0
137
+ - evaluate==0.4.1
138
+ - executing==1.2.0
139
+ - flatbuffers==24.3.25
140
+ - fonttools==4.40.0
141
+ - frozenlist==1.3.3
142
+ - fsspec==2023.6.0
143
+ - gast==0.5.4
144
+ - google-auth==2.21.0
145
+ - google-auth-oauthlib==1.0.0
146
+ - google-pasta==0.2.0
147
+ - grpcio==1.56.0
148
+ - h5py==3.10.0
149
+ - hjson==3.1.0
150
+ - huggingface-hub==0.15.1
151
+ - inflate64==1.0.0
152
+ - ipython==8.14.0
153
+ - jedi==0.19.0
154
+ - joblib==1.3.2
155
+ - jsonlines==3.1.0
156
+ - keras==3.1.1
157
+ - kiwisolver==1.4.4
158
+ - libclang==18.1.1
159
+ - lightning-utilities==0.9.0
160
+ - lit==15.0.7
161
+ - loralib==0.1.1
162
+ - lxml==4.9.2
163
+ - markdown==3.4.3
164
+ - markdown-it-py==3.0.0
165
+ - matplotlib==3.7.2
166
+ - matplotlib-inline==0.1.6
167
+ - mdurl==0.1.2
168
+ - ml-dtypes==0.3.2
169
+ - mpmath==1.2.1
170
+ - multidict==6.0.4
171
+ - multiprocess==0.70.14
172
+ - multivolumefile==0.2.3
173
+ - namex==0.0.7
174
+ - ninja==1.11.1.1
175
+ - nltk==3.8.1
176
+ - nvidia-cuda-runtime-cu11==11.7.99
177
+ - oauthlib==3.2.2
178
+ - openai==0.27.8
179
+ - opencv-python==4.9.0.80
180
+ - opt-einsum==3.3.0
181
+ - optree==0.11.0
182
+ - packaging==23.1
183
+ - pandas==2.0.2
184
+ - parso==0.8.3
185
+ - peft==0.3.0
186
+ - pexpect==4.8.0
187
+ - pickleshare==0.7.5
188
+ - portalocker==2.7.0
189
+ - prompt-toolkit==3.0.39
190
+ - protobuf==4.23.3
191
+ - psutil==5.9.5
192
+ - ptyprocess==0.7.0
193
+ - pure-eval==0.2.2
194
+ - py-cpuinfo==9.0.0
195
+ - py7zr==0.20.8
196
+ - pyarrow==12.0.1
197
+ - pyasn1==0.5.0
198
+ - pyasn1-modules==0.3.0
199
+ - pybcj==1.0.2
200
+ - pycryptodomex==3.19.0
201
+ - pydantic==2.5.3
202
+ - pydantic-core==2.14.6
203
+ - pydotmap==0.1.3
204
+ - pygments==2.16.1
205
+ - pynvml==11.5.0
206
+ - pyparsing==3.0.9
207
+ - pyppmd==1.1.0
208
+ - python-dateutil==2.8.2
209
+ - python-dotenv==1.0.0
210
+ - pytictoc==1.5.2
211
+ - pytorch-lightning==2.0.4
212
+ - pytz==2023.3
213
+ - pyyaml==6.0
214
+ - pyzstd==0.15.9
215
+ - regex==2023.6.3
216
+ - requests-oauthlib==1.3.1
217
+ - responses==0.18.0
218
+ - retrying==1.3.4
219
+ - rich==13.7.1
220
+ - rouge==1.0.1
221
+ - rouge-score==0.1.2
222
+ - rsa==4.9
223
+ - sacrebleu==2.3.1
224
+ - safetensors==0.3.1
225
+ - scikit-learn==1.3.0
226
+ - scipy==1.11.0
227
+ - sentencepiece==0.1.99
228
+ - six==1.16.0
229
+ - sklearn==0.0.post7
230
+ - stack-data==0.6.2
231
+ - tabulate==0.9.0
232
+ - tensorboard==2.16.2
233
+ - tensorboard-data-server==0.7.1
234
+ - tensorflow==2.16.1
235
+ - tensorflow-io-gcs-filesystem==0.36.0
236
+ - termcolor==2.4.0
237
+ - texttable==1.7.0
238
+ - tf-slim==1.1.0
239
+ - threadpoolctl==3.2.0
240
+ - timm==0.4.5
241
+ - tokenizers==0.13.3
242
+ - torch==2.0.1+cu118
243
+ - torchaudio==2.0.2+cu118
244
+ - torchmetrics==0.11.4
245
+ - torchvision==0.15.2+cu118
246
+ - tqdm==4.65.0
247
+ - traitlets==5.9.0
248
+ - transformers==4.30.2
249
+ - tzdata==2023.3
250
+ - wcwidth==0.2.6
251
+ - werkzeug==2.3.6
252
+ - wrapt==1.16.0
253
+ - xxhash==3.2.0
254
+ - yarl==1.9.2
255
+ variables:
256
+ LD_LIBRARY_PATH: <CONDA_PATH>/envs/SPT/lib
257
+ LIBRARY_PATH: <CONDA_PATH>/envs/SPT/lib
evaluate_runs_results.py ADDED
@@ -0,0 +1,150 @@
1
+ import csv
2
+ import glob
3
+ import pickle
4
+ import re
5
+
6
+ from dotenv import load_dotenv
7
+
8
+ load_dotenv()
9
+ import numpy
10
+ from bert_score import BERTScorer
11
+ from evaluation import f1_score
12
+
13
+ import evaluate
14
+
15
+ rouge = evaluate.load('rouge')
16
+ # bertscore = BERTScorer(lang='en', device='cuda')
17
+ bertscore = BERTScorer(model_type='microsoft/deberta-xlarge-mnli', device='cuda')
18
+ _main_path = 'public_ckpt'
19
+
20
+ ADD_METEOR = True
21
+ if ADD_METEOR:
22
+ meteor_scorer = evaluate.load('meteor')
23
+
24
+ DO_PRED_CLEAN = True
25
+
26
+
27
+
28
+ def evaluate_folder(main_path, skip_exists=True):
29
+ results_path = f'{main_path}/results.txt'
30
+ results_csv_path = f'{main_path}/results.csv'
31
+ paths = glob.glob(f'{main_path}/*/evaluation_result*.pkl')
32
+ all_results = []
33
+ csv_results = []
34
+ csv_results.append(['path',
35
+ 'ppl',
36
+ 'F1',
37
+ 'bleu',
38
+ 'bleu-1',
39
+ 'bleu-2',
40
+ 'bleu-3',
41
+ 'bleu-4',
42
+ 'rouge1',
43
+ 'rouge2',
44
+ 'rougel',
45
+ 'BERT f1',
46
+ 'BERT precision',
47
+ 'BERT recall',
48
+ 'dist-1',
49
+ 'dist-2',
50
+ 'meteor',
51
+ 'valid_num'])
52
+ for path in paths:
53
+ with open(path, 'rb') as file:
54
+ results = pickle.load(file)
55
+ if results.get('result_str') is not None and skip_exists:
56
+ all_results.append(results['result_str'])
57
+ csv_results.append(results['csv'])
58
+ continue
59
+ preds = results['pred_text']
60
+ clean_preds = []
61
+ if DO_PRED_CLEAN:
62
+ for pred in preds:
63
+ search_result = re.search('R:|Q:|Summary:|\n|\:', pred)
64
+ if search_result is not None:
65
+ clean_preds.append(pred[:search_result.span()[0]])
66
+ else:
67
+ clean_preds.append(pred)
68
+ preds = clean_preds
69
+ tgt = results['gt_text']
70
+
71
+ def bleu_score(prediction, ground_truths):
72
+ from sacrebleu import BLEU
73
+ bleu = BLEU()
74
+ score = bleu.corpus_score(prediction, ground_truths)
75
+ return score
76
+
77
+ bleu = bleu_score(preds, [tgt])
78
+
79
+ precision, recall, f1 = bertscore.score(preds, tgt, verbose=False, batch_size=64)
80
+ mean_precision = precision.mean().item()
81
+ mean_recall = recall.mean().item()
82
+ mean_f1 = f1.mean().item()
83
+
84
+ def eval_distinct(corpus):
85
+ unigrams = []
86
+ bigrams = []
87
+ for n, rep in enumerate(corpus):
88
+ rep = rep.strip()
89
+ temp = rep.split(' ')
90
+ unigrams += temp
91
+ for i in range(len(temp) - 1):
92
+ bigrams.append(temp[i] + ' ' + temp[i + 1])
93
+ distink_1 = len(set(unigrams)) * 1.0 / len(unigrams)
94
+ distink_2 = len(set(bigrams)) * 1.0 / len(bigrams)
95
+ return distink_1, distink_2
96
+
97
+ rouge_results = rouge.compute(predictions=preds, references=tgt)
98
+ rouge1, rouge2, rougel = rouge_results['rouge1'], rouge_results['rouge2'], rouge_results['rougeL']
99
+ me_score = 0
100
+ if ADD_METEOR:
101
+ me_score = meteor_scorer.compute(predictions=preds, references=tgt)['meteor']
102
+ from evaluation import rouge_score
103
+ _rouge = rouge_score(preds, [tgt])
104
+ f1 = [f1_score(p, [t]) for p, t in zip(preds, tgt)]
105
+ f1 = numpy.asfarray(f1).mean()
106
+ ppl=''
107
+
108
+ result_str = f"""
109
+ path: {path}
110
+ F1: {f1}
111
+ bleu: {bleu.score}
112
+ bleu detail: {bleu.precisions}
113
+ rouge1, rouge2, rougel: {rouge1, rouge2, rougel}
114
+ BERT f1: {mean_f1}
115
+ BERT precision: {mean_precision}
116
+ BERT recall: {mean_recall}
117
+ dist: {eval_distinct(preds)}
118
+ METEOR: {me_score}
119
+ valid_num: {len(preds)}
120
+ """
121
+ csv_data = [path,
122
+ f1 * 100.0,
123
+ bleu.score,
124
+ *bleu.precisions,
125
+ rouge1 * 100.0,
126
+ rouge2 * 100.0,
127
+ rougel * 100.0,
128
+ mean_f1 * 100.0,
129
+ mean_precision * 100.0,
130
+ mean_recall * 100.0,
131
+ *eval_distinct(preds),
132
+ me_score,
133
+ len(preds)]
134
+ csv_results.append(csv_data)
135
+ print(result_str)
136
+ all_results.append(result_str)
137
+ with open(path, 'wb') as file:
138
+ results['result_str'] = result_str
139
+ results['csv'] = csv_data
140
+ pickle.dump(results, file)
141
+
142
+ with open(results_path, 'w') as file:
143
+ file.write("\n=====\n".join(all_results))
144
+ with open(results_csv_path, 'w') as file:
145
+ writer = csv.writer(file)
146
+ writer.writerows(csv_results)
147
+
148
+
149
+ if __name__ == '__main__':
150
+ evaluate_folder(_main_path, skip_exists=False)
evaluation.py ADDED
@@ -0,0 +1,92 @@
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ import logging
8
+ import string
9
+ from collections import Counter
10
+ from typing import Callable
11
+
12
+ import regex
13
+ from rouge import Rouge
14
+
15
+ rouge = Rouge()
16
+
17
+ logger = logging.getLogger(__name__)
18
+
19
+
20
+ # Normalization and score functions from SQuAD evaluation script https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/
21
+ def normalize_answer(s: str) -> str:
22
+ def remove_articles(text):
23
+ return regex.sub(r"\b(a|an|the)\b", " ", text)
24
+
25
+ def white_space_fix(text):
26
+ return " ".join(text.split())
27
+
28
+ def remove_punc(text):
29
+ exclude = set(string.punctuation)
30
+ return "".join(ch for ch in text if ch not in exclude)
31
+
32
+ def lower(text):
33
+ return text.lower()
34
+
35
+ return white_space_fix(remove_articles(remove_punc(lower(s))))
36
+
37
+
38
+ def em(prediction, ground_truth, normalize_fn):
39
+ return float(normalize_fn(prediction) == normalize_fn(ground_truth))
40
+
41
+
42
+ def f1(prediction, ground_truth, normalize_fn):
43
+ prediction_tokens = normalize_fn(prediction).split()
44
+ ground_truth_tokens = normalize_fn(ground_truth).split()
45
+ common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
46
+ num_same = sum(common.values())
47
+
48
+ if num_same == 0:
49
+ return 0
50
+ precision = 1.0 * num_same / len(prediction_tokens)
51
+ recall = 1.0 * num_same / len(ground_truth_tokens)
52
+ f1 = (2 * precision * recall) / (precision + recall)
53
+ return f1
54
+
55
+
56
+ def rouge_wrapper(prediction, ground_truth):
57
+ try:
58
+ result = rouge.get_scores(prediction, ground_truth, avg=True)
59
+ return result["rouge-1"]["f"], result["rouge-2"]["f"], result["rouge-l"]["f"]
60
+ except:
61
+ return 0.0, 0.0, 0.0
62
+
63
+
64
+ # pred = [p1, p2 ..., pn] gt=[[g1,g2,...,gn]]
65
+ def f1_score(prediction, ground_truths, normalize_fn: Callable[[str], str] = lambda x: x):
66
+ return max([f1(prediction, gt, normalize_fn) for gt in ground_truths])
67
+
68
+
69
+ def exact_match_score(prediction, ground_truths, normalize_fn: Callable[[str], str] = lambda x: x):
70
+ return max([em(prediction, gt, normalize_fn) for gt in ground_truths])
71
+
72
+
73
+ # pred = [p1, p2 ..., pn] gt=[[g1,g2,...,gn]]
74
+ def rouge_score(prediction, ground_truths):
75
+ ground_truths = [x for x in ground_truths if len(x) > 0]
76
+ if (
77
+ len(prediction) == 0 or len(ground_truths) == 0
78
+ ): # check if empty prediction or if there is no hypothesis with len > 0
79
+ return 0.0, 0.0, 0.0
80
+ scores = [rouge_wrapper(prediction, gt) for gt in ground_truths]
81
+ rouge1 = max(s[0] for s in scores)
82
+ rouge2 = max(s[1] for s in scores)
83
+ rougel = max(s[2] for s in scores)
84
+ return rouge1, rouge2, rougel
85
+
86
+
87
+ # pred = [p1, p2 ..., pn] gt=[[g1,g2,...,gn]]
88
+ def bleu_score(prediction, ground_truths):
89
+ from sacrebleu import BLEU
90
+ bleu = BLEU()
91
+ score = bleu.corpus_score(prediction, ground_truths)
92
+ return score
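A small usage sketch for the helpers above (imports mirror how `evaluate_runs_results.py` uses this module; the strings are made-up examples):

```python
from evaluation import f1_score, bleu_score, normalize_answer

pred = "i love skiing in the winter"
refs = ["i love to ski in winter"]

# token-level F1 against the best-matching reference, after SQuAD-style normalization
print(f1_score(pred, refs, normalize_fn=normalize_answer))
# corpus-level BLEU via sacrebleu (expects a list of hypotheses and a list of reference streams)
print(bleu_score([pred], [refs]).score)
```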
interactive_test.py ADDED
@@ -0,0 +1,205 @@
1
+ import argparse
2
+ import glob
3
+ import json
4
+ import locale
5
+ import os
6
+ import random
7
+ import re
8
+ import time
9
+ from multiprocessing import freeze_support
10
+
11
+ import deepspeed
12
+ import torch
13
+ from dotenv import load_dotenv
14
+ from torch.utils.data import DistributedSampler
15
+
16
+ from dataset.dataset import PersonaChatDataset
17
+ from utils.dist_helper import setup
18
+ from utils.format_inputs import TASK_TYPE
19
+ from utils.parser_helper import str2bool
20
+
21
+ os.environ["PYTHONIOENCODING"] = "utf-8"
22
+ myLocale = locale.setlocale(category=locale.LC_ALL, locale="C.UTF-8")
23
+ load_dotenv()
24
+
25
+ argparse = argparse.ArgumentParser()
26
+ argparse.add_argument('--model_path', type=str, default=None)
27
+ argparse.add_argument('--path_pattern', type=str, default=None)
28
+ argparse.add_argument('--batch_size', type=int)
29
+ argparse.add_argument('--valid_path', type=str, default=None)
30
+ argparse.add_argument('--local_rank', type=int, default=-1)
31
+ argparse.add_argument('--skip_exists', type=str2bool, default=False)
32
+ argparse.add_argument('--selection_noise', type=float, default=None)
33
+ parser = deepspeed.add_config_arguments(argparse)
34
+ args = argparse.parse_args()
35
+ _cmd_args = parser.parse_args()
36
+ freeze_support()
37
+
38
+ VICUNA_PREFIX = 'PATH_TO_VICUNA'
39
+
40
+
41
+ def test_process(model_paths, batch_size, valid_path, skip_exists, selection_noise, cmd_args):
42
+ world_size = int(os.getenv("WORLD_SIZE", "1"))
43
+ with open(cmd_args.deepspeed_config) as json_file:
44
+ ds_config = json.load(json_file)
45
+ del cmd_args.deepspeed_config
46
+
47
+ setup()
48
+ for model_path in model_paths:
49
+ try:
50
+ if selection_noise is not None:
51
+ save_dir = os.sep.join(
52
+ model_path.split(os.sep)[:-1]) + os.sep + f'evaluation_result_selection_noise={selection_noise}.pkl'
53
+ else:
54
+ save_dir = os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + 'evaluation_result.pkl'
55
+ if os.path.exists(save_dir) and (skip_exists):
56
+ continue
57
+ print(
58
+ f"Start setup rank {deepspeed.comm.get_local_rank()} of {world_size} on GPU {torch.cuda.current_device()}")
59
+
60
+ ckpt = torch.load(os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + 'checkpoint_best.pth',
61
+ map_location=f'cpu')
62
+ config = ckpt['config']
63
+ ds_config['train_micro_batch_size_per_gpu'] = batch_size
64
+ load_precision = '32'
65
+ if config.model.load_bit == 16:
66
+ ds_config['float16']['enabled'] = True
67
+ load_precision = 'fp16'
68
+ if 'llama' in config.model.model_name.lower():
69
+ ds_config['float16']['enabled'] = False
70
+ ds_config['bf16']['enabled'] = True
71
+ load_precision = 'bf16'
72
+ load_bit_map = {
73
+ 'fp16': torch.float16,
74
+ 'bf16': torch.bfloat16,
75
+ '32': torch.float32}
76
+
77
+ if config.model.model_type == 'selective_pt':
78
+ from models.selective_llm_chat import SelectLLMChat as LLMChat
79
+ else:
80
+ from models.llm_chat import LLMChat
81
+ if 'vicuna' in config.model.model_name and (not os.path.exists(config.model.model_name)):
82
+ config.model.model_name = VICUNA_PREFIX + os.sep + config.model.model_name.split(os.sep)[-1]
83
+ _model = LLMChat(config, batch_size)
84
+ left_tokenizer = _model.left_tokenizer
85
+ right_tokenizer = _model.right_tokenizer
86
+ print(f'LOADING {model_path} with {load_precision} precision')
87
+ model_engine, _, _, _ = deepspeed.initialize(args=cmd_args,
88
+ model=_model,
89
+ config=ds_config,
90
+ )
91
+ model_engine.load_checkpoint(model_path, load_module_strict=False, load_optimizer_states=False,
92
+ load_lr_scheduler_states=False,
93
+ load_module_only=True)
94
+ valid_path_file = valid_path
95
+ if valid_path_file is None:
96
+ valid_path_file = config.dataset.valid
97
+ if config.dataset.test.__class__ is str:
98
+ valid_path_file = config.dataset.test
99
+ print('using train split from personachat')
100
+ task_type = TASK_TYPE(config.training.task_type)
101
+
102
+ valid_dataset = PersonaChatDataset(valid_path_file, max_context_turns=config.dataset.max_context_turns)
103
+ from dataset.dataset import get_dataloader
104
+ max_new_token = 32
105
+ valid_sampler = DistributedSampler(valid_dataset, num_replicas=world_size, shuffle=False,
106
+ drop_last=False)
107
+ valid_dataloader = get_dataloader(valid_dataset, batch_size, num_workers=0, sampler=valid_sampler)
108
+
109
+ context_input = []
110
+ persona_list = []
111
+ dist_pred_text = [None for _ in range(world_size)]
112
+ dist_gt_text = [None for _ in range(world_size)]
113
+ pred_text = []
114
+ gt_text = []
115
+ selected_prompts = []
116
+ print('Please enter your input:')
117
+ first_setence = input()
118
+ chosen_persona = random.choice([p['persona'] for p in valid_dataset.turns_data])
119
+ history = [f"Q: {first_setence}"]
120
+ history_with_prompt_idx = [f"USER: {first_setence}"]
121
+ selected_prompts = []
122
+ while True:
123
+ data = {'context_input': [history],
124
+ 'persona_list': [chosen_persona],
125
+ 'target': ['not use']}
126
+ _, text, batch_selected_prompts = LLMChat.test_step(model_engine, data, left_tokenizer,
127
+ right_tokenizer,
128
+ config, max_new_tokens=max_new_token,
129
+ tqdm_instance=None,
130
+ selection_noise=None,
131
+ no_repeat_ngram_size=4,
132
+ top_p=0.9,
133
+ num_beams=10)
134
+ response = text[0].strip()
135
+ search_result = re.search('R:|Q:|Summary:|\n|\:', response)
136
+ if search_result is not None:
137
+ response = response[:search_result.span()[0]]
138
+ response = response.strip()
139
+
140
+ selected_prompts.append(batch_selected_prompts.item())
141
+ history += [f"R: {response}"]
142
+ history_with_prompt_idx += [f"SPT: {response} [SPT Index: {batch_selected_prompts.item()}]"]
143
+ history_str = "\n".join(history_with_prompt_idx)
144
+ print_str = f"""
145
+ Persona: {' '.join(chosen_persona)}
146
+ Dialogue:
147
+ {history_str}
148
+ """
149
+ print(print_str)
150
+ print('Please enter your input:')
151
+ user_input = input()
152
+ if user_input == 'r':
153
+ history = history[:-1]
154
+ history_with_prompt_idx = history_with_prompt_idx[:-1]
155
+ continue
156
+ if user_input == 'exit':
157
+ exit()
158
+ elif user_input == 'save':
159
+ os.makedirs('interactive_dialog', exist_ok=True)
160
+ with open('interactive_dialog/'+time.strftime('%Y-%m-%d-%H%M')+'.txt', 'w') as file:
161
+ file.write(print_str)
162
+ history = []
163
+ history_with_prompt_idx = []
164
+ chosen_persona = random.choice([p['persona'] for p in valid_dataset.turns_data])
165
+ print('Please enter your input:')
166
+ user_input = input()
167
+ elif user_input == 'clear':
168
+ history = []
169
+ history_with_prompt_idx = []
170
+ chosen_persona = random.choice([p['persona'] for p in valid_dataset.turns_data])
171
+ print('Please enter your input:')
172
+ user_input = input()
173
+ history += [f"Q: {user_input}"]
174
+ history_with_prompt_idx += [f"USER: {user_input}"]
175
+
176
+ except Exception as e:
177
+ save_dir = os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + "test_error.txt"
178
+ print(f'WRITING TESTING ERROR! ERROR: {str(e)}')
179
+ with open(save_dir, 'w') as file:
180
+ file.write(str(e))
181
+ deepspeed.comm.barrier()
182
+ deepspeed.comm.barrier()
183
+
184
+
185
+ model_path_arg = args.model_path
186
+ model_paths = [model_path_arg]
187
+ if len(glob.glob(model_path_arg + os.sep + 'ds_ckpt' + os.sep + '*')):
188
+ model_paths = [model_path_arg + os.sep + 'ds_ckpt']
189
+ elif not model_path_arg.endswith('.pth'):
190
+ import glob
191
+
192
+ path_pattern = args.path_pattern
193
+ if path_pattern is not None:
194
+ model_paths = glob.glob(f'{model_path_arg}/{path_pattern}/ds_ckpt/*/*.pt')
195
+ else:
196
+ model_paths = glob.glob(f'{model_path_arg}/*/ds_ckpt/*/*.pt')
197
+ model_paths = list(set([os.sep.join(p.split(os.sep)[:-2]) for p in model_paths]))
198
+ print(model_paths)
199
+ num_of_gpus = torch.cuda.device_count()
200
+ print(f"{num_of_gpus} GPUs available")
201
+ test_process(model_paths, args.batch_size, args.valid_path,
202
+ args.skip_exists, args.selection_noise, cmd_args=_cmd_args)
203
+ deepspeed.comm.barrier()
204
+ deepspeed.comm.destroy_process_group()
205
+ print('Test Ends')
models/__pycache__/llm_chat.cpython-310.pyc ADDED
Binary file (7.63 kB). View file
 
models/__pycache__/selective_llm_chat.cpython-310.pyc ADDED
Binary file (13 kB). View file
 
models/llm_chat.py ADDED
@@ -0,0 +1,227 @@
1
+ import torch
2
+ from peft import get_peft_model, LoraConfig, PromptTuningConfig, TaskType, PrefixTuningConfig
3
+ from torch import nn, autocast
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+ from transformers.deepspeed import HfDeepSpeedConfig
6
+
7
+ from utils.format_inputs import TASK_TYPE
8
+ from utils.format_inputs import format_causal_personachat_input, format_personachat_input, \
9
+ format_generate_persona_input
10
+ from utils.model_helpers import print_trainable_parameters
11
+
12
+
13
+ # TODO: we need to extract LORA Weight and Bias from the model
14
+ # TODO: we need to do adaptive applied LORA
15
+ class LLMChat(nn.Module):
16
+ def __init__(self, config, batch_size, ds_config=None):
17
+ if ds_config is not None:
18
+ _hfdsc = HfDeepSpeedConfig(ds_config)
19
+ super(LLMChat, self).__init__()
20
+ self.model_name = config.model.model_name
21
+ self.load_bit = config.model.load_bit
22
+ self.left_tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
23
+ original_vocab_size = len(self.left_tokenizer)
24
+ if config.training.mode != 'causal':
25
+ self.left_tokenizer.add_special_tokens({'pad_token': '[PAD]',
26
+ 'bos_token': '[BOS]',
27
+ 'eos_token': '[EOS]',
28
+ 'unk_token': '[UNK]',
29
+ 'sep_token': '[SEP]',
30
+ 'cls_token': '[CLS]',
31
+ 'mask_token': '[MASK]'})
32
+ self.left_tokenizer.padding_side = 'left'
33
+ self.left_tokenizer.truncation_side = 'left'
34
+ self.right_tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
35
+ if config.training.mode != 'causal':
36
+ self.right_tokenizer.add_special_tokens({'pad_token': '[PAD]',
37
+ 'bos_token': '[BOS]',
38
+ 'eos_token': '[EOS]',
39
+ 'unk_token': '[UNK]',
40
+ 'sep_token': '[SEP]',
41
+ 'cls_token': '[CLS]',
42
+ 'mask_token': '[MASK]'})
43
+ self.right_tokenizer.padding_side = 'right'
44
+ self.right_tokenizer.truncation_side = 'right'
45
+ if self.left_tokenizer.pad_token is None and config.model.pad_token == 'bos':
46
+ self.left_tokenizer.pad_token = self.left_tokenizer.bos_token
47
+ self.right_tokenizer.pad_token = self.right_tokenizer.bos_token
48
+ elif self.left_tokenizer.pad_token_id is None:
49
+ self.left_tokenizer.pad_token = self.left_tokenizer.eos_token
50
+ self.right_tokenizer.pad_token = self.right_tokenizer.eos_token
51
+ self.batch_size = batch_size
52
+ load_bit_map = {4: {'load_in_4bit': True,
53
+ 'bnb_4bit_compute_dtype': torch.bfloat16},
54
+ 8: {'load_in_8bit': True},
55
+ 16: {'torch_dtype': torch.float16},
56
+ 32: {'torch_dtype': torch.float32}}
57
+ assert config.model.load_bit in [16, 32], 'deepspeed is not friendly with bnb!'
58
+ model = AutoModelForCausalLM.from_pretrained(
59
+ config.model.model_name,
60
+ **load_bit_map[config.model.load_bit],
61
+ )
62
+ if config.training.mode != 'causal':
63
+ model.resize_token_embeddings(len(self.left_tokenizer))
64
+ # for m in model.children():
65
+ # if hasattr(m, 'gradient_checkpointing_enable'):
66
+ # m.gradient_checkpointing_enable()
67
+ model.gradient_checkpointing_enable()
68
+ if config.model.peft_config is not None:
69
+ for param in model.parameters():
70
+ param.requires_grad = False # freeze the model - train adapters later
71
+ if param.ndim == 1:
72
+ # cast the small parameters (e.g. layernorm) to fp32 for stability
73
+ param.data = param.data.to(torch.float32)
74
+ model.enable_input_require_grads()
75
+
76
+ # # enable special token embedding params, since we resized the vocabulary
77
+ # for name, param in model.named_parameters():
78
+ # if 'embed_tokens' in name:
79
+ # param[original_vocab_size:].requires_grad = True
80
+
81
+ class CastOutputToFloat(nn.Sequential):
82
+ def forward(self, x): return super().forward(x).to(torch.float32)
83
+
84
+ if config.model.peft_type == 'prompt_tuning':
85
+ peft_config = PromptTuningConfig(
86
+ **config.model.peft_config,
87
+ task_type=TaskType.CAUSAL_LM,
88
+ )
89
+ elif config.model.peft_type == 'prefix_tuning':
90
+ peft_config = PrefixTuningConfig(
91
+ **config.model.peft_config,
92
+ task_type=TaskType.CAUSAL_LM,
93
+ )
94
+ else:
95
+ peft_config = LoraConfig(**config.model.peft_config)
96
+ model.lm_head = CastOutputToFloat(model.lm_head)
97
+ model = get_peft_model(model, peft_config)
98
+ self.using_nn_modulelist = False
99
+ if config.model.using_nn_modulelist.__class__ is bool and config.model.using_nn_modulelist:
100
+ self.using_nn_modulelist = config.model.using_nn_modulelist
101
+ self.model = nn.ModuleList([model])
102
+ else:
103
+ self.model = model
104
+ if config.model.add_extra_layers.__class__ is bool and config.model.add_extra_layers:
105
+ self.prompt_normalizer = nn.Linear(
106
+ self.model[0].prompt_encoder.default.embedding.weight.shape[1],
107
+ self.model[0].word_embeddings.weight.shape[1])
108
+ self.score_activation = nn.Softplus(threshold=1, beta=10)
109
+ self.learning_rate = config.training.learning_rate
110
+ self.warmup_steps = config.training.warmup_steps
111
+ self.config = config
112
+ self.find_batch = False
113
+ print_trainable_parameters(self)
114
+
115
+ def print_llm_trainable_parameters(self):
116
+ print_trainable_parameters(self.model)
117
+
118
+ @autocast('cuda')
119
+ def forward(self, x):
120
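+ # NOTE: `_non_exists` is not an actual config key, so this check presumably never passes;
+ # it appears to be here only so that prompt_normalizer and score_activation are referenced in forward()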
+ if self.config._non_exists == 1:
121
+ self.prompt_normalizer(x)
122
+ self.score_activation(x)
123
+ for k in x.keys():
124
+ x[k] = x[k].cuda()
125
+ if self.find_batch:
126
+ x['attention_mask'] = x['attention_mask'].new_ones(x['attention_mask'].shape)
127
+ if self.using_nn_modulelist:
128
+ if self.config.model.using_output_stack.__class__ is bool and self.config.model.using_output_stack:
129
+ _outputs = [_model(**x) for _model in self.model]
130
+ _logits = torch.stack([_output['logits'] for _output in _outputs])
131
+ return {'logits': _logits}
132
+ return self.model[0](**x)
133
+ return self.model(**x)
134
+
135
+ def on_train_start(self) -> None:
136
+ self.print_llm_trainable_parameters()
137
+
138
+ @staticmethod
139
+ def training_step(model, batch, left_tokenizer, right_tokenizer, config, find_batch=False, mode='normal',
140
+ task_type=TASK_TYPE.GENERATE_RESPONSE, **_kwargs):
141
+ assert mode in ['normal', 'causal']
142
+ if task_type == TASK_TYPE.GENERATE_PERSONA and mode == 'normal':
143
+ lm_input, lm_target = format_generate_persona_input(batch, left_tokenizer, right_tokenizer,
144
+ config)
145
+ elif task_type == TASK_TYPE.GENERATE_RESPONSE and mode == 'causal':
146
+ lm_input, lm_target = format_causal_personachat_input(batch, left_tokenizer, right_tokenizer,
147
+ config)
148
+ elif task_type == TASK_TYPE.GENERATE_RESPONSE and mode == 'normal':
149
+ lm_input, lm_target = format_personachat_input(batch, left_tokenizer, right_tokenizer, config)
150
+ else:
151
+ raise NotImplementedError('mode and task_type not implemented')
152
+ output = model(lm_input)
153
+ if find_batch:
154
+ loss = nn.CrossEntropyLoss()(output['logits'].view(-1, output['logits'].shape[-1]),
155
+ lm_target.cuda().view(-1))
156
+ else:
157
+ if config.model.peft_type == 'prompt_tuning':
158
+ virtual_tokens = config.model.peft_config.num_virtual_tokens
159
+ batch_size = lm_target.size()[0]
160
+ _lm_target = torch.cat(
161
+ (lm_target.new_ones((batch_size, virtual_tokens)) * left_tokenizer.pad_token_id, lm_target), dim=1)
162
+ else:
163
+ _lm_target = lm_target
164
+ loss = nn.CrossEntropyLoss(ignore_index=left_tokenizer.pad_token_id)(
165
+ output['logits'].view(-1, output['logits'].shape[-1]),
166
+ _lm_target.cuda().view(-1))
167
+ # self.log('train_loss', loss, on_step=True, on_epoch=False, prog_bar=True, logger=True)
168
+ if config.training.normalize_loss.__class__ == bool and config.training.normalize_loss:
169
+ model.module.normalize()
170
+ return loss
171
+
172
+ def normalize(self):
173
+ raise NotImplementedError('normalize trainable weights needs implementation')
174
+ return None
175
+
176
+ @staticmethod
177
+ def validation_step(model, batch, left_tokenizer, right_tokenizer, config, task_type, mode='normal'):
178
+ loss = LLMChat.training_step(model, batch, left_tokenizer, right_tokenizer, config, task_type=task_type,
179
+ find_batch=False, mode=mode)
180
+ return loss
181
+
182
+ def on_test_start(self) -> None:
183
+ from peft import get_peft_model_state_dict, set_peft_model_state_dict
184
+ peft_weight = get_peft_model_state_dict(self.model).copy()
185
+ peft_config = self.model.peft_config
186
+ del self.model
187
+ model = AutoModelForCausalLM.from_pretrained(
188
+ self.config.model.model_name,
189
+ torch_dtype=torch.bfloat16, low_cpu_mem_usage=True,
190
+ )
191
+ self.model = get_peft_model(model, peft_config['default'])
192
+ set_peft_model_state_dict(self.model, peft_weight, adapter_name='default')
193
+ self.model.merge_and_unload()
194
+ self.model.eval()
195
+
196
+ @staticmethod
197
+ @autocast('cuda')
198
+ def test_step(model, batch, left_tokenizer, right_tokenizer, config, max_new_tokens=16, tqdm_instance=None, **kwargs):
199
+ model.eval()
200
+ task_type = TASK_TYPE(config.training.task_type)
201
+ with torch.no_grad():
202
+ if config.training.mode == 'causal':
203
+ lm_input, lm_target, inference_tokenized = format_causal_personachat_input(batch,
204
+ left_tokenizer,
205
+ right_tokenizer,
206
+ config,
207
+ for_test=True)
208
+ else:
209
+ lm_input, lm_target, inference_tokenized = format_personachat_input(batch, left_tokenizer,
210
+ right_tokenizer, config,
211
+ for_test=True)
212
+ inference_tokenized.to('cuda')
213
+ model_for_generation = None
214
+ if 'deepspeed' in str(model.__class__):
215
+ model_for_generation = model.module.model
216
+ else:
217
+ model_for_generation = model.model
218
+ if model_for_generation.__class__ is nn.ModuleList:
219
+ model_for_generation = model_for_generation[0]
220
+ # adding do_sample=False to avoid inf error!
221
+ raw_output = model_for_generation.generate(**inference_tokenized, max_new_tokens=max_new_tokens,
222
+ do_sample=False)
223
+ trunc_output = raw_output[:, inference_tokenized['input_ids'].shape[1]:]
224
+ if trunc_output[trunc_output >= len(left_tokenizer)].size()[0] > 0:
225
+ trunc_output[trunc_output >= len(left_tokenizer)] = left_tokenizer.pad_token_id
226
+ text_output = right_tokenizer.batch_decode(trunc_output, skip_special_tokens=True)
227
+ return trunc_output, text_output, []
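
`LLMChat.training_step` above handles the prompt-tuning case by left-padding the target with `pad_token_id`, so that it lines up with the logits that PEFT lengthens by `num_virtual_tokens` soft-prompt positions; the padded positions are then skipped via `ignore_index`. A minimal sketch of that alignment (illustrative shapes and values only, not repo code):

```python
import torch
from torch import nn

pad_token_id = 0
num_virtual_tokens = 4                      # stands in for peft_config.num_virtual_tokens
batch_size, seq_len, vocab_size = 2, 6, 11

lm_target = torch.randint(1, vocab_size, (batch_size, seq_len))              # token targets
logits = torch.randn(batch_size, num_virtual_tokens + seq_len, vocab_size)   # soft prompts prepended

# left-pad the targets so they align with the virtual-token positions
padded_target = torch.cat(
    (lm_target.new_full((batch_size, num_virtual_tokens), pad_token_id), lm_target), dim=1)

# padded positions are ignored by the loss, as in training_step
loss = nn.CrossEntropyLoss(ignore_index=pad_token_id)(
    logits.reshape(-1, vocab_size), padded_target.reshape(-1))
print(loss.item())
```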
models/selective_llm_chat.py ADDED
@@ -0,0 +1,390 @@
1
+ import deepspeed
+ import torch
4
+ import transformers
5
+ from peft import get_peft_model, PromptTuningConfig, TaskType, PrefixTuningConfig
6
+ from torch import nn, autocast
7
+ from torch.functional import F
8
+ from tqdm import tqdm
9
+ from transformers import AutoModelForCausalLM, AutoTokenizer
10
+ from transformers.deepspeed import HfDeepSpeedConfig
11
+
12
+ from utils.format_inputs import TASK_TYPE
13
+ from utils.format_inputs import format_causal_personachat_input, format_personachat_input, format_generate_persona_input
14
+ from utils.model_helpers import print_trainable_parameters
15
+
16
+
17
+ class SelectLLMChat(nn.Module):
18
+ def __init__(self, config, batch_size, ds_config=None):
19
+ super(SelectLLMChat, self).__init__()
20
+ if ds_config is not None:
21
+ _hfdsc = HfDeepSpeedConfig(ds_config)
22
+ peft_type = config.model.peft_type
23
+ self.peft_type = peft_type
24
+ assert config.model.peft_type in ['prompt_tuning', 'prefix_tuning',
25
+ ], 'only prompt tuning and prefix tuning are supported!'
26
+ K = config.model.K
27
+ self.K = K
28
+ self.ensemble_training = config.training.ensemble
29
+ self.model_name = config.model.model_name
30
+ self.load_bit = config.model.load_bit
31
+ self.left_tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
32
+ if config.training.mode != 'causal':
33
+ self.left_tokenizer.add_special_tokens({'pad_token': '[PAD]',
34
+ 'bos_token': '[BOS]',
35
+ 'eos_token': '[EOS]',
36
+ 'unk_token': '[UNK]',
37
+ 'sep_token': '[SEP]',
38
+ 'cls_token': '[CLS]',
39
+ 'mask_token': '[MASK]'})
40
+ self.left_tokenizer.padding_side = 'left'
41
+ self.left_tokenizer.truncation_side = 'left'
42
+ self.right_tokenizer = AutoTokenizer.from_pretrained(self.model_name, use_fast=False)
43
+ if config.training.mode != 'causal':
44
+ self.right_tokenizer.add_special_tokens({'pad_token': '[PAD]',
45
+ 'bos_token': '[BOS]',
46
+ 'eos_token': '[EOS]',
47
+ 'unk_token': '[UNK]',
48
+ 'sep_token': '[SEP]',
49
+ 'cls_token': '[CLS]',
50
+ 'mask_token': '[MASK]'})
51
+ self.right_tokenizer.padding_side = 'right'
52
+ self.right_tokenizer.truncation_side = 'right'
53
+ if self.left_tokenizer.pad_token is None and config.model.pad_token=='bos':
54
+ self.left_tokenizer.pad_token = self.left_tokenizer.bos_token
55
+ self.right_tokenizer.pad_token = self.right_tokenizer.bos_token
56
+ elif self.left_tokenizer.pad_token_id is None:
57
+ self.left_tokenizer.pad_token = self.left_tokenizer.eos_token
58
+ self.right_tokenizer.pad_token = self.right_tokenizer.eos_token
59
+ self.batch_size = batch_size
60
+ load_bit_map = {4: {'load_in_4bit': True,
61
+ 'bnb_4bit_compute_dtype': torch.bfloat16},
62
+ 8: {'load_in_8bit': True},
63
+ 16: {'torch_dtype': torch.float16},
64
+ 'bf16': {'torch_dtype': torch.bfloat16},
65
+ 32: {'torch_dtype': torch.float32}}
66
+ assert config.model.load_bit in [16, 32, 'bf16'], 'deepspeed is not friendly with bnb!'
67
+ model = AutoModelForCausalLM.from_pretrained(
68
+ config.model.model_name,
69
+ **load_bit_map[config.model.load_bit]
70
+ )
71
+ if config.training.mode != 'causal':
72
+ model.resize_token_embeddings(len(self.left_tokenizer))
73
+ model.gradient_checkpointing_enable()
74
+ if config.model.peft_config is not None:
75
+ for param in model.parameters():
76
+ param.requires_grad = False # freeze the model - train adapters later
77
+ if param.ndim == 1:
78
+ # cast the small parameters (e.g. layernorm) to fp32 for stability
79
+ param.data = param.data.to(torch.float32)
80
+ model.enable_input_require_grads()
81
+ class CastOutputToFloat(nn.Sequential):
82
+ def forward(self, x): return super().forward(x).to(torch.float32)
83
+
84
+ model.lm_head = CastOutputToFloat(model.lm_head)
85
+ self.model = model
86
+ models = []
87
+ peft_config = None
88
+ for _ in range(K):
89
+ if config.model.peft_type == 'prompt_tuning':
90
+ peft_config = PromptTuningConfig(
91
+ **config.model.peft_config,
92
+ task_type=TaskType.CAUSAL_LM,
93
+ )
94
+ elif config.model.peft_type == 'prefix_tuning':
95
+ peft_config = PrefixTuningConfig(
96
+ **config.model.peft_config,
97
+ task_type=TaskType.CAUSAL_LM,
98
+ )
99
+ else:
100
+ raise NotImplementedError()
101
+ _peft_model = get_peft_model(model, peft_config)
102
+ models.append(_peft_model)
103
+ self.models = nn.ModuleList(models)
104
+ self.learning_rate = config.training.learning_rate
105
+ self.warmup_steps = config.training.warmup_steps
106
+ self.config = config
107
+ self.find_batch = False
108
+ self.retriever = None
109
+ if config.model.retriever.retriever_type == 'transformer_encoder':
110
+ encoder_layer = nn.TransformerEncoderLayer(d_model=self.models[0].word_embeddings.weight.shape[1],
111
+ nhead=config.model.retriever.n_head)
112
+ transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=config.model.retriever.num_layers)
113
+ self.retriever = transformer_encoder
114
+ if config.model.peft_type in ['prompt_tuning'] and config.model.normalizer.__class__ is not str:
115
+ class DoNothing(nn.Sequential):
116
+ def forward(self, x): return x
117
+
118
+ self.prompt_normalizer = DoNothing()
119
+ elif config.model.normalizer == 'linear':
120
+ if config.model.peft_type in ['prompt_tuning', 'prefix_tuning']:
121
+ _d_peft = self.models[0].prompt_encoder.default.embedding.weight.shape[1]
122
+ else:
123
+ raise NotImplementedError('check here!')
124
+ self.prompt_normalizer = nn.Linear(_d_peft, _d_peft)
125
+
126
+ if config.model.score_activation == 'softplus':
127
+ self.score_activation = nn.Softplus(threshold=1, beta=10)
128
+ elif config.model.score_activation == 'relu':
129
+ self.score_activation = nn.ReLU()
130
+ elif config.model.score_activation == 'leaky_relu':
131
+ self.score_activation = nn.LeakyReLU()
132
+ else:
133
+ self.score_activation = nn.Softplus(threshold=1, beta=10)
134
+ # raise NotImplementedError()
135
+ self.retriever_on = ['extra']
136
+ if config.model.retriever.retriever_on.__class__ is list:
137
+ self.retriever_on = config.model.retriever.retriever_on
138
+ if config.training.all_tunable.__class__ is bool and config.training.all_tunable:
139
+ for param in self.parameters():
140
+ param.requires_grad = True
141
+ print_trainable_parameters(self)
142
+ self.contrastive_metric = None
143
+ if config.training.contrastive_metric.__class__ is str:
144
+ self.contrastive_metric = config.training.contrastive_metric
145
+ self.contrastive_threshold = 0.0
146
+ if config.training.contrastive_threshold.__class__ is float:
147
+ self.contrastive_threshold = config.training.contrastive_threshold
148
+ self.config = config
149
+ self.annealing_nll = False
150
+ self.annealing_scalar = 0.0
151
+ if self.config.training.annealing_nll.__class__ == bool:
152
+ self.annealing_nll = self.config.training.annealing_nll
153
+ self.annealing_scalar = self.config.training.annealing_scalar
154
+
155
+
156
+ def print_llm_trainable_parameters(self):
157
+ print_trainable_parameters(self.model)
158
+
159
+ def retrieve_based_on_input_x(self, x, K):
160
+ return self.retrieve_prompts(x, K)
161
+
162
+ @autocast('cuda')
163
+ def retrieve_prompts(self, x, K):
164
+ batch_size = x['input_ids'].shape[0]
165
+ input_ids = x['input_ids']
166
+ spawned_x = input_ids.repeat(K, 1)
167
+ if self.models[0].base_model.__class__ == transformers.models.llama.modeling_llama.LlamaForCausalLM:
168
+ spawned_x_emb = self.models[0].base_model.model.embed_tokens(spawned_x)
169
+ else:
170
+ spawned_x_emb = self.models[0].base_model.model.decoder.embed_tokens(spawned_x)
171
+ if spawned_x_emb.shape[-1] != self.models[0].config.hidden_size:
172
+ # need project_in here
173
+ spawned_x_emb = self.models[0].base_model.model.decoder.project_in(spawned_x_emb)
174
+ prompt_embeddings = torch.stack([_model.prompt_encoder.default.embedding.weight for _model in self.models])
175
+ if self.retriever is not None:
176
+ if 'extra' in self.retriever_on:
177
+ prompt_embeddings = self.retriever(self.prompt_normalizer(prompt_embeddings))
178
+ if 'lm' in self.retriever_on:
179
+ spawned_x_emb = self.retriever(spawned_x_emb)
180
+ spawned_x_emb_mean = spawned_x_emb.mean(dim=1)
181
+ prompt_embeddings_mean = prompt_embeddings.mean(dim=1)
182
+ if self.retriever is None:
183
+ normalizer_on = self.config.model.normalizer_on
184
+ if normalizer_on.__class__ is not list:
185
+ prompt_embeddings_mean = self.prompt_normalizer(prompt_embeddings_mean)
186
+ if 'prompt' in normalizer_on:
187
+ prompt_embeddings_mean = self.prompt_normalizer(prompt_embeddings_mean)
188
+ if 'lm' in normalizer_on:
189
+ spawned_x_emb_mean = self.prompt_normalizer(spawned_x_emb_mean)
190
+ prompt_embeddings_mean_spawn = torch.repeat_interleave(prompt_embeddings_mean, batch_size, dim=0)
191
+ sim_scores = self.score_activation(
192
+ torch.nn.CosineSimilarity()(prompt_embeddings_mean_spawn, spawned_x_emb_mean))
193
+ return sim_scores
194
+
195
+ @autocast('cuda')
196
+ def forward(self, x, mode='training'):
197
+ for k in x.keys():
198
+ x[k] = x[k].cuda(device=deepspeed.comm.get_local_rank())
199
+ if self.find_batch:
200
+ x['attention_mask'] = x['attention_mask'].new_ones(x['attention_mask'].shape)
201
+ if mode == 'training':
202
+ if self.config.training.skip_retrieval.__class__ is bool and self.config.training.skip_retrieval:
203
+ sim_scores = None
204
+ else:
205
+ sim_scores = self.retrieve_based_on_input_x(x, self.K)
206
+ # get pt embeddings
207
+ _outputs = [_model(**x) for _model in self.models]
208
+ _logits = torch.stack([_output['logits'] for _output in _outputs])
209
+ return {'logits': _logits, 'sim_scores': sim_scores}
210
+ else:
211
+ raise NotImplementedError('validation and testing not implemented')
212
+
213
+ def on_train_start(self) -> None:
214
+ self.print_llm_trainable_parameters()
215
+ deepspeed.zero.Init()
216
+
217
+ @staticmethod
218
+ def training_step(model, batch, left_tokenizer, right_tokenizer, config, mode='normal',
219
+ task_type=TASK_TYPE.GENERATE_RESPONSE, training_process=0.0):
220
+ assert mode in ['normal', 'causal']
221
+ if task_type == TASK_TYPE.GENERATE_PERSONA and mode == 'normal':
222
+ lm_input, lm_target = format_generate_persona_input(batch, left_tokenizer, right_tokenizer,
223
+ config)
224
+ elif task_type == TASK_TYPE.GENERATE_RESPONSE and mode == 'causal':
225
+ lm_input, lm_target = format_causal_personachat_input(batch, left_tokenizer, right_tokenizer,
226
+ config)
227
+ elif task_type == TASK_TYPE.GENERATE_RESPONSE and mode == 'normal':
228
+ lm_input, lm_target = format_personachat_input(batch, left_tokenizer, right_tokenizer, config)
229
+ else:
230
+ raise NotImplementedError('mode and task_type not implemented')
231
+ output = model.module(dict(lm_input))
232
+ # suppose batch=2, K=3, the logits is presented interleave:
233
+ # [0,1]
234
+ # [0,1]
235
+ # [0,1]
236
+ logits = output['logits'] # (K*Batch,SeqLen,VocabSize)
237
+ logits = logits.view(-1, logits.shape[2], logits.shape[3])
238
+ sim_scores = output['sim_scores']
239
+ batch_size = lm_target.size()[0]
240
+ if config.model.peft_type == 'prompt_tuning':
241
+ virtual_tokens = config.model.peft_config.num_virtual_tokens
242
+ _lm_target = torch.cat(
243
+ (lm_target.new_ones((batch_size, virtual_tokens)) * left_tokenizer.pad_token_id, lm_target), dim=1)
244
+ else:
245
+ _lm_target = lm_target
246
+ _lm_target_spawn = _lm_target.repeat(config.model.K, 1)
247
+ losses = nn.CrossEntropyLoss(ignore_index=left_tokenizer.pad_token_id, reduction='none')(
248
+ logits.view(-1, logits.shape[-1]),
249
+ _lm_target_spawn.cuda(device=deepspeed.comm.get_local_rank()).view(-1))
250
+ if config.training.only_nll.__class__ == bool and config.training.only_nll:
251
+ return losses[losses != 0].mean()
252
+
253
+ reshaped_losses = losses.view(logits.shape[0], logits.shape[1]).detach().clone()
254
+ reshaped_losses = torch.stack([_losses[_losses != 0].mean() for _losses in reshaped_losses.detach().clone()])
255
+ # reshaped_losses = reshaped_losses.clone().detach().mean(dim=1)
256
+
257
+ softmaxed_neg_losses = nn.Softmax(dim=0)(
258
+ -reshaped_losses.view(config.model.K, batch_size) / config.training.tau_gold).permute(1, 0)
259
+ if config.training.adding_noise.__class__ is float:
260
+ noise = torch.randn_like(softmaxed_neg_losses, device=softmaxed_neg_losses.device)
261
+ softmaxed_neg_losses = softmaxed_neg_losses + config.training.adding_noise * noise
262
+ logsoftmaxed_sim_scores = F.log_softmax(sim_scores.view(config.model.K, batch_size) / config.training.tau_sim,
263
+ dim=0).permute(1, 0)
264
+ kldiv_loss = nn.KLDivLoss(reduction='batchmean')(logsoftmaxed_sim_scores,
265
+ softmaxed_neg_losses)
266
+ selective_loss_weight = 1.0
267
+ if config.training.annealing_nll.__class__ is bool and config.training.annealing_nll:
268
+ _ann_scalar = config.training.annealing_scalar * (1 - training_process)
269
+ _sim_score = torch.clamp(_ann_scalar * nn.Softmax(-1)(sim_scores),
270
+ config.training.annealing_min, config.training.annealing_max).detach()
271
+ losses = torch.einsum('ab,a->ab', losses.view(logits.shape[0], logits.shape[1]), _sim_score).view(-1)
272
+
273
+ if config.training.selective_loss_weight.__class__ == float:
274
+ selective_loss_weight = config.training.selective_loss_weight
275
+ if config.training.selective_loss.__class__ == bool and (config.training.selective_loss == False):
276
+ loss = losses[losses != 0].mean()
277
+ elif config.training.disable_nll.__class__ is bool and config.training.disable_nll:
278
+ loss = selective_loss_weight * kldiv_loss
279
+ else:
280
+ loss = losses[losses != 0].mean() + selective_loss_weight * kldiv_loss
281
+
282
+ if model.module.ensemble_training:
283
+ K = config.model.K
284
+ enb_losses = []
285
+ for data_idx in range(batch_size):
286
+ data_indices = [data_idx + (batch_size * inc) for inc in range(K)]
287
+ ensemble_preds = logits[data_indices, :, :]
288
+ ensemble_sims = sim_scores[data_indices]
289
+ normed_preds = ensemble_sims.unsqueeze(-1).unsqueeze(-1).mul(ensemble_preds)
290
+ normed_preds = normed_preds.sum(dim=0)
291
+ _target = _lm_target_spawn[data_indices, :]
292
+ assert _target.unique(dim=0).shape[0] == 1, 'error in reassembling the preds'
293
+ enb_loss = nn.CrossEntropyLoss(ignore_index=left_tokenizer.pad_token_id)(normed_preds,
294
+ _target[0].cuda(
295
+ device=deepspeed.comm.get_local_rank()))
296
+ enb_losses.append(enb_loss)
297
+ loss += torch.stack(enb_losses).mean()
298
+ if model.module.contrastive_metric:
299
+ ctr_losses = []
300
+ from sacrebleu import BLEU
301
+ ctr_metrics = BLEU(effective_order=True)
302
+ batch_persona = [' '.join(row) for row in batch['persona_list']]
303
+ statics = {}
304
+ # Dim here
305
+ # x1 x2
306
+ # p1 s11 s21
307
+ # p2 s12 s22
308
+ # p3 s13 s23
309
+ permuted_sim_scores = sim_scores.unsqueeze(0).view(model.module.K, batch_size)
310
+ if model.module.contrastive_metric == 'bleu':
311
+ for idx in range(len(batch_persona) - 1):
312
+ for jdx in range(idx + 1, len(batch_persona)):
313
+ iele = batch_persona[idx]
314
+ jele = batch_persona[jdx]
315
+ scores = ctr_metrics.sentence_score(iele, [jele]).score
316
+ idist = permuted_sim_scores[:, idx]
317
+ jdist = permuted_sim_scores[:, jdx]
318
+ cosine_emb_loss = nn.CosineEmbeddingLoss()
319
+ if scores > model.module.contrastive_threshold:
320
+ cosine_target = 1
321
+ else:
322
+ cosine_target = -1
323
+ cos_loss = cosine_emb_loss(idist, jdist, torch.tensor(cosine_target))
324
+ ctr_losses.append(cos_loss)
325
+ statics[(idx, jdx)] = {'iele': iele, 'jele': jele, 'scores': scores,
326
+ 'idist': idist,
327
+ 'jdist': jdist, 'cos_emb_loss': cos_loss}
328
+ if len(ctr_losses) != 0:
329
+ ctr_losses_pt = torch.stack(ctr_losses).mean()
330
+ loss += config.training.contrastive_weight * ctr_losses_pt
331
+ else:
332
+ print(f'CTR ERROR: {statics}')
333
+ return loss
334
+
335
+ @staticmethod
336
+ def validation_step(model, batch, left_tokenizer, right_tokenizer, config, task_type, mode='normal'):
337
+ loss = SelectLLMChat.training_step(model, batch, left_tokenizer, right_tokenizer, config, task_type=task_type,
338
+ mode=mode, training_process=0.0)
339
+ return loss
340
+
341
+ @staticmethod
342
+ @autocast('cuda')
343
+ def test_step(model, batch, left_tokenizer, right_tokenizer, config, max_new_tokens=16, tqdm_instance: tqdm = None,
344
+ selection_noise=None, **gen_kwargs):
345
+ model.eval()
346
+ with torch.no_grad():
347
+ if config.training.mode == 'causal':
348
+ lm_input, lm_target, inference_tokenized = format_causal_personachat_input(batch,
349
+ left_tokenizer,
350
+ right_tokenizer,
351
+ config,
352
+ for_test=True)
353
+ else:
354
+ lm_input, lm_target, inference_tokenized = format_personachat_input(batch, left_tokenizer,
355
+ right_tokenizer,
356
+ config,
357
+ for_test=True)
358
+ inference_tokenized.to('cuda')
359
+ if 'deepspeed' in str(model.__class__):
360
+ batch_size = inference_tokenized['input_ids'].shape[0]
361
+ sim_scores = model.module.retrieve_based_on_input_x(inference_tokenized, config.model.K)
362
+ sim_scores = sim_scores.reshape(config.model.K, batch_size).permute(1, 0)
363
+ if selection_noise:
364
+ noise = torch.randn_like(sim_scores, device=sim_scores.device)
365
+ sim_scores = sim_scores + selection_noise * noise
366
+ selected_prompts = torch.argmax(sim_scores, dim=1)
367
+ if tqdm_instance is not None:
368
+ tqdm_instance.set_postfix_str(f"selected prompts: {selected_prompts}")
369
+ detached_selected_prompts = selected_prompts.detach().cpu().numpy()
370
+ selected_prompts_set = set(detached_selected_prompts)
371
+ output_dicts = {}
372
+ # adding do_sample=False to avoid inf error!
373
+ for key in selected_prompts_set:
374
+ outputs = model.module.models[key].generate(
375
+ input_ids=inference_tokenized['input_ids'],
376
+ attention_mask=inference_tokenized['attention_mask'],
377
+ max_new_tokens=max_new_tokens,
378
+ do_sample=False,
379
+ **gen_kwargs
380
+ )
381
+ output_dicts[key] = outputs.detach().cpu()
382
+ raw_output = []
383
+ for idx, prompt_idx in enumerate(detached_selected_prompts):
384
+ raw_output.append(output_dicts[prompt_idx][idx][inference_tokenized['input_ids'].shape[1]:])
385
+ # raw_output = torch.stack(raw_output).squeeze(1)
386
+ trunc_output = raw_output
387
+ text_output = right_tokenizer.batch_decode(trunc_output, skip_special_tokens=True)
388
+ return trunc_output, text_output, selected_prompts
389
+ else:
390
+ raise NotImplementedError('not implemented')
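
The selection signal that `SelectLLMChat` trains sits in `retrieve_prompts` plus the KL term in `training_step`: each of the K soft prompts is scored by cosine similarity against the mean-pooled input embedding, and the softmax of those scores is pulled toward the softmax of the negative per-prompt LM losses. A self-contained sketch of that signal with random stand-in tensors (K, batch size, dimensions and temperatures are illustrative; the repo additionally passes the raw similarities through a Softplus):

```python
import torch
from torch import nn
import torch.nn.functional as F

K, batch, dim = 3, 2, 8
tau_sim, tau_gold = 1.0, 1.0

prompt_mean = torch.randn(K, dim)      # mean-pooled soft-prompt embeddings
input_mean = torch.randn(batch, dim)   # mean-pooled input embeddings

# cosine similarity of every prompt against every input, flattened as in retrieve_prompts
sim_scores = F.cosine_similarity(
    prompt_mean.repeat_interleave(batch, dim=0),   # (K*batch, dim): p1,p1,p2,p2,...
    input_mean.repeat(K, 1))                       # (K*batch, dim): x1,x2,x1,x2,...

per_prompt_nll = torch.rand(K, batch)  # stand-in for the per-prompt LM losses

target_dist = F.softmax(-per_prompt_nll / tau_gold, dim=0).permute(1, 0)                 # (batch, K)
log_pred_dist = F.log_softmax(sim_scores.view(K, batch) / tau_sim, dim=0).permute(1, 0)  # (batch, K)

kldiv_loss = nn.KLDivLoss(reduction='batchmean')(log_pred_dist, target_dist)
print(kldiv_loss.item())
```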
test.py ADDED
@@ -0,0 +1,204 @@
1
+ import argparse
2
+ import glob
3
+ import json
4
+ import locale
5
+ import os
6
+ import re
7
+ from functools import reduce
8
+ from multiprocessing import freeze_support
9
+
10
+ import deepspeed
11
+ import torch
12
+ import torch.distributed as dist
13
+ from dotenv import load_dotenv
14
+ from torch.utils.data import DistributedSampler
15
+ from tqdm import tqdm
16
+
17
+ from dataset.dataset import PersonaChatDataset
18
+ from utils.dist_helper import setup
19
+ from utils.format_inputs import TASK_TYPE
20
+ from utils.parser_helper import str2bool
21
+
22
+ os.environ["PYTHONIOENCODING"] = "utf-8"
23
+ myLocale = locale.setlocale(category=locale.LC_ALL, locale="C.UTF-8")
24
+ load_dotenv()
25
+
26
+ arg_parser = argparse.ArgumentParser()
+ arg_parser.add_argument('--model_path', type=str, default=None)
+ arg_parser.add_argument('--path_pattern', type=str, default=None)
+ arg_parser.add_argument('--batch_size', type=int)
+ arg_parser.add_argument('--valid_path', type=str, default=None)
+ arg_parser.add_argument('--local_rank', type=int, default=-1)
+ arg_parser.add_argument('--skip_exists', type=str2bool, default=False)
+ arg_parser.add_argument('--selection_noise', type=float, default=None)
+ parser = deepspeed.add_config_arguments(arg_parser)
+ args = arg_parser.parse_args()
+ _cmd_args = parser.parse_args()
37
+ freeze_support()
38
+
39
+ VICUNA_PREFIX = 'PATH_TO_VICUNA'
40
+
41
+
42
+ def test_process(model_paths, batch_size, valid_path, skip_exists, selection_noise, cmd_args):
43
+ world_size = int(os.getenv("WORLD_SIZE", "1"))
44
+ with open(cmd_args.deepspeed_config) as json_file:
45
+ ds_config = json.load(json_file)
46
+ del cmd_args.deepspeed_config
47
+
48
+ setup()
49
+ for model_path in model_paths:
50
+ try:
51
+ if selection_noise is not None:
52
+ save_dir = os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + f'evaluation_result_selection_noise={selection_noise}.pkl'
53
+ else:
54
+ save_dir = os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + 'evaluation_result.pkl'
55
+ if os.path.exists(save_dir) and (skip_exists):
56
+ continue
57
+ print(
58
+ f"Start setup rank {deepspeed.comm.get_local_rank()} of {world_size} on GPU {torch.cuda.current_device()}")
59
+
60
+ ckpt = torch.load(os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + 'checkpoint_best.pth',
61
+ map_location=f'cpu')
62
+ config = ckpt['config']
63
+ ds_config['train_micro_batch_size_per_gpu'] = batch_size
64
+ load_precision = '32'
65
+ if config.model.load_bit == 16:
66
+ ds_config['float16']['enabled'] = True
67
+ load_precision = 'fp16'
68
+ if 'llama' in config.model.model_name.lower():
69
+ ds_config['float16']['enabled'] = False
70
+ ds_config['bf16']['enabled'] = True
71
+ load_precision = 'bf16'
72
+ load_bit_map = {
73
+ 'fp16': torch.float16,
74
+ 'bf16': torch.bfloat16,
75
+ '32': torch.float32}
76
+
77
+ if config.model.model_type == 'selective_pt':
78
+ from models.selective_llm_chat import SelectLLMChat as LLMChat
79
+ else:
80
+ from models.llm_chat import LLMChat
81
+ if 'vicuna' in config.model.model_name and (not os.path.exists(config.model.model_name)):
82
+ config.model.model_name = VICUNA_PREFIX + os.sep + config.model.model_name.split(os.sep)[-1]
83
+ _model = LLMChat(config, batch_size)
84
+ left_tokenizer = _model.left_tokenizer
85
+ right_tokenizer = _model.right_tokenizer
86
+ print(f'LOADING {model_path} with {load_precision} precision')
87
+ model_engine, _, _, _ = deepspeed.initialize(args=cmd_args,
88
+ model=_model,
89
+ config=ds_config,
90
+ )
91
+ model_engine.load_checkpoint(model_path, load_module_strict=False, load_optimizer_states=False,
92
+ load_lr_scheduler_states=False,
93
+ load_module_only=True)
94
+ valid_path_file = valid_path
95
+ if valid_path_file is None:
96
+ valid_path_file = config.dataset.valid
97
+ if config.dataset.test.__class__ is str:
98
+ valid_path_file = config.dataset.test
99
+ print('using the test split from personachat')
100
+ task_type = TASK_TYPE(config.training.task_type)
101
+ valid_dataset = PersonaChatDataset(valid_path_file, max_context_turns=config.dataset.max_context_turns)
102
+ from dataset.dataset import get_dataloader
103
+ max_new_token = 32
104
+ valid_sampler = DistributedSampler(valid_dataset, num_replicas=world_size, shuffle=False,
105
+ drop_last=False)
106
+ valid_dataloader = get_dataloader(valid_dataset, batch_size, num_workers=0, sampler=valid_sampler)
107
+
108
+ context_input = []
109
+ persona_list = []
110
+ dist_pred_text = [None for _ in range(world_size)]
111
+ dist_gt_text = [None for _ in range(world_size)]
112
+ pred_text = []
113
+ gt_text = []
114
+ tqdm_iterator = tqdm(valid_dataloader, total=len(valid_dataloader))
115
+ selected_prompts = []
116
+ for data in tqdm_iterator:
117
+ _, text, batch_selected_prompts = LLMChat.test_step(model_engine, data, left_tokenizer,
118
+ right_tokenizer,
119
+ config, max_new_tokens=max_new_token,
120
+ tqdm_instance=tqdm_iterator,
121
+ selection_noise=selection_noise)
122
+ if batch_selected_prompts.__class__ != list:
123
+ selected_prompts += (batch_selected_prompts.detach().cpu().tolist())
124
+
125
+ context_input += data['context_input']
126
+ persona_list += data['persona_list']
127
+ pred_text += text
128
+ gt_text += data['target']
129
+
130
+ clean_preds = []
131
+ for pred in pred_text:
132
+ search_result = re.search('R:|Q:|Summary:|\n|\:', pred)
133
+ if search_result is not None:
134
+ clean_preds.append(pred[:search_result.span()[0]])
135
+ else:
136
+ clean_preds.append(pred)
137
+ pred_text = clean_preds
138
+ dist.all_gather_object(dist_pred_text, pred_text)
139
+ dist.all_gather_object(dist_gt_text, gt_text)
140
+ pred_text = reduce(lambda x, y: x + y, dist_pred_text)
141
+ gt_text = reduce(lambda x, y: x + y, dist_gt_text)
142
+ from evaluation import bleu_score, f1_score, normalize_answer
143
+ bleu = bleu_score(pred_text, [gt_text])
144
+ import pickle
145
+
146
+ result = {
147
+ 'context_input': context_input,
148
+ 'persona_list': persona_list,
149
+ 'pred_text': pred_text,
150
+ 'gt_text': gt_text,
151
+ 'bleu': bleu,
152
+ }
153
+ from collections import Counter
154
+ counter = Counter(selected_prompts)
155
+ if deepspeed.comm.get_local_rank() == 0:
156
+ print('bleu: ', bleu)
157
+ with open(save_dir, 'wb') as file:
158
+ pickle.dump(result, file)
159
+ with open(save_dir.replace('.pkl', '.txt'), 'w', encoding='utf-8') as file:
160
+ file.write('bleu: ' + str(bleu) + '\n')
161
+ if len(selected_prompts) > 0:
162
+ file.write('selected prompt: ' + str(counter) + '\n')
163
+ for i in range(len(context_input)):
164
+ if context_input[i].__class__ == list:
165
+ file.write('context: ' + str(u' '.join(context_input[i]).encode('utf-8')) + '\n')
166
+ else:
167
+ file.write('context: ' + str(context_input[i].encode('utf-8')) + '\n')
168
+ file.write('persona: ' + str(u' '.join(persona_list[i]).encode('utf-8')) + '\n')
169
+ file.write('pred: ' + pred_text[i] + '\n')
170
+ file.write('gt: ' + gt_text[i] + '\n')
171
+ if len(selected_prompts) > 0:
172
+ file.write('selected prompt: ' + str(selected_prompts[i]) + '\n')
173
+ file.write('\n')
174
+ except Exception as e:
175
+ save_dir = os.sep.join(model_path.split(os.sep)[:-1]) + os.sep + "test_error.txt"
176
+ print(f'WRITING TESTING ERROR! ERROR: {str(e)}')
177
+ with open(save_dir, 'w') as file:
178
+ file.write(str(e))
179
+ deepspeed.comm.barrier()
180
+ deepspeed.comm.barrier()
181
+
182
+
183
+ model_path_arg = args.model_path
184
+ model_paths = [model_path_arg]
185
+ if len(glob.glob(model_path_arg+os.sep+'ds_ckpt'+os.sep+'*')):
186
+ model_paths = [model_path_arg+os.sep+'ds_ckpt']
187
+ elif not model_path_arg.endswith('.pth'):
188
+ path_pattern = args.path_pattern
190
+ if path_pattern is not None:
191
+ model_paths = glob.glob(f'{model_path_arg}/{path_pattern}/ds_ckpt/*/*.pt')
192
+ else:
193
+ model_paths = glob.glob(f'{model_path_arg}/*/ds_ckpt/*/*.pt')
194
+ model_paths = list(set([os.sep.join(p.split(os.sep)[:-2]) for p in model_paths]))
195
+ print(model_paths)
196
+ num_of_gpus = torch.cuda.device_count()
197
+ print(f"{num_of_gpus} GPUs available")
198
+ test_process(model_paths, args.batch_size, args.valid_path,
199
+ args.skip_exists, args.selection_noise, cmd_args=_cmd_args)
200
+ deepspeed.comm.barrier()
201
+ deepspeed.comm.destroy_process_group()
202
+ # if not model_path_arg.endswith('.pth'):
203
+ # evaluate_folder(model_path_arg, skip_exists=args.skip_exists)
204
+ print('Test Ends')
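
Before scoring, test.py truncates each generation at the first continuation marker so that only the first response segment is compared against the ground truth. A small illustration of that cleanup on a hypothetical prediction:

```python
import re

pred = "i love hiking on weekends. R: do you have any pets?"
match = re.search(r'R:|Q:|Summary:|\n|:', pred)
clean = pred[:match.span()[0]] if match else pred
print(clean)  # -> "i love hiking on weekends. "
```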
train.py ADDED
@@ -0,0 +1,129 @@
1
+ import argparse
2
+ import os
3
+ from multiprocessing import freeze_support
4
+
5
+ import deepspeed
6
+ import torch
7
+ from dotenv import load_dotenv
8
+ from transformers.utils import logging
9
+
10
+ from trainer.peft_trainer import train_generator
11
+ from utils.config import get_config
12
+ from utils.parser_helper import str2bool
13
+
14
+ load_dotenv()
15
+ logging.set_verbosity_error()
16
+ torch.multiprocessing.set_sharing_strategy('file_system')
17
+ torch.set_float32_matmul_precision('medium')
18
+
19
+
20
+ def set_model_config(the_config, value, key):
21
+ if value is not None:
22
+ the_config.model[key] = value
23
+
24
+
25
+ parser = argparse.ArgumentParser()
26
+ parser.add_argument('--config', type=str)
27
+ parser.add_argument('--batch', type=int, default=2)
28
+ parser.add_argument('--lr', type=float, default=None)
29
+ parser.add_argument('--find_batch', type=str2bool, default=False)
30
+ parser.add_argument('--find_lr', type=str2bool, default=False)
31
+ parser.add_argument('--bf16', type=str2bool, default=True)
32
+ parser.add_argument('--auto_scale_batch_size', type=str2bool, default=False)
33
+ parser.add_argument('--train_after_tune', type=str2bool, default=False)
34
+ parser.add_argument('--num_workers', type=int, default=0)
35
+ parser.add_argument('--epoch', type=int, default=None)
36
+ parser.add_argument('--scheduler_patience', type=int, default=10)
37
+ parser.add_argument('--scheduler_monitor', type=str, default='train_loss', choices=['train_loss'])
38
+ parser.add_argument('--seed', type=int, default=3407)
39
+ parser.add_argument('--grad_clip', type=float, default=-1)
40
+ parser.add_argument('--save_model', type=str2bool, default=True)
41
+ parser.add_argument('--shuffle_train', type=str2bool, default=True)
42
+ parser.add_argument('--training_ratio', type=float, default=1.0)
43
+ parser.add_argument('--adding_noise', type=float, default=None)
44
+
45
+ # parser.add_argument('--retriever_type', type=str, default=None,
46
+ # choices=['bert-base-uncased', 'albert-base-v2'])
47
+
48
+ parser.add_argument('--tokenizer_parallel', type=str2bool, default=True)
49
+ parser.add_argument('--do_test', type=str2bool, default=False)
50
+ parser.add_argument('--exp_name', type=str, default=None)
51
+ parser.add_argument('--mode', default=None, type=str, choices=['normal', 'causal', None])
52
+ parser.add_argument('--local_rank', type=int, default=-1, help='local rank passed from distributed launcher')
53
+ parser.add_argument('--selective_loss_weight', type=float, default=None)
54
+ parser.add_argument('--contrastive_weight', type=float, default=None)
55
+ parser.add_argument('--log_dir', type=str, default=None)
56
+
57
+ parser.add_argument('--warmup_type', type=str, default=None)
58
+ parser.add_argument('--warmup_min', type=float, default=0)
59
+ parser.add_argument('--warmup_ratio', type=float, default=0.05)
60
+
61
+ parser.add_argument('--ckpt_path', type=str, default=None)
62
+
63
+
64
+
65
+
66
+ parser = deepspeed.add_config_arguments(parser)
67
+ cmd_args = parser.parse_args()
68
+
69
+ freeze_support()
70
+ args = parser.parse_args()
71
+ os.environ["TOKENIZERS_PARALLELISM"] = "true" if args.tokenizer_parallel else "false"
72
+ config = get_config(args.config)
73
+ if args.exp_name is not None:
74
+ config.exp_name = args.exp_name
75
+ elif config.exp_name.__class__ != str:
76
+ config.exp_name = args.config.split(os.sep)[-1][:-4]
77
+ if args.lr is not None:
78
+ config.exp_name += f'_LR={args.lr}'
79
+ if args.selective_loss_weight is not None:
80
+ config.training.selective_loss_weight = args.selective_loss_weight
81
+ config.exp_name += f'_SLW={args.selective_loss_weight}'
82
+ if args.contrastive_weight is not None:
83
+ config.training.contrastive_weight = args.contrastive_weight
84
+ config.exp_name += f'_CTRW={args.contrastive_weight}'
85
+ if args.adding_noise is not None:
86
+ config.training.adding_noise = args.adding_noise
87
+ config.exp_name += f'_NOISE={args.adding_noise}'
88
+ if args.training_ratio < 1.0:
89
+ config.exp_name += f'_PTRAIN={args.training_ratio}'
90
+ # Done model config
91
+ generator_type = config.model.generator_type
92
+ if args.mode is not None:
93
+ config.training.mode = args.mode
94
+ if args.epoch is not None:
95
+ config.training.num_epoch = args.epoch
96
+ epoch = config.training.num_epoch
97
+ if args.log_dir is not None:
98
+ config.training.log_dir = args.log_dir
99
+ if 'llama-2' in config.model.model_name.lower():
100
+ folder_name = config.model.model_name.split('/')[-1]
101
+ config.model.model_name = os.getenv('LLAMA2_PATH')+'/'+folder_name
102
+ warmup_config = None
103
+ if args.warmup_type is not None:
104
+ warmup_config = {
105
+ "type": args.warmup_type,
106
+ "params": {
107
+ # "warmup_min_lr": args.warmup_min,
108
+ # "warmup_max_lr": args.lr,
109
+ "warmup_ratio": args.warmup_ratio
110
+ }
111
+ }
112
+ config.exp_name += f'_WP={args.warmup_type}@{args.warmup_ratio}'
113
+ if __name__ == '__main__':
114
+ train_generator(config, args.batch, args.lr, args.num_workers,
115
+ epoch, args.grad_clip, args.seed, args.save_model,
116
+ args.training_ratio, cmd_args=cmd_args, shuffle_train=args.shuffle_train,
117
+ warmup_config=warmup_config, ckpt_path=args.ckpt_path)
118
+ # num_of_gpus = torch.cuda.device_count()
119
+ # print(f"{num_of_gpus} GPUs available")
120
+ # mp.spawn(train_generator, args=(config, args.batch, args.lr, args.num_workers,
121
+ # epoch, args.grad_clip, num_of_gpus, args.seed, args.save_model,
122
+ # args.training_ratio), nprocs=num_of_gpus)
123
+
124
+ # train_generator(args.local_rank, config,
125
+ # batch_size=args.batch,
126
+ # lr=args.lr,
127
+ # num_workers=args.num_workers,
128
+ # epoch=args.epoch,
129
+ # gradient_clipping=args.grad_clip)
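
train.py builds a unique experiment name by appending every supplied override to `exp_name`; the name later becomes part of the TensorBoard log directory created in the trainer. A small sketch of that naming scheme with illustrative flag values:

```python
from dotmap import DotMap

config = DotMap({'exp_name': 'llama2-7b-selective', 'training': {}})

# illustrative values for --lr, --selective_loss_weight and --adding_noise
lr, selective_loss_weight, adding_noise = 5e-5, 0.1, 0.01

config.exp_name += f'_LR={lr}'
config.training.selective_loss_weight = selective_loss_weight
config.exp_name += f'_SLW={selective_loss_weight}'
config.training.adding_noise = adding_noise
config.exp_name += f'_NOISE={adding_noise}'

print(config.exp_name)  # llama2-7b-selective_LR=5e-05_SLW=0.1_NOISE=0.01
```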
trainer/__init__.py ADDED
@@ -0,0 +1 @@
1
+
trainer/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (149 Bytes). View file
 
trainer/__pycache__/peft_trainer.cpython-310.pyc ADDED
Binary file (5.85 kB). View file
 
trainer/peft_trainer.py ADDED
@@ -0,0 +1,187 @@
1
+ import json
2
+ import os
3
+ import time
4
+
5
+ import deepspeed
6
+ import torch
7
+ from pytictoc import TicToc
8
+ from torch.utils.data import DistributedSampler
9
+ from torch.utils.tensorboard import SummaryWriter
10
+ from tqdm import tqdm
11
+
12
+ from dataset.dataset import PersonaChatDataset
13
+ from utils.dist_helper import setup
14
+ from utils.format_inputs import TASK_TYPE
15
+ from utils.seed_everything import seed_everything
16
+
17
+
18
+ def save_checkpoint(model, optimizer, config, filename):
19
+ torch.save({
20
+ # 'model_state_dict': model.module.state_dict(),
21
+ # 'optimizer_state_dict': optimizer.state_dict(),
22
+ 'config': config
23
+ }, filename)
24
+
25
+
26
+ def train_generator(config, batch_size, lr,
27
+ num_workers,
28
+ epoch,
29
+ gradient_clipping, seed, save_model,
30
+ training_ratio, cmd_args, shuffle_train=True,warmup_config=None,
31
+ ckpt_path=None):
32
+ with open(cmd_args.deepspeed_config) as json_file:
33
+ ds_config = json.load(json_file)
34
+ del cmd_args.deepspeed_config
35
+ ds_config['train_micro_batch_size_per_gpu'] = batch_size
36
+ ds_config['optimizer']['params']['lr'] = lr
37
+ if config.model.load_bit == 16:
38
+ ds_config['float16']['enabled'] = True
39
+ if config.model.load_bit == 'bf16':
40
+ ds_config['bf16']['enabled'] = True
41
+ if gradient_clipping > 0:
42
+ ds_config['gradient_clipping'] = gradient_clipping
43
+
44
+ world_size = int(os.getenv("WORLD_SIZE", "1"))
45
+ if config.model.model_type == 'selective_pt':
46
+ from models.selective_llm_chat import SelectLLMChat as LLMChat
47
+ else:
48
+ from models.llm_chat import LLMChat
49
+ seed_everything(seed)
50
+ # initialize the distributed environment
51
+ # time setup function using tictoc
52
+ t = TicToc()
53
+ t.tic()
54
+ setup()
55
+ # print(f"Time for setup is {t.tocvalue()} seconds")
56
+ config.training.learning_rate = float(lr)
57
+ # Create model and move it to GPU
58
+
59
+ task_type: str = config.training.task_type
60
+ enum_task = TASK_TYPE(task_type)
61
+ train_dataset = PersonaChatDataset(config.dataset.train, max_context_turns=config.dataset.max_context_turns,
62
+ training_ratio=training_ratio,
63
+ only_longest=config.training.only_longest,
64
+ task_type=enum_task)
65
+ valid_dataset = PersonaChatDataset(config.dataset.valid, max_context_turns=config.dataset.max_context_turns,
66
+ task_type=enum_task)
67
+ from dataset.dataset import get_dataloader
68
+ if warmup_config is not None:
69
+ warmup_config["params"]['warmup_num_steps'] = int(len(train_dataset)/batch_size * warmup_config["params"]['warmup_ratio'] / world_size)
71
+ warmup_config["params"]['total_num_steps'] = int(len(train_dataset)/batch_size)/world_size
72
+ del warmup_config["params"]['warmup_ratio']
73
+ ds_config['scheduler'] = warmup_config
74
+ _pt_model = LLMChat(config, batch_size=batch_size, ds_config=ds_config)
75
+
76
+ # ddp_model = DDP(_pt_model, device_ids=[0], output_device=0, find_unused_parameters=False)
77
+ left_tokenizer = _pt_model.left_tokenizer
78
+ right_tokenizer = _pt_model.right_tokenizer
79
+ # So there are always training samples
80
+ right_tokenizer.truncation_side = 'left'
81
+ # If it is lengthy, cut the right side
82
+ left_tokenizer.truncation_side = 'right'
83
+ # Create distributed sampler
84
+ all_params = [p for p in _pt_model.parameters()]
85
+ require_grads = [p for p in all_params if p.requires_grad]
86
+ model_engine, optimizer, train_dataloader, _ = deepspeed.initialize(args=cmd_args,
87
+ model=_pt_model,
88
+ model_parameters=require_grads,
89
+ training_data=train_dataset,
90
+ config=ds_config,
91
+ )
92
+ if ckpt_path is not None:
93
+ model_engine.load_checkpoint(ckpt_path, load_module_strict=False, load_optimizer_states=True,
94
+ load_lr_scheduler_states=True,
95
+ load_module_only=False)
96
+
97
+ valid_sampler = DistributedSampler(valid_dataset, num_replicas=world_size, shuffle=False,
98
+ drop_last=False)
99
+
100
+ valid_dataloader = get_dataloader(valid_dataset, batch_size, shuffle=False, num_workers=num_workers,
101
+ sampler=valid_sampler)
102
+
103
+ if enum_task in [TASK_TYPE.GENERATE_RESPONSE, TASK_TYPE.GENERATE_PERSONA]:
104
+ train_sampler = DistributedSampler(train_dataset, num_replicas=world_size, shuffle=shuffle_train,
105
+ drop_last=False)
106
+ train_dataloader = get_dataloader(train_dataset, batch_size, shuffle=False, num_workers=num_workers,
107
+ sampler=train_sampler)
108
+
109
+
110
+ # You might want to adjust this depending on your specific requirements
111
+ # scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
112
+ if config.training.log_dir.__class__ is str:
113
+ logdir = f"{config.training.log_dir}/{config.exp_name}_{time.strftime('%Y-%m-%d-%H%M')}"
114
+ else:
115
+ logdir = f"runs/{config.exp_name}_{time.strftime('%Y-%m-%d-%H%M')}"
116
+ # Tensorboard logger
117
+ writer = SummaryWriter(log_dir=logdir)
118
+ best_valid_loss = 65535
119
+ # Training Loop
120
+ counter = 0
121
+ valid_counter = 0
122
+ for _epoch in range(epoch):
123
+ model_engine.train()
124
+ total_loss = 0.0
125
+ gathered_train_loss = [torch.zeros(1, dtype=torch.float32, device=model_engine.device) for _ in range(world_size)]
126
+ train_iter = tqdm(train_dataloader, total=len(train_dataloader), desc=f'epoch: {_epoch}')
127
+ total_steps_per_epoch = len(train_dataloader)
128
+ total_steps = total_steps_per_epoch*epoch
129
+ for idx, inputs in enumerate(train_iter):
130
+ current_step = idx+_epoch*total_steps_per_epoch
131
+ current_training_percent = current_step/total_steps
132
+ model_engine.zero_grad()
133
+ loss = LLMChat.training_step(model_engine, inputs, left_tokenizer, right_tokenizer, config,
134
+ mode=config.training.mode, task_type=enum_task, training_process=current_training_percent)
135
+ skipped = False
136
+ params = []
137
+ if deepspeed.comm.get_local_rank() in [-1, 0]:
138
+ for n, p in model_engine.named_parameters():
139
+ if p.requires_grad:
140
+ params.append(p)
141
+ norm = torch.stack([p.norm() for p in params]).sum()
142
+ print(f'NORM: {norm}')
143
+ if loss.isnan():
144
+ model_engine.backward(loss.new_zeros(loss.shape, requires_grad=True))
145
+ skipped = True
146
+ print(inputs)
147
+ raise ValueError('Meet NaN in training!')
148
+ else:
149
+ model_engine.backward(loss)
150
+ if gradient_clipping > 0:
151
+ model_engine.gradient_clipping()
152
+
153
+ model_engine.step()
154
+
155
+ total_loss += loss.item()
156
+ writer.add_scalar(f'Loss-{deepspeed.comm.get_local_rank()}/train', loss.item(), counter)
157
+ counter += 1
158
+ train_iter.set_postfix_str(f'loss: {loss.item()}'+(" (Skipped)" if skipped else ""))
159
+ outputs_valid_losses = [torch.zeros(1, dtype=torch.float32, device=model_engine.device) for _ in range(world_size)]
160
+ valid_loss = []
161
+ for inputs in tqdm(valid_dataloader, total=len(valid_dataloader), desc='valid'):
162
+ model_engine.eval()
163
+ with torch.no_grad():
164
+ loss = LLMChat.validation_step(model_engine, inputs, left_tokenizer, right_tokenizer, config,
165
+ mode=config.training.mode, task_type=enum_task)
166
+ valid_loss.append(loss.item())
167
+ writer.add_scalar(f'Loss-{deepspeed.comm.get_local_rank()}/valid', loss.item(), valid_counter)
168
+ valid_counter += 1
169
+ deepspeed.comm.all_gather(outputs_valid_losses, torch.tensor(valid_loss).mean().to(model_engine.device))
170
+ gathered_valid_loss = torch.stack(outputs_valid_losses).mean()
171
+ deepspeed.comm.all_gather(gathered_train_loss, torch.tensor(total_loss / len(train_dataloader), device=model_engine.device))
172
+ writer.add_scalar(f'Loss-{deepspeed.comm.get_local_rank()}/total_train', torch.stack(gathered_train_loss).mean(), _epoch)
173
+
174
+ writer.add_scalar(f'Loss-{deepspeed.comm.get_local_rank()}/total_valid', gathered_valid_loss, _epoch)
175
+ deepspeed.comm.barrier()
176
+ print(
177
+ f'\nepoch: {_epoch}, train_loss: {total_loss / len(train_dataloader)}, valid_loss: {gathered_valid_loss}\n')
178
+ if best_valid_loss > gathered_valid_loss and save_model:
179
+ # Save pt_model checkpoint
180
+ if model_engine.global_rank == 0:
181
+ print(f"Saving model checkpoint with valid loss {gathered_valid_loss}")
182
+ save_checkpoint(model_engine, optimizer, config, f'{logdir}/checkpoint_best.pth')
183
+ model_engine.save_checkpoint(f'{logdir}/ds_ckpt', tag='best', exclude_frozen_parameters=True)
184
+ best_valid_loss = gathered_valid_loss
185
+
186
+
187
+ deepspeed.comm.destroy_process_group()
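
When train.py passes a warmup stub, `train_generator` converts the supplied warmup_ratio into the absolute per-GPU step counts that the DeepSpeed scheduler expects before writing it into `ds_config['scheduler']`. A hedged sketch of that conversion (all numbers illustrative; the scheduler name is only an assumed example of what --warmup_type might be set to):

```python
dataset_len, batch_size, world_size, warmup_ratio = 131_000, 8, 4, 0.05

steps_per_gpu = dataset_len / batch_size / world_size
warmup_config = {
    "type": "WarmupDecayLR",   # assumed value of --warmup_type
    "params": {
        "warmup_num_steps": int(steps_per_gpu * warmup_ratio),
        "total_num_steps": int(steps_per_gpu),
    },
}
print(warmup_config)
```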
utils/__pycache__/config.cpython-310.pyc ADDED
Binary file (1.36 kB). View file
 
utils/__pycache__/configure_optimizers.cpython-310.pyc ADDED
Binary file (354 Bytes). View file
 
utils/__pycache__/dist_helper.cpython-310.pyc ADDED
Binary file (292 Bytes). View file
 
utils/__pycache__/format_inputs.cpython-310.pyc ADDED
Binary file (6.43 kB). View file
 
utils/__pycache__/model_helpers.cpython-310.pyc ADDED
Binary file (948 Bytes). View file
 
utils/__pycache__/parser_helper.cpython-310.pyc ADDED
Binary file (542 Bytes). View file
 
utils/__pycache__/seed_everything.cpython-310.pyc ADDED
Binary file (1.78 kB). View file
 
utils/config.py ADDED
@@ -0,0 +1,50 @@
1
+ import yaml
2
+ from dotmap import DotMap
3
+
4
+
5
+ def extend_dict(extend_me, extend_by):
6
+ if isinstance(extend_me, dict):
7
+ for k, v in extend_by.items():
8
+ if k in extend_me:
9
+ extend_dict(extend_me[k], v)
10
+ else:
11
+ extend_me[k] = v
12
+ else:
13
+ if isinstance(extend_me, list):
14
+ extend_list(extend_me, extend_by)
15
+ else:
16
+ if extend_by is not None:
17
+ extend_me += extend_by
18
+
19
+
20
+ def extend_list(extend_me, extend_by):
21
+ missing = []
22
+ for item1 in extend_me:
23
+ if not isinstance(item1, dict):
24
+ continue
25
+
26
+ for item2 in extend_by:
27
+ if not isinstance(item2, dict) or item2 in missing:
28
+ continue
29
+ extend_dict(item1, item2)
30
+
31
+
32
+ def extend_compatibility_for_gated_transformer(configuration):
33
+ dict_config = configuration.toDict()
34
+ return configuration
35
+
36
+
37
+ def get_config(path):
38
+ with open(path, 'r') as file:
39
+ configuration = yaml.load(file, Loader=yaml.FullLoader)
40
+ with open('config/default.yml', 'r') as file:
41
+ base_configuration = yaml.load(file, Loader=yaml.FullLoader)
42
+ configuration = DotMap(configuration)
43
+ base_configuration = DotMap(base_configuration)
44
+ extend_dict(configuration, base_configuration)
45
+ configuration = extend_compatibility_for_gated_transformer(configuration)
46
+ return configuration
47
+
48
+
49
+ if __name__ == '__main__':
50
+ config = get_config('config/bert-base.yml')
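
`get_config` layers the experiment YAML on top of config/default.yml through `extend_dict`: keys already present keep their experiment values, and anything missing is filled in from the defaults. A simplified illustration of that merge behaviour using plain dicts and a hypothetical `merge_defaults` helper (the repo operates on DotMap objects instead):

```python
def merge_defaults(experiment, defaults):
    # keep experiment values; copy over anything the experiment file does not define
    for key, value in defaults.items():
        if key in experiment:
            if isinstance(experiment[key], dict) and isinstance(value, dict):
                merge_defaults(experiment[key], value)
        else:
            experiment[key] = value
    return experiment

experiment = {'model': {'model_name': 'facebook/opt-1.3b'}, 'training': {'mode': 'causal'}}
defaults = {'model': {'load_bit': 32}, 'training': {'mode': 'normal', 'num_epoch': 5}}

merged = merge_defaults(experiment, defaults)
print(merged['training']['mode'])   # 'causal' (experiment value kept)
print(merged['model']['load_bit'])  # 32 (filled in from defaults)
```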
utils/configure_optimizers.py ADDED
@@ -0,0 +1,6 @@
1
+ import torch
2
+
3
+
4
+ def configure_optimizers(model, lr):
5
+ adam = torch.optim.Adam(model.parameters(), lr=lr)
6
+ return adam
utils/dist_helper.py ADDED
@@ -0,0 +1,5 @@
1
+ import deepspeed
2
+
3
+
4
+ def setup():
5
+ deepspeed.init_distributed()
utils/format_inputs.py ADDED
@@ -0,0 +1,173 @@
1
+ from enum import Enum
2
+
3
+ import torch
4
+
5
+
6
+ class TASK_TYPE(Enum):
7
+ GENERATE_RESPONSE = 'generate_response'
8
+ GENERATE_PERSONA = 'generate_persona'
9
+
10
+
11
+
12
+ def format_personachat_input(batch, left_tokenizer, right_tokenizer, config, for_test=False, find_batch=False):
13
+ batch_size = len(batch['context_input'])
14
+ pad_token_id = left_tokenizer.pad_token_id
15
+ targets = [t.strip() for t in batch['target']]
16
+ eos_token = left_tokenizer.eos_token
17
+ concat_context = [' '.join(context) for context in batch['context_input']]
18
+ concat_persona = [' '.join(persona) for persona in batch['persona_list']]
19
+ concat_input = [f'#persona#{persona}#context#{context}' for persona, context in
20
+ zip(concat_persona, concat_context)]
21
+ inference_tokenized = None
22
+ bos_token = left_tokenizer.bos_token
23
+ if for_test:
24
+ inference_input = [f'#persona#{persona}#context#{context}{bos_token}' for persona, context in
25
+ zip(concat_persona, concat_context)]
26
+ inference_tokenized = left_tokenizer(inference_input, add_special_tokens=False, return_tensors='pt',
27
+ padding='max_length', truncation=True,
28
+ max_length=config.dataset.max_token_length - 16)
29
+ # processing target
30
+ _target_with_bos = [f'{bos_token}{target}{eos_token}' for target in targets]
31
+ _target_with_bos_pt = right_tokenizer(_target_with_bos,
32
+ add_special_tokens=False, return_tensors='pt', \
33
+ padding=True)
34
+ _target_pt = _target_with_bos_pt.copy()
35
+ _target_pt['input_ids'] = torch.cat((_target_pt['input_ids'][:, 1:],
36
+ _target_pt['input_ids'].new_ones(batch_size, 1) * pad_token_id), dim=1)
37
+ _target_pt['attention_mask'] = torch.cat((_target_pt['attention_mask'][:, 1:],
38
+ _target_pt['attention_mask'].new_zeros(batch_size, 1)), dim=1)
39
+ # processing concat
40
+ context_pt = left_tokenizer(concat_input, add_special_tokens=False, return_tensors='pt',
41
+ padding='max_length', truncation=True,
42
+ max_length=config.dataset.max_token_length)
43
+ input_pt = torch.cat((context_pt['input_ids'], _target_with_bos_pt['input_ids']),
44
+ dim=1)[:, -config.dataset.max_token_length:]
45
+ input_attn = torch.cat((context_pt['attention_mask'], _target_with_bos_pt['attention_mask']),
46
+ dim=1)[:, -config.dataset.max_token_length:]
47
+ lm_input = {'input_ids': input_pt, 'attention_mask': input_attn}
48
+ if find_batch:
49
+ lm_target = torch.cat((context_pt['input_ids'],
50
+ _target_pt['input_ids']), dim=1)[:, -config.dataset.max_token_length:]
51
+ else:
52
+ lm_target = torch.cat((context_pt['input_ids'] * 0 - 1,
53
+ _target_pt['input_ids']), dim=1)[:, -config.dataset.max_token_length:]
54
+ if for_test:
55
+ return lm_input, lm_target, inference_tokenized
56
+ return lm_input, lm_target
57
+
58
+
59
+# Template Type:
+# 0: </s>
+
+def format_causal_personachat_input(batch, left_tokenizer, right_tokenizer, config, for_test=False,
+                                    find_batch=False, template_type=0):
+    template_types = [
+        '{cinput} R: {target}',
+        '{cinput} R: [COMPLETE] the answer for [COMPLETE] is {target}'
+    ]
+    bos_token = left_tokenizer.bos_token
+    eos_token = left_tokenizer.eos_token
+    batch_size = len(batch['context_input'])
+    pad_token_id = right_tokenizer.pad_token_id
+    targets = [t.strip() for t in batch['target']]
+    concat_context = [' '.join(context) for context in batch['context_input']]
+    concat_persona = [' '.join(persona) for persona in batch['persona_list']]
+    concat_input = [f'given persona: {persona}; context: {context}' for persona, context in
+                    zip(concat_persona, concat_context)]
+    concat_input_target = [template_types[template_type].format(cinput=cinput, target=target) for cinput, target in
+                           zip(concat_input, targets)]
+    bos_concat_input = [f'{bos_token}{cinput}{eos_token}' for cinput in concat_input_target]
+    lm_input = right_tokenizer(bos_concat_input, add_special_tokens=False, return_tensors='pt',
+                               padding='max_length', truncation=True,
+                               max_length=config.dataset.max_token_length)
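+    # Labels are the input ids shifted left by one for next-token prediction; the last slot is padded.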
+    lm_target = lm_input.copy()
+    lm_target = torch.cat((lm_target['input_ids'][:, 1:], lm_target['input_ids'].new_full(
+        (batch_size, 1), pad_token_id)), dim=1)
+    # lm_target['attention_mask'] = torch.cat(
+    #     (lm_target['attention_mask'][:, 1:], lm_target['attention_mask'].new_full(
+    #         (batch_size, 1), 0)), dim=1)
+    # freeze persona
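+    # With freeze_persona enabled, label tokens before the 'context:' marker are overwritten
+    # with the pad id, effectively masking the persona span in the labels.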
+    if config.training.freeze_persona.__class__ is bool and config.training.freeze_persona:
+        for _lm_target in lm_target:
+            if 'given persona:' not in left_tokenizer.decode(_lm_target):
+                continue
+            _tokens = left_tokenizer.convert_ids_to_tokens(_lm_target)
+            _token_ids = _lm_target
+            _token_idx = None
+            for idx in range(0, len(_tokens) - 1):
+                if _tokens[idx].endswith('context') and _tokens[idx + 1].endswith(':'):
+                    _token_idx = idx
+                    break
+                _token_ids[idx] = left_tokenizer.pad_token_id
+    # freeze context
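+    # With freeze_context enabled, label tokens from the 'context:' marker up to the 'R:'
+    # response marker are overwritten with the pad id, masking the dialogue history.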
+    if config.training.freeze_context.__class__ is bool and config.training.freeze_context:
+        for _lm_target in lm_target:
+            _tokens = left_tokenizer.convert_ids_to_tokens(_lm_target)
+            _token_ids = _lm_target
+            _start_idx = None
+            _end_idx = None
+            for idx in range(0, len(_tokens) - 1):
+                if _tokens[idx].endswith('context') and _tokens[idx + 1].endswith(':'):
+                    _start_idx = idx
+                if _tokens[idx].endswith('R') and _tokens[idx + 1].endswith(':'):
+                    _end_idx = idx + 2
+            if _start_idx is None or _end_idx is None:
+                continue
+            for idx in range(_start_idx, _end_idx):
+                _token_ids[idx] = left_tokenizer.pad_token_id
+
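+    # At test time, the prompt is the template rendered with an empty target, so generation
+    # continues from where the response would begin.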
+    if for_test:
+        inference_input = [template_types[template_type].format(cinput=cinput, target='') for cinput in concat_input]
+        bos_concat_input = [f'{bos_token}{cinput}' for cinput in inference_input]
+        inference_tokenized = left_tokenizer(bos_concat_input, add_special_tokens=False,
+                                             return_tensors='pt',
+                                             padding=True, truncation=True,
+                                             max_length=config.dataset.max_token_length)
+        return lm_input, lm_target, inference_tokenized
+    return lm_input, lm_target
+
+
+def format_generate_persona_input(batch, left_tokenizer, right_tokenizer, config, for_test=False, find_batch=False):
+    batch_size = len(batch['context_input'])
+    pad_token_id = left_tokenizer.pad_token_id
+    targets = [' '.join(persona) for persona in batch['persona_list']]
+    eos_token = left_tokenizer.eos_token
+    concat_context = [' '.join(context) for context in batch['context_input']]
+    concat_input = [f'#context#{context}' for context in
+                    concat_context]
+    inference_tokenized = None
+    bos_token = left_tokenizer.bos_token
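+    # Persona-generation task: the prompt contains only the dialogue context and the persona text is the target.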
+    if for_test:
+        inference_input = [f'#context#{context}{bos_token}' for context in
+                           concat_context]
+        inference_tokenized = left_tokenizer(inference_input, add_special_tokens=False, return_tensors='pt',
+                                             padding='max_length', truncation=True,
+                                             max_length=config.dataset.max_token_length - 16)
+    # processing target
+    _target_with_bos = [f'{bos_token}{target}{eos_token}' for target in targets]
+    _target_with_bos_pt = right_tokenizer(_target_with_bos,
+                                          add_special_tokens=False, return_tensors='pt',
+                                          padding=True)
+    _target_pt = _target_with_bos_pt.copy()
+    _target_pt['input_ids'] = torch.cat((_target_pt['input_ids'][:, 1:],
+                                         _target_pt['input_ids'].new_ones(batch_size, 1) * pad_token_id), dim=1)
+    _target_pt['attention_mask'] = torch.cat((_target_pt['attention_mask'][:, 1:],
+                                              _target_pt['attention_mask'].new_zeros(batch_size, 1)), dim=1)
+    # processing concat
+    context_pt = left_tokenizer(concat_input, add_special_tokens=False, return_tensors='pt',
+                                padding='max_length', truncation=True,
+                                max_length=config.dataset.max_token_length)
+    input_pt = torch.cat((context_pt['input_ids'], _target_with_bos_pt['input_ids']),
+                         dim=1)[:, -config.dataset.max_token_length:]
+    input_attn = torch.cat((context_pt['attention_mask'], _target_with_bos_pt['attention_mask']),
+                           dim=1)[:, -config.dataset.max_token_length:]
+    lm_input = {'input_ids': input_pt, 'attention_mask': input_attn}
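+    # As in the response-formatting function above, context positions in the labels are set to -1
+    # unless find_batch is requested.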
+    if find_batch:
+        lm_target = torch.cat((context_pt['input_ids'],
+                               _target_pt['input_ids']), dim=1)[:, -config.dataset.max_token_length:]
+    else:
+        lm_target = torch.cat((context_pt['input_ids'] * 0 - 1,
+                               _target_pt['input_ids']), dim=1)[:, -config.dataset.max_token_length:]
+    if for_test:
+        return lm_input, lm_target, inference_tokenized
+    return lm_input, lm_target
utils/model_helpers.py ADDED
@@ -0,0 +1,31 @@
+
+def print_trainable_parameters(model):
+    """
+    Prints the number of trainable parameters in the model.
+    """
+    trainable_params = 0
+    all_param = 0
+    all_param_names = []
+    trainable_param_names = []
+    prompt_weights = 0
+    prompt_normalizer = 0
+    prompt_normalizer_layer = []
+    soft_prompt_layers = []
+    for name, param in model.named_parameters():
+        all_param += param.numel()
+        all_param_names.append(name)
+        if param.requires_grad:
+            print(name)
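+            # Soft-prompt embedding and prompt-normalizer parameters are tallied separately
+            # from the overall trainable count.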
+            if 'prompt_encoder.default.embedding' in name:
+                prompt_weights += param.numel()
+                soft_prompt_layers.append(param)
+            if 'prompt_normalizer' in name:
+                prompt_normalizer += param.numel()
+                prompt_normalizer_layer.append(param)
+            trainable_params += param.numel()
+            trainable_param_names.append(name)
+    print(
+        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
+    )
+    return {"trainable": trainable_params, "all": all_param, "trainable%": 100 * trainable_params / all_param}
utils/parser_helper.py ADDED
@@ -0,0 +1,17 @@
+import argparse
+
+
+def str2bool(v):
+    if v is None:
+        return None
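+    # These string options are passed through unchanged rather than coerced to booleans.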
+    exclusive = ['accurate', 'query', 'document']
+    if v in exclusive:
+        return v
+    if isinstance(v, bool):
+        return v
+    if v.lower() in ('yes', 'true', 't', 'y', '1'):
+        return True
+    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
+        return False
+    else:
+        raise argparse.ArgumentTypeError('Boolean value expected.')
utils/seed_everything.py ADDED
@@ -0,0 +1,44 @@
+import os
+import random
+from typing import Optional
+
+import numpy as np
+import torch
+
+max_seed_value = np.iinfo(np.uint32).max
+min_seed_value = np.iinfo(np.uint32).min
+
+
+def seed_everything(seed: Optional[int], workers: bool = False) -> int:
+    """Function that sets seed for pseudo-random number generators in: pytorch, numpy, python.random. In addition,
+    sets the following environment variables:
+
+    - `PL_GLOBAL_SEED`: will be passed to spawned subprocesses (e.g. ddp_spawn backend).
+    - `PL_SEED_WORKERS`: (optional) is set to 1 if ``workers=True``.
+
+    Args:
+        seed: the integer value seed for global random state in Lightning.
+            If `None`, will read seed from `PL_GLOBAL_SEED` env variable
+            or select it randomly.
+        workers: if set to ``True``, will properly configure all dataloaders passed to the
+            Trainer with a ``worker_init_fn``. If the user already provides such a function
+            for their dataloaders, setting this argument will have no influence. See also:
+            :func:`~lightning_fabric.utilities.seed.pl_worker_init_function`.
+    """
+
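+    # Note: despite the docstring above, a None seed is not resolved from PL_GLOBAL_SEED here;
+    # int(None) would raise, so callers are expected to pass an integer (or a numeric string).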
+    if not isinstance(seed, int):
+        seed = int(seed)
+
+    if not (min_seed_value <= seed <= max_seed_value):
+        raise ValueError(f"{seed} is not in bounds, numpy accepts from {min_seed_value} to {max_seed_value}")
+
+    print(f"Global seed set to {seed}")
+    os.environ["PL_GLOBAL_SEED"] = str(seed)
+    random.seed(seed)
+    np.random.seed(seed)
+    torch.manual_seed(seed)
+    torch.cuda.manual_seed_all(seed)
+
+    os.environ["PL_SEED_WORKERS"] = f"{int(workers)}"
+
+    return seed