bupt/chatglm3-6b-32k-wenshu-finetuned

KLGR123 committed on Nov 13, 2023

Commit 5f1c52b (1 parent: 336ff50)

commit message

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

202311100214.log +0 -0
README copy.md +58 -0
adapter_config.json +22 -0
adapter_model.bin +3 -0
added_tokens.json +4 -0
all_results.json +7 -0
checkpoint-1000/README.md +207 -0
checkpoint-1000/adapter_config.json +22 -0
checkpoint-1000/adapter_model.bin +3 -0
checkpoint-1000/added_tokens.json +4 -0
checkpoint-1000/optimizer.pt +3 -0
checkpoint-1000/rng_state_0.pth +3 -0
checkpoint-1000/rng_state_1.pth +3 -0
checkpoint-1000/rng_state_2.pth +3 -0
checkpoint-1000/rng_state_3.pth +3 -0
checkpoint-1000/rng_state_4.pth +3 -0
checkpoint-1000/rng_state_5.pth +3 -0
checkpoint-1000/rng_state_6.pth +3 -0
checkpoint-1000/rng_state_7.pth +3 -0
checkpoint-1000/scheduler.pt +3 -0
checkpoint-1000/special_tokens_map.json +6 -0
checkpoint-1000/tokenization_chatglm.py +283 -0
checkpoint-1000/tokenizer.model +3 -0
checkpoint-1000/tokenizer_config.json +38 -0
checkpoint-1000/trainer_state.json +619 -0
checkpoint-1000/training_args.bin +3 -0
checkpoint-1200/README.md +207 -0
checkpoint-1200/adapter_config.json +22 -0
checkpoint-1200/adapter_model.bin +3 -0
checkpoint-1200/added_tokens.json +4 -0
checkpoint-1200/optimizer.pt +3 -0
checkpoint-1200/rng_state_0.pth +3 -0
checkpoint-1200/rng_state_1.pth +3 -0
checkpoint-1200/rng_state_2.pth +3 -0
checkpoint-1200/rng_state_3.pth +3 -0
checkpoint-1200/rng_state_4.pth +3 -0
checkpoint-1200/rng_state_5.pth +3 -0
checkpoint-1200/rng_state_6.pth +3 -0
checkpoint-1200/rng_state_7.pth +3 -0
checkpoint-1200/scheduler.pt +3 -0
checkpoint-1200/special_tokens_map.json +6 -0
checkpoint-1200/tokenization_chatglm.py +283 -0
checkpoint-1200/tokenizer.model +3 -0
checkpoint-1200/tokenizer_config.json +38 -0
checkpoint-1200/trainer_state.json +739 -0
checkpoint-1200/training_args.bin +3 -0
checkpoint-1400/README.md +207 -0
checkpoint-1400/adapter_config.json +22 -0
checkpoint-1400/adapter_model.bin +3 -0
checkpoint-1400/added_tokens.json +4 -0

202311100214.log ADDED

The diff for this file is too large to render.

README copy.md ADDED

	@@ -0,0 +1,58 @@

+---
+base_model: /home/hz/projects/chatglm3-6b-32k
+tags:
+- llama-factory
+- lora
+- generated_from_trainer
+model-index:
+- name: chatglm3-6b-32k-wenshu-finetuned
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# chatglm3-6b-32k-wenshu-finetuned
+This model is a fine-tuned version of [/home/hz/projects/chatglm3-6b-32k](https://huggingface.co//home/hz/projects/chatglm3-6b-32k) on the wenshu_dataset dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 256
+- total_eval_batch_size: 64
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- num_epochs: 3.0
+### Training results
+### Framework versions
+- Transformers 4.34.0
+- Pytorch 2.0.1+cu117
+- Datasets 2.14.6
+- Tokenizers 0.14.1
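
The hyperparameter list above can be cross-checked with a little arithmetic. The sketch below recomputes the two "total" batch sizes from the per-device values and evaluates a plain cosine decay of the learning rate. The total of 4476 optimizer steps and the absence of warmup are assumptions: neither is stated in the README. They are inferred from `trainer_state.json` in this commit (epoch 0.6701 at step 1000, so roughly 1492 steps per epoch over 3 epochs). The function `cosine_lr` is a hypothetical helper written for illustration, not the Trainer's own scheduler code.

```python
import math

# Values copied from the hyperparameter list in the README diff.
learning_rate = 5e-05
train_batch_size = 4             # per device
eval_batch_size = 8              # per device
num_devices = 8
gradient_accumulation_steps = 8

# Effective batch sizes: per-device size times devices (times accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# ASSUMPTION: 4476 total optimizer steps and no warmup (inferred, not stated in the README).
TOTAL_STEPS = 4476


def cosine_lr(step: int, base_lr: float = learning_rate, total_steps: int = TOTAL_STEPS) -> float:
    """Half-cosine decay from base_lr at step 0 down to 0 at total_steps."""
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)  # 256 64
print(cosine_lr(10), cosine_lr(640))
```

Under these assumptions the computed values agree closely with the `learning_rate` entries logged at steps 10 and 640 in `checkpoint-1000/trainer_state.json`, which suggests the run used a cosine schedule without warmup.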

adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/hz/projects/chatglm3-6b-32k",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 32.0,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value"
+  ],
+  "task_type": "CAUSAL_LM"
+}
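
To put the LoRA settings above in context, the following sketch derives the LoRA scaling factor (`lora_alpha / r`) and estimates the adapter's parameter count. The architecture numbers (hidden size 4096, a fused `query_key_value` projection with 4608 outputs, 28 layers) are assumptions about the ChatGLM3-6B base model and do not appear in this diff. The variable names are illustrative only.

```python
# Values from adapter_config.json in this commit.
r = 8
lora_alpha = 32.0
target_modules = ["query_key_value"]

# LoRA scales the low-rank update B @ A by lora_alpha / r.
scaling = lora_alpha / r

# ASSUMPTIONS about the base model (not stated in the diff):
hidden_size = 4096                   # input width of query_key_value
qkv_out = 4096 + 2 * 2 * 128         # 32 query heads plus 2 key/value groups of 128 channels
num_layers = 28

# Each adapted linear layer adds A (r x in_features) and B (out_features x r).
params_per_layer = r * (hidden_size + qkv_out)
total_lora_params = params_per_layer * num_layers * len(target_modules)

# Size if the weights are stored as 32-bit floats.
bytes_fp32 = total_lora_params * 4

print(scaling)
print(total_lora_params, bytes_fp32)
```

If these assumptions hold, the estimate of roughly 7.8 MB is within a few tens of kilobytes of the 7,820,185-byte `adapter_model.bin` recorded in this commit, with the remainder plausibly being serialization overhead.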

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c3346e13155c0d39280e75d07fe63bd525777020def5c6512c3907aaea14da10
+size 7820185
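
The three added lines above are a Git LFS pointer rather than the binary itself: the repository stores only the spec version, the SHA-256 object ID, and the byte size. As a minimal sketch, a pointer like this can be parsed as follows. `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the adapter_model.bin diff.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c3346e13155c0d39280e75d07fe63bd525777020def5c6512c3907aaea14da10
size 7820185
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same structure applies to every `.bin`, `.pt`, `.pth`, and `tokenizer.model` entry in this commit, so the `size` field is the quickest way to compare artifacts without downloading them.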

added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|observation|>": 64797,
+  "<|user|>": 64795
+}

all_results.json ADDED

	@@ -0,0 +1,7 @@

+{
+    "epoch": 3.0,
+    "train_loss": 0.40742741023141216,
+    "train_runtime": 104521.3326,
+    "train_samples_per_second": 10.964,
+    "train_steps_per_second": 0.043
+}
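
The figures in `all_results.json` can be turned into more readable quantities with simple arithmetic. The sketch below converts the runtime to hours and estimates the number of training samples per epoch and the number of optimizer steps. Because `train_steps_per_second` is rounded to three decimals, the step count is only approximate. The variable names are illustrative.

```python
# Values copied from all_results.json in this commit.
epoch = 3.0
train_runtime = 104521.3326          # seconds
train_samples_per_second = 10.964
train_steps_per_second = 0.043       # rounded, so derived step counts are approximate

runtime_hours = train_runtime / 3600
total_samples = train_samples_per_second * train_runtime
samples_per_epoch = total_samples / epoch
approx_steps = train_steps_per_second * train_runtime

print(f"runtime: {runtime_hours:.1f} h")
print(f"samples per epoch: {samples_per_epoch:,.0f}")
print(f"optimizer steps (approx.): {approx_steps:,.0f}")
```

The result is about 29 hours of training over roughly 382,000 samples per epoch, which is consistent with the effective batch size of 256 and the epoch fraction logged at step 1000 in `trainer_state.json`.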

checkpoint-1000/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+library_name: peft
+base_model: /home/hz/projects/chatglm3-6b-32k
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+## Training procedure
+### Framework versions
+- PEFT 0.6.1

checkpoint-1000/adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/hz/projects/chatglm3-6b-32k",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 32.0,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value"
+  ],
+  "task_type": "CAUSAL_LM"
+}

checkpoint-1000/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3342fe0de43a1853b1fdaad882567a452ec747c7bb392a7e5e2a88c0a939cc11
+size 7820185

checkpoint-1000/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|observation|>": 64797,
+  "<|user|>": 64795
+}

checkpoint-1000/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3badd333c7c2ea3bb2494882b036831b3692efddb08fe380c12ae793f7d5d63
+size 15644485

checkpoint-1000/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1060624c7954c3b286c1948a1dd5e1ce39c497aee826b7f77f55576e5309b4c3
+size 21687

checkpoint-1000/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28602630f9bf652c63fca7559b6f34e3236d16cb19a88e14b1ae9abc3f89b7c6
+size 21687

checkpoint-1000/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:331d1c16c6e5f215989b0d4f6f031cadc6c60d030a502ce8c93d000b402b8ad4
+size 21687

checkpoint-1000/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:36b8b18dff0d9c7fa865aa16e2d89c59d88fad7d0bc2a1589c8a7cd422051ac8
+size 21687

checkpoint-1000/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e607493ac32d6d104991335c8909b9165d6475b6001cfd544a15a014fb21aaef
+size 21687

checkpoint-1000/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f91e8dbba9f0531a6db2b77df4458c926e4f57aa448d6fc5ef1918429d742736
+size 21687

checkpoint-1000/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34901406783e82c1f339065211640375722456aa206f399ee541b75f44a6a3a1
+size 21687

checkpoint-1000/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:22decdcd459d4cbd6fb83a5afd8d9e6edec7ee066069d318782fde025ca4c4de
+size 21687

checkpoint-1000/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:707b10eb98685e357773ca2125e0d6c1c1de2a1c4e7ededd34ea00989b0b159a
+size 627

checkpoint-1000/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "additional_special_tokens": [
+    "<|user|>",
+    "<|observation|>"
+  ]
+}

checkpoint-1000/tokenization_chatglm.py ADDED

	@@ -0,0 +1,283 @@

+import json
+import os
+import torch
+from typing import List, Optional, Union, Dict
+from sentencepiece import SentencePieceProcessor
+from transformers import PreTrainedTokenizer
+from transformers.utils import logging, PaddingStrategy
+from transformers.tokenization_utils_base import EncodedInput, BatchEncoding
+class SPTokenizer:
+    def __init__(self, model_path: str):
+        # reload tokenizer
+        assert os.path.isfile(model_path), model_path
+        self.sp_model = SentencePieceProcessor(model_file=model_path)
+        # BOS / EOS token IDs
+        self.n_words: int = self.sp_model.vocab_size()
+        self.bos_id: int = self.sp_model.bos_id()
+        self.eos_id: int = self.sp_model.eos_id()
+        self.pad_id: int = self.sp_model.unk_id()
+        assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()
+        special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop", "<|system|>", "<|user|>", "<|assistant|>",
+                          "<|observation|>"]
+        self.special_tokens = {}
+        self.index_special_tokens = {}
+        for token in special_tokens:
+            self.special_tokens[token] = self.n_words
+            self.index_special_tokens[self.n_words] = token
+            self.n_words += 1
+    def tokenize(self, s: str):
+        return self.sp_model.EncodeAsPieces(s)
+    def encode(self, s: str, bos: bool = False, eos: bool = False) -> List[int]:
+        assert type(s) is str
+        t = self.sp_model.encode(s)
+        if bos:
+            t = [self.bos_id] + t
+        if eos:
+            t = t + [self.eos_id]
+        return t
+    def decode(self, t: List[int]) -> str:
+        text, buffer = "", []
+        for token in t:
+            if token in self.index_special_tokens:
+                if buffer:
+                    text += self.sp_model.decode(buffer)
+                    buffer = []
+                text += self.index_special_tokens[token]
+            else:
+                buffer.append(token)
+        if buffer:
+            text += self.sp_model.decode(buffer)
+        return text
+    def decode_tokens(self, tokens: List[str]) -> str:
+        text = self.sp_model.DecodePieces(tokens)
+        return text
+    def convert_token_to_id(self, token):
+        """ Converts a token (str) in an id using the vocab. """
+        if token in self.special_tokens:
+            return self.special_tokens[token]
+        return self.sp_model.PieceToId(token)
+    def convert_id_to_token(self, index):
+        """Converts an index (integer) in a token (str) using the vocab."""
+        if index in self.index_special_tokens:
+            return self.index_special_tokens[index]
+        if index in [self.eos_id, self.bos_id, self.pad_id] or index < 0:
+            return ""
+        return self.sp_model.IdToPiece(index)
+class ChatGLMTokenizer(PreTrainedTokenizer):
+    vocab_files_names = {"vocab_file": "tokenizer.model"}
+    model_input_names = ["input_ids", "attention_mask", "position_ids"]
+    def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
+        self.name = "GLMTokenizer"
+        self.vocab_file = vocab_file
+        self.tokenizer = SPTokenizer(vocab_file)
+        self.special_tokens = {
+            "<bos>": self.tokenizer.bos_id,
+            "<eos>": self.tokenizer.eos_id,
+            "<pad>": self.tokenizer.pad_id
+        }
+        super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
+    def get_command(self, token):
+        if token in self.special_tokens:
+            return self.special_tokens[token]
+        assert token in self.tokenizer.special_tokens, f"{token} is not a special token for {self.name}"
+        return self.tokenizer.special_tokens[token]
+    @property
+    def unk_token(self) -> str:
+        return "<unk>"
+    @property
+    def pad_token(self) -> str:
+        return "<unk>"
+    @property
+    def pad_token_id(self):
+        return self.get_command("<pad>")
+    @property
+    def eos_token(self) -> str:
+        return "</s>"
+    @property
+    def eos_token_id(self):
+        return self.get_command("<eos>")
+    @property
+    def vocab_size(self):
+        return self.tokenizer.n_words
+    def get_vocab(self):
+        """ Returns vocab as a dict """
+        vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
+        vocab.update(self.added_tokens_encoder)
+        return vocab
+    def _tokenize(self, text, **kwargs):
+        return self.tokenizer.tokenize(text)
+    def _convert_token_to_id(self, token):
+        """ Converts a token (str) in an id using the vocab. """
+        return self.tokenizer.convert_token_to_id(token)
+    def _convert_id_to_token(self, index):
+        """Converts an index (integer) in a token (str) using the vocab."""
+        return self.tokenizer.convert_id_to_token(index)
+    def convert_tokens_to_string(self, tokens: List[str]) -> str:
+        return self.tokenizer.decode_tokens(tokens)
+    def save_vocabulary(self, save_directory, filename_prefix=None):
+        """
+        Save the vocabulary and special tokens file to a directory.
+        Args:
+            save_directory (`str`):
+                The directory in which to save the vocabulary.
+            filename_prefix (`str`, *optional*):
+                An optional prefix to add to the named of the saved files.
+        Returns:
+            `Tuple(str)`: Paths to the files saved.
+        """
+        if os.path.isdir(save_directory):
+            vocab_file = os.path.join(
+                save_directory, self.vocab_files_names["vocab_file"]
+            )
+        else:
+            vocab_file = save_directory
+        with open(self.vocab_file, 'rb') as fin:
+            proto_str = fin.read()
+        with open(vocab_file, "wb") as writer:
+            writer.write(proto_str)
+        return (vocab_file,)
+    def get_prefix_tokens(self):
+        prefix_tokens = [self.get_command("[gMASK]"), self.get_command("sop")]
+        return prefix_tokens
+    def build_single_message(self, role, metadata, message):
+        assert role in ["system", "user", "assistant", "observation"], role
+        role_tokens = [self.get_command(f"<|{role}|>")] + self.tokenizer.encode(f"{metadata}\n")
+        message_tokens = self.tokenizer.encode(message)
+        tokens = role_tokens + message_tokens
+        return tokens
+    def build_chat_input(self, query, history=None, role="user"):
+        if history is None:
+            history = []
+        input_ids = []
+        for item in history:
+            content = item["content"]
+            if item["role"] == "system" and "tools" in item:
+                content = content + "\n" + json.dumps(item["tools"], indent=4, ensure_ascii=False)
+            input_ids.extend(self.build_single_message(item["role"], item.get("metadata", ""), content))
+        input_ids.extend(self.build_single_message(role, "", query))
+        input_ids.extend([self.get_command("<|assistant|>")])
+        return self.batch_encode_plus([input_ids], return_tensors="pt", is_split_into_words=True)
+    def build_inputs_with_special_tokens(
+            self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and
+        adding special tokens. A BERT sequence has the following format:
+        - single sequence: `[CLS] X [SEP]`
+        - pair of sequences: `[CLS] A [SEP] B [SEP]`
+        Args:
+            token_ids_0 (`List[int]`):
+                List of IDs to which the special tokens will be added.
+            token_ids_1 (`List[int]`, *optional*):
+                Optional second list of IDs for sequence pairs.
+        Returns:
+            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
+        """
+        prefix_tokens = self.get_prefix_tokens()
+        token_ids_0 = prefix_tokens + token_ids_0
+        if token_ids_1 is not None:
+            token_ids_0 = token_ids_0 + token_ids_1 + [self.get_command("<eos>")]
+        return token_ids_0
+    def _pad(
+            self,
+            encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
+            max_length: Optional[int] = None,
+            padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
+            pad_to_multiple_of: Optional[int] = None,
+            return_attention_mask: Optional[bool] = None,
+    ) -> dict:
+        """
+        Pad encoded inputs (on left/right and up to predefined length or max length in the batch)
+        Args:
+            encoded_inputs:
+                Dictionary of tokenized inputs (`List[int]`) or batch of tokenized inputs (`List[List[int]]`).
+            max_length: maximum length of the returned list and optionally padding length (see below).
+                Will truncate by taking into account the special tokens.
+            padding_strategy: PaddingStrategy to use for padding.
+                - PaddingStrategy.LONGEST Pad to the longest sequence in the batch
+                - PaddingStrategy.MAX_LENGTH: Pad to the max length (default)
+                - PaddingStrategy.DO_NOT_PAD: Do not pad
+                The tokenizer padding sides are defined in self.padding_side:
+                    - 'left': pads on the left of the sequences
+                    - 'right': pads on the right of the sequences
+            pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
+                This is especially useful to enable the use of Tensor Core on NVIDIA hardware with compute capability
+                `>= 7.5` (Volta).
+            return_attention_mask:
+                (optional) Set to False to avoid returning attention mask (default: set to model specifics)
+        """
+        # Load from model defaults
+        assert self.padding_side == "left"
+        required_input = encoded_inputs[self.model_input_names[0]]
+        seq_length = len(required_input)
+        if padding_strategy == PaddingStrategy.LONGEST:
+            max_length = len(required_input)
+        if max_length is not None and pad_to_multiple_of is not None and (max_length % pad_to_multiple_of != 0):
+            max_length = ((max_length // pad_to_multiple_of) + 1) * pad_to_multiple_of
+        needs_to_be_padded = padding_strategy != PaddingStrategy.DO_NOT_PAD and len(required_input) != max_length
+        # Initialize attention mask if not present.
+        if "attention_mask" not in encoded_inputs:
+            encoded_inputs["attention_mask"] = [1] * seq_length
+        if "position_ids" not in encoded_inputs:
+            encoded_inputs["position_ids"] = list(range(seq_length))
+        if needs_to_be_padded:
+            difference = max_length - len(required_input)
+            if "attention_mask" in encoded_inputs:
+                encoded_inputs["attention_mask"] = [0] * difference + encoded_inputs["attention_mask"]
+            if "position_ids" in encoded_inputs:
+                encoded_inputs["position_ids"] = [0] * difference + encoded_inputs["position_ids"]
+            encoded_inputs[self.model_input_names[0]] = [self.pad_token_id] * difference + required_input
+        return encoded_inputs
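
The `_pad` method above pads on the left and extends `attention_mask` and `position_ids` with zeros. The standalone sketch below mirrors that behavior for a single sequence without depending on `transformers`. The function name `left_pad`, the default `pad_id=0`, and the sample token IDs are assumptions for illustration. In the real tokenizer the pad ID comes from the SentencePiece model's unknown-token ID.

```python
from typing import Dict, List


def left_pad(input_ids: List[int], max_length: int, pad_id: int = 0) -> Dict[str, List[int]]:
    """Left-pad one sequence, following the logic of ChatGLMTokenizer._pad."""
    seq_length = len(input_ids)
    attention_mask = [1] * seq_length
    position_ids = list(range(seq_length))

    difference = max_length - seq_length
    if difference > 0:
        # Padding goes in front; padded positions get mask 0 and position 0.
        attention_mask = [0] * difference + attention_mask
        position_ids = [0] * difference + position_ids
        input_ids = [pad_id] * difference + input_ids

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "position_ids": position_ids,
    }


# Hypothetical four-token sequence padded to length 6.
padded = left_pad([64790, 64792, 5, 6], max_length=6)
print(padded["input_ids"])       # [0, 0, 64790, 64792, 5, 6]
print(padded["attention_mask"])  # [0, 0, 1, 1, 1, 1]
print(padded["position_ids"])    # [0, 0, 0, 1, 2, 3]
```

Left padding keeps the real tokens adjacent to the generation position, which is why `_pad` asserts `padding_side == "left"`. Note that `tokenizer_config.json` in the same checkpoint records `"padding_side": "right"`, so depending on how the tokenizer is loaded that assertion may need attention.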

checkpoint-1000/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e7dc4c393423b76e4373e5157ddc34803a0189ba96b21ddbb40269d31468a6f2
+size 1018370

checkpoint-1000/tokenizer_config.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "added_tokens_decoder": {
+    "64795": {
+      "content": "<|user|>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "64797": {
+      "content": "<|observation|>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|user|>",
+    "<|observation|>"
+  ],
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_chatglm.ChatGLMTokenizer",
+      null
+    ]
+  },
+  "clean_up_tokenization_spaces": false,
+  "do_lower_case": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "padding_side": "right",
+  "remove_space": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "ChatGLMTokenizer",
+  "tokenizer_file": null
+}
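
The token IDs 64795 and 64797 in this config follow from how `SPTokenizer.__init__` in `tokenization_chatglm.py` numbers its special tokens: each one is appended after the SentencePiece vocabulary in list order. The sketch below reproduces that loop. The base vocabulary size of 64789 is not stated anywhere in the diff. It is inferred here from the ID of `<|user|>`, and `assign_special_token_ids` is a hypothetical helper.

```python
from typing import Dict

# Same order as the special_tokens list in SPTokenizer.__init__.
SPECIAL_TOKENS = [
    "[MASK]", "[gMASK]", "[sMASK]", "sop", "eop",
    "<|system|>", "<|user|>", "<|assistant|>", "<|observation|>",
]


def assign_special_token_ids(base_vocab_size: int) -> Dict[str, int]:
    """Give each special token the next free ID after the base vocabulary."""
    ids = {}
    n_words = base_vocab_size
    for token in SPECIAL_TOKENS:
        ids[token] = n_words
        n_words += 1
    return ids


# INFERRED: the base vocabulary size that places <|user|> at 64795.
base_vocab_size = 64795 - SPECIAL_TOKENS.index("<|user|>")

ids = assign_special_token_ids(base_vocab_size)
print(base_vocab_size)
print(ids["<|user|>"], ids["<|observation|>"])
```

With that inferred size, `<|observation|>` lands on 64797, matching both `added_tokens.json` and the `added_tokens_decoder` entries above. `<|assistant|>` would take 64796 under the same numbering.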

checkpoint-1000/trainer_state.json ADDED

	@@ -0,0 +1,619 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.6701289998324678,
+  "eval_steps": 500,
+  "global_step": 1000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99993842168232e-05,
+      "loss": 1.2211,
+      "step": 10
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9997536897627915e-05,
+      "loss": 1.0276,
+      "step": 20
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.9994458133418e-05,
+      "loss": 0.8587,
+      "step": 30
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.999014807586154e-05,
+      "loss": 0.7431,
+      "step": 40
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9984606937283405e-05,
+      "loss": 0.6841,
+      "step": 50
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9977834990654804e-05,
+      "loss": 0.6452,
+      "step": 60
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.99698325695798e-05,
+      "loss": 0.6347,
+      "step": 70
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9960600068278876e-05,
+      "loss": 0.6109,
+      "step": 80
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.995013794156957e-05,
+      "loss": 0.5911,
+      "step": 90
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.993844670484401e-05,
+      "loss": 0.5803,
+      "step": 100
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.992552693404354e-05,
+      "loss": 0.5902,
+      "step": 110
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.991137926563036e-05,
+      "loss": 0.5745,
+      "step": 120
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9896004396556176e-05,
+      "loss": 0.5538,
+      "step": 130
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.987940308422783e-05,
+      "loss": 0.5495,
+      "step": 140
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.986157614647005e-05,
+      "loss": 0.5433,
+      "step": 150
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.984252446148508e-05,
+      "loss": 0.548,
+      "step": 160
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.98222489678095e-05,
+      "loss": 0.5361,
+      "step": 170
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.980075066426796e-05,
+      "loss": 0.5331,
+      "step": 180
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.977803060992393e-05,
+      "loss": 0.53,
+      "step": 190
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.97540899240276e-05,
+      "loss": 0.5135,
+      "step": 200
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.972892978596069e-05,
+      "loss": 0.5101,
+      "step": 210
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.970255143517838e-05,
+      "loss": 0.5125,
+      "step": 220
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.967495617114826e-05,
+      "loss": 0.4928,
+      "step": 230
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.964614535328626e-05,
+      "loss": 0.4878,
+      "step": 240
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.961612040088973e-05,
+      "loss": 0.5017,
+      "step": 250
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.9584882793067534e-05,
+      "loss": 0.4863,
+      "step": 260
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 4.955243406866713e-05,
+      "loss": 0.4847,
+      "step": 270
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 4.951877582619881e-05,
+      "loss": 0.4868,
+      "step": 280
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 4.948390972375694e-05,
+      "loss": 0.4748,
+      "step": 290
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 4.944783747893825e-05,
+      "loss": 0.4764,
+      "step": 300
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 4.941056086875727e-05,
+      "loss": 0.4712,
+      "step": 310
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 4.937208172955876e-05,
+      "loss": 0.4642,
+      "step": 320
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 4.9332401956927224e-05,
+      "loss": 0.4642,
+      "step": 330
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 4.9291523505593604e-05,
+      "loss": 0.4709,
+      "step": 340
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 4.9249448389338905e-05,
+      "loss": 0.461,
+      "step": 350
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 4.920617868089501e-05,
+      "loss": 0.4677,
+      "step": 360
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 4.9161716511842614e-05,
+      "loss": 0.4564,
+      "step": 370
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 4.911606407250617e-05,
+      "loss": 0.4663,
+      "step": 380
+    },
+    {
+      "epoch": 0.26,
+      "learning_rate": 4.9069223611846014e-05,
+      "loss": 0.4682,
+      "step": 390
+    },
+    {
+      "epoch": 0.27,
+      "learning_rate": 4.9021197437347555e-05,
+      "loss": 0.4636,
+      "step": 400
+    },
+    {
+      "epoch": 0.27,
+      "learning_rate": 4.897198791490762e-05,
+      "loss": 0.4569,
+      "step": 410
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 4.8921597468717887e-05,
+      "loss": 0.462,
+      "step": 420
+    },
+    {
+      "epoch": 0.29,
+      "learning_rate": 4.887002858114548e-05,
+      "loss": 0.4563,
+      "step": 430
+    },
+    {
+      "epoch": 0.29,
+      "learning_rate": 4.881728379261068e-05,
+      "loss": 0.4563,
+      "step": 440
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 4.876336570146175e-05,
+      "loss": 0.4468,
+      "step": 450
+    },
+    {
+      "epoch": 0.31,
+      "learning_rate": 4.870827696384698e-05,
+      "loss": 0.4508,
+      "step": 460
+    },
+    {
+      "epoch": 0.31,
+      "learning_rate": 4.865202029358379e-05,
+      "loss": 0.4507,
+      "step": 470
+    },
+    {
+      "epoch": 0.32,
+      "learning_rate": 4.859459846202507e-05,
+      "loss": 0.4486,
+      "step": 480
+    },
+    {
+      "epoch": 0.33,
+      "learning_rate": 4.853601429792265e-05,
+      "loss": 0.4423,
+      "step": 490
+    },
+    {
+      "epoch": 0.34,
+      "learning_rate": 4.847627068728795e-05,
+      "loss": 0.4369,
+      "step": 500
+    },
+    {
+      "epoch": 0.34,
+      "learning_rate": 4.841537057324979e-05,
+      "loss": 0.4429,
+      "step": 510
+    },
+    {
+      "epoch": 0.35,
+      "learning_rate": 4.835331695590943e-05,
+      "loss": 0.4389,
+      "step": 520
+    },
+    {
+      "epoch": 0.36,
+      "learning_rate": 4.829011289219276e-05,
+      "loss": 0.44,
+      "step": 530
+    },
+    {
+      "epoch": 0.36,
+      "learning_rate": 4.82257614956997e-05,
+      "loss": 0.4476,
+      "step": 540
+    },
+    {
+      "epoch": 0.37,
+      "learning_rate": 4.816026593655085e-05,
+      "loss": 0.4367,
+      "step": 550
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 4.809362944123129e-05,
+      "loss": 0.4357,
+      "step": 560
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 4.802585529243164e-05,
+      "loss": 0.4492,
+      "step": 570
+    },
+    {
+      "epoch": 0.39,
+      "learning_rate": 4.795694682888635e-05,
+      "loss": 0.4403,
+      "step": 580
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 4.7886907445209234e-05,
+      "loss": 0.4406,
+      "step": 590
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 4.781574059172621e-05,
+      "loss": 0.4317,
+      "step": 600
+    },
+    {
+      "epoch": 0.41,
+      "learning_rate": 4.7743449774305386e-05,
+      "loss": 0.4379,
+      "step": 610
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 4.7670038554184296e-05,
+      "loss": 0.4324,
+      "step": 620
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 4.7595510547794465e-05,
+      "loss": 0.4329,
+      "step": 630
+    },
+    {
+      "epoch": 0.43,
+      "learning_rate": 4.751986942658332e-05,
+      "loss": 0.4259,
+      "step": 640
+    },
+    {
+      "epoch": 0.44,
+      "learning_rate": 4.744311891683325e-05,
+      "loss": 0.4256,
+      "step": 650
+    },
+    {
+      "epoch": 0.44,
+      "learning_rate": 4.736526279947807e-05,
+      "loss": 0.4289,
+      "step": 660
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 4.728630490991676e-05,
+      "loss": 0.4353,
+      "step": 670
+    },
+    {
+      "epoch": 0.46,
+      "learning_rate": 4.7206249137824535e-05,
+      "loss": 0.4413,
+      "step": 680
+    },
+    {
+      "epoch": 0.46,
+      "learning_rate": 4.7125099426961185e-05,
+      "loss": 0.4302,
+      "step": 690
+    },
+    {
+      "epoch": 0.47,
+      "learning_rate": 4.704285977497687e-05,
+      "loss": 0.4365,
+      "step": 700
+    },
+    {
+      "epoch": 0.48,
+      "learning_rate": 4.6959534233215116e-05,
+      "loss": 0.4238,
+      "step": 710
+    },
+    {
+      "epoch": 0.48,
+      "learning_rate": 4.687512690651328e-05,
+      "loss": 0.4284,
+      "step": 720
+    },
+    {
+      "epoch": 0.49,
+      "learning_rate": 4.678964195300028e-05,
+      "loss": 0.4193,
+      "step": 730
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 4.670308358389184e-05,
+      "loss": 0.4256,
+      "step": 740
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 4.6615456063282944e-05,
+      "loss": 0.4288,
+      "step": 750
+    },
+    {
+      "epoch": 0.51,
+      "learning_rate": 4.652676370793784e-05,
+      "loss": 0.4335,
+      "step": 760
+    },
+    {
+      "epoch": 0.52,
+      "learning_rate": 4.643701088707736e-05,
+      "loss": 0.4271,
+      "step": 770
+    },
+    {
+      "epoch": 0.52,
+      "learning_rate": 4.634620202216366e-05,
+      "loss": 0.4304,
+      "step": 780
+    },
+    {
+      "epoch": 0.53,
+      "learning_rate": 4.625434158668246e-05,
+      "loss": 0.4249,
+      "step": 790
+    },
+    {
+      "epoch": 0.54,
+      "learning_rate": 4.6161434105922616e-05,
+      "loss": 0.4322,
+      "step": 800
+    },
+    {
+      "epoch": 0.54,
+      "learning_rate": 4.6067484156753234e-05,
+      "loss": 0.4229,
+      "step": 810
+    },
+    {
+      "epoch": 0.55,
+      "learning_rate": 4.597249636739815e-05,
+      "loss": 0.4252,
+      "step": 820
+    },
+    {
+      "epoch": 0.56,
+      "learning_rate": 4.5876475417207974e-05,
+      "loss": 0.413,
+      "step": 830
+    },
+    {
+      "epoch": 0.56,
+      "learning_rate": 4.577942603642959e-05,
+      "loss": 0.4186,
+      "step": 840
+    },
+    {
+      "epoch": 0.57,
+      "learning_rate": 4.568135300597306e-05,
+      "loss": 0.4233,
+      "step": 850
+    },
+    {
+      "epoch": 0.58,
+      "learning_rate": 4.5582261157176164e-05,
+      "loss": 0.4177,
+      "step": 860
+    },
+    {
+      "epoch": 0.58,
+      "learning_rate": 4.5482155371566384e-05,
+      "loss": 0.4236,
+      "step": 870
+    },
+    {
+      "epoch": 0.59,
+      "learning_rate": 4.538104058062042e-05,
+      "loss": 0.4228,
+      "step": 880
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 4.5278921765521234e-05,
+      "loss": 0.4181,
+      "step": 890
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 4.51758039569127e-05,
+      "loss": 0.4261,
+      "step": 900
+    },
+    {
+      "epoch": 0.61,
+      "learning_rate": 4.5071692234651764e-05,
+      "loss": 0.4217,
+      "step": 910
+    },
+    {
+      "epoch": 0.62,
+      "learning_rate": 4.4966591727558184e-05,
+      "loss": 0.4191,
+      "step": 920
+    },
+    {
+      "epoch": 0.62,
+      "learning_rate": 4.48605076131619e-05,
+      "loss": 0.4247,
+      "step": 930
+    },
+    {
+      "epoch": 0.63,
+      "learning_rate": 4.475344511744794e-05,
+      "loss": 0.4236,
+      "step": 940
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 4.464540951459902e-05,
+      "loss": 0.4172,
+      "step": 950
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 4.4536406126735664e-05,
+      "loss": 0.4209,
+      "step": 960
+    },
+    {
+      "epoch": 0.65,
+      "learning_rate": 4.442644032365407e-05,
+      "loss": 0.4179,
+      "step": 970
+    },
+    {
+      "epoch": 0.66,
+      "learning_rate": 4.431551752256155e-05,
+      "loss": 0.4166,
+      "step": 980
+    },
+    {
+      "epoch": 0.66,
+      "learning_rate": 4.420364318780973e-05,
+      "loss": 0.4173,
+      "step": 990
+    },
+    {
+      "epoch": 0.67,
+      "learning_rate": 4.4090822830625236e-05,
+      "loss": 0.4166,
+      "step": 1000
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 4476,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "total_flos": 9.097203781942116e+18,
+  "trial_name": null,
+  "trial_params": null
+}
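
The `learning_rate` values logged above are consistent with a cosine decay from a peak of 5e-5 to zero over `max_steps` = 4476, with no warmup. Neither the peak rate nor the scheduler type is recorded in `trainer_state.json`; both are inferred from the logged numbers, so the following is a sketch under that assumption rather than the actual training configuration.

```python
import math

PEAK_LR = 5e-5    # assumption: inferred from the log, not recorded in trainer_state.json
MAX_STEPS = 4476  # "max_steps" from trainer_state.json


def cosine_lr(step: int, peak_lr: float = PEAK_LR, max_steps: int = MAX_STEPS) -> float:
    """Cosine decay from peak_lr to 0 over max_steps, assuming no warmup."""
    progress = step / max_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history above.
logged = {
    10: 4.99993842168232e-05,
    500: 4.847627068728795e-05,
    1000: 4.4090822830625236e-05,
}

for step, lr in logged.items():
    print(f"step {step}: reconstructed {cosine_lr(step):.12e}, logged {lr:.12e}")
```

If the reconstructed and logged values diverge for other checkpoints, the assumption of zero warmup is the first thing to revisit.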

checkpoint-1000/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:729bf37ae27da0051469b6c2d9a7528c72ecfe49e138964d0506deffbecbf5dd
+size 4283
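
The three-line entries shown for binary files are Git LFS pointer files, not the binaries themselves. A minimal sketch of reading one into its fields follows; the function name `parse_lfs_pointer` and the returned key names are hypothetical, chosen only for illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


# Pointer for checkpoint-1000/training_args.bin, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:729bf37ae27da0051469b6c2d9a7528c72ecfe49e138964d0506deffbecbf5dd
size 4283
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```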

checkpoint-1200/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+library_name: peft
+base_model: /home/hz/projects/chatglm3-6b-32k
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+## Training procedure
+### Framework versions
+- PEFT 0.6.1

checkpoint-1200/adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/hz/projects/chatglm3-6b-32k",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 32.0,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value"
+  ],
+  "task_type": "CAUSAL_LM"
+}
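
The config above applies rank-8 LoRA to the `query_key_value` projection only. As a rough sketch of what that implies, the snippet below computes the LoRA scaling factor (`lora_alpha / r`) and the number of trainable adapter parameters. The layer dimensions are assumptions taken from the commonly published ChatGLM3-6B architecture (hidden size 4096, fused QKV output width 4608, 28 layers); they are not recorded anywhere in this commit.

```python
# Values from adapter_config.json in this commit.
LORA_R = 8
LORA_ALPHA = 32.0

# Assumptions about the base model (NOT recorded in this commit).
HIDDEN_SIZE = 4096  # input width of query_key_value
QKV_OUT = 4608      # 4096 query + 2 * 256 key/value (multi-query attention)
NUM_LAYERS = 28


def lora_scaling(alpha: float, r: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_param_count(r: int, d_in: int, d_out: int, num_layers: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r), per adapted layer."""
    return num_layers * r * (d_in + d_out)


scaling = lora_scaling(LORA_ALPHA, LORA_R)
n_params = lora_param_count(LORA_R, HIDDEN_SIZE, QKV_OUT, NUM_LAYERS)
print(f"scaling={scaling}, trainable params={n_params}, fp32 bytes={n_params * 4}")
```

Under these assumptions the adapter holds 1,949,696 parameters, or 7,798,784 bytes in fp32, which is close to the 7,820,185-byte `adapter_model.bin` listed in this commit; the remainder would be serialization overhead.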

checkpoint-1200/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f20c941e90bb5428a8c782b9fdd552a0a14752555acb90450f67d506fb61213e
+size 7820185

checkpoint-1200/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|observation|>": 64797,
+  "<|user|>": 64795
+}

checkpoint-1200/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:026f20db4aa943cf7fa1f4842646d76165d84726ac3b103486469d146cba2af2
+size 15644485

checkpoint-1200/rng_state_0.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:df9c86207da192eb45fafab4e339545a4212fcdb73911f03f77b2c74e2826efe
+size 21687

checkpoint-1200/rng_state_1.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f6ca0cecc54a94203b5893c79a4e9964e83f47d7fc3230251eb9aaaf8fdb015
+size 21687

checkpoint-1200/rng_state_2.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cb9205267352ed3614f0586889fed442a66c3c59f8e7824c1af3cadb71f1fa3a
+size 21687

checkpoint-1200/rng_state_3.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2b8666e3e6876bb78a90d36299e773a4dba514962e67e8bf9d2a2acbbe9c5373
+size 21687

checkpoint-1200/rng_state_4.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d11de617b1d36bd8cf3a69ee99c9c9a4f30182d38f1f8a9b66e4482e42ec8e0a
+size 21687

checkpoint-1200/rng_state_5.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38a74118af5ea5a095571d00a792f8c938a69de0335db83aca2804fb6390924e
+size 21687

checkpoint-1200/rng_state_6.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72aa5252e4dd8473b5383b8c74bcf72acf8ea68bf6e54d1c263d57dec2fc1dd6
+size 21687

checkpoint-1200/rng_state_7.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdb22b5e950a14c4a77bfad244d30f4589fd1944c2820e5f560936cc1640af0c
+size 21687

checkpoint-1200/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3ae922322abc72af5a5a4e1e96d2b6312b8af582e691b3aadd460ab4b8f1cab
+size 627

checkpoint-1200/special_tokens_map.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "additional_special_tokens": [
+    "<|user|>",
+    "<|observation|>"
+  ]
+}

checkpoint-1200/tokenization_chatglm.py ADDED

	@@ -0,0 +1,283 @@

+import json
+import os
+import torch
+from typing import List, Optional, Union, Dict
+from sentencepiece import SentencePieceProcessor
+from transformers import PreTrainedTokenizer
+from transformers.utils import logging, PaddingStrategy
+from transformers.tokenization_utils_base import EncodedInput, BatchEncoding
+class SPTokenizer:
+    def __init__(self, model_path: str):
+        # reload tokenizer
+        assert os.path.isfile(model_path), model_path
+        self.sp_model = SentencePieceProcessor(model_file=model_path)
+        # BOS / EOS token IDs
+        self.n_words: int = self.sp_model.vocab_size()
+        self.bos_id: int = self.sp_model.bos_id()
+        self.eos_id: int = self.sp_model.eos_id()
+        self.pad_id: int = self.sp_model.unk_id()
+        assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()
+        special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop", "<|system|>", "<|user|>", "<|assistant|>",
+                          "<|observation|>"]
+        self.special_tokens = {}
+        self.index_special_tokens = {}
+        for token in special_tokens:
+            self.special_tokens[token] = self.n_words
+            self.index_special_tokens[self.n_words] = token
+            self.n_words += 1
+    def tokenize(self, s: str):
+        return self.sp_model.EncodeAsPieces(s)
+    def encode(self, s: str, bos: bool = False, eos: bool = False) -> List[int]:
+        assert type(s) is str
+        t = self.sp_model.encode(s)
+        if bos:
+            t = [self.bos_id] + t
+        if eos:
+            t = t + [self.eos_id]
+        return t
+    def decode(self, t: List[int]) -> str:
+        text, buffer = "", []
+        for token in t:
+            if token in self.index_special_tokens:
+                if buffer:
+                    text += self.sp_model.decode(buffer)
+                    buffer = []
+                text += self.index_special_tokens[token]
+            else:
+                buffer.append(token)
+        if buffer:
+            text += self.sp_model.decode(buffer)
+        return text
+    def decode_tokens(self, tokens: List[str]) -> str:
+        text = self.sp_model.DecodePieces(tokens)
+        return text
+    def convert_token_to_id(self, token):
+        """Converts a token (str) to an id using the vocab."""
+        if token in self.special_tokens:
+            return self.special_tokens[token]
+        return self.sp_model.PieceToId(token)
+    def convert_id_to_token(self, index):
+        """Converts an index (integer) to a token (str) using the vocab."""
+        if index in self.index_special_tokens:
+            return self.index_special_tokens[index]
+        if index in [self.eos_id, self.bos_id, self.pad_id] or index < 0:
+            return ""
+        return self.sp_model.IdToPiece(index)
+class ChatGLMTokenizer(PreTrainedTokenizer):
+    vocab_files_names = {"vocab_file": "tokenizer.model"}
+    model_input_names = ["input_ids", "attention_mask", "position_ids"]
+    def __init__(self, vocab_file, padding_side="left", clean_up_tokenization_spaces=False, **kwargs):
+        self.name = "GLMTokenizer"
+        self.vocab_file = vocab_file
+        self.tokenizer = SPTokenizer(vocab_file)
+        self.special_tokens = {
+            "<bos>": self.tokenizer.bos_id,
+            "<eos>": self.tokenizer.eos_id,
+            "<pad>": self.tokenizer.pad_id
+        }
+        super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces, **kwargs)
+    def get_command(self, token):
+        if token in self.special_tokens:
+            return self.special_tokens[token]
+        assert token in self.tokenizer.special_tokens, f"{token} is not a special token for {self.name}"
+        return self.tokenizer.special_tokens[token]
+    @property
+    def unk_token(self) -> str:
+        return "<unk>"
+    @property
+    def pad_token(self) -> str:
+        return "<unk>"
+    @property
+    def pad_token_id(self):
+        return self.get_command("<pad>")
+    @property
+    def eos_token(self) -> str:
+        return "</s>"
+    @property
+    def eos_token_id(self):
+        return self.get_command("<eos>")
+    @property
+    def vocab_size(self):
+        return self.tokenizer.n_words
+    def get_vocab(self):
+        """ Returns vocab as a dict """
+        vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
+        vocab.update(self.added_tokens_encoder)
+        return vocab
+    def _tokenize(self, text, **kwargs):
+        return self.tokenizer.tokenize(text)
+    def _convert_token_to_id(self, token):
+        """Converts a token (str) to an id using the vocab."""
+        return self.tokenizer.convert_token_to_id(token)
+    def _convert_id_to_token(self, index):
+        """Converts an index (integer) to a token (str) using the vocab."""
+        return self.tokenizer.convert_id_to_token(index)
+    def convert_tokens_to_string(self, tokens: List[str]) -> str:
+        return self.tokenizer.decode_tokens(tokens)
+    def save_vocabulary(self, save_directory, filename_prefix=None):
+        """
+        Save the vocabulary and special tokens file to a directory.
+        Args:
+            save_directory (`str`):
+                The directory in which to save the vocabulary.
+            filename_prefix (`str`, *optional*):
+                An optional prefix to add to the names of the saved files.
+        Returns:
+            `Tuple(str)`: Paths to the files saved.
+        """
+        if os.path.isdir(save_directory):
+            vocab_file = os.path.join(
+                save_directory, self.vocab_files_names["vocab_file"]
+            )
+        else:
+            vocab_file = save_directory
+        with open(self.vocab_file, 'rb') as fin:
+            proto_str = fin.read()
+        with open(vocab_file, "wb") as writer:
+            writer.write(proto_str)
+        return (vocab_file,)
+    def get_prefix_tokens(self):
+        prefix_tokens = [self.get_command("[gMASK]"), self.get_command("sop")]
+        return prefix_tokens
+    def build_single_message(self, role, metadata, message):
+        assert role in ["system", "user", "assistant", "observation"], role
+        role_tokens = [self.get_command(f"<|{role}|>")] + self.tokenizer.encode(f"{metadata}\n")
+        message_tokens = self.tokenizer.encode(message)
+        tokens = role_tokens + message_tokens
+        return tokens
+    def build_chat_input(self, query, history=None, role="user"):
+        if history is None:
+            history = []
+        input_ids = []
+        for item in history:
+            content = item["content"]
+            if item["role"] == "system" and "tools" in item:
+                content = content + "\n" + json.dumps(item["tools"], indent=4, ensure_ascii=False)
+            input_ids.extend(self.build_single_message(item["role"], item.get("metadata", ""), content))
+        input_ids.extend(self.build_single_message(role, "", query))
+        input_ids.extend([self.get_command("<|assistant|>")])
+        return self.batch_encode_plus([input_ids], return_tensors="pt", is_split_into_words=True)
+    def build_inputs_with_special_tokens(
+            self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Build model inputs from a sequence or a pair of sequences by prepending the prefix tokens
+        (`[gMASK]`, `sop`). The resulting format is:
+        - single sequence: `[gMASK] sop X`
+        - pair of sequences: `[gMASK] sop A B <eos>`
+        Args:
+            token_ids_0 (`List[int]`):
+                List of IDs to which the special tokens will be added.
+            token_ids_1 (`List[int]`, *optional*):
+                Optional second list of IDs for sequence pairs.
+        Returns:
+            `List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
+        """
+        prefix_tokens = self.get_prefix_tokens()
+        token_ids_0 = prefix_tokens + token_ids_0
+        if token_ids_1 is not None:
+            token_ids_0 = token_ids_0 + token_ids_1 + [self.get_command("<eos>")]
+        return token_ids_0
+    def _pad(
+            self,
+            encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
+            max_length: Optional[int] = None,
+            padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
+            pad_to_multiple_of: Optional[int] = None,
+            return_attention_mask: Optional[bool] = None,
+    ) -> dict:
+        """
+        Pad encoded inputs (on left/right and up to predefined length or max length in the batch)
+        Args:
+            encoded_inputs:
+                Dictionary of tokenized inputs (`List[int]`) or batch of tokenized inputs (`List[List[int]]`).
+            max_length: maximum length of the returned list and optionally padding length (see below).
+                Will truncate by taking into account the special tokens.
+            padding_strategy: PaddingStrategy to use for padding.
+                - PaddingStrategy.LONGEST: Pad to the longest sequence in the batch
+                - PaddingStrategy.MAX_LENGTH: Pad to the max length
+                - PaddingStrategy.DO_NOT_PAD: Do not pad (default)
+                The tokenizer padding sides are defined in self.padding_side:
+                    - 'left': pads on the left of the sequences
+                    - 'right': pads on the right of the sequences
+            pad_to_multiple_of: (optional) Integer if set will pad the sequence to a multiple of the provided value.
+                This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
+                `>= 7.0` (Volta).
+            return_attention_mask:
+                (optional) Set to False to avoid returning attention mask (default: set to model specifics)
+        """
+        # Load from model defaults
+        assert self.padding_side == "left"
+        required_input = encoded_inputs[self.model_input_names[0]]
+        seq_length = len(required_input)
+        if padding_strategy == PaddingStrategy.LONGEST:
+            max_length = len(required_input)
+        if max_length is not None and pad_to_multiple_of is not None and (max_length % pad_to_multiple_of != 0):
+            max_length = ((max_length // pad_to_multiple_of) + 1) * pad_to_multiple_of
+        needs_to_be_padded = padding_strategy != PaddingStrategy.DO_NOT_PAD and len(required_input) != max_length
+        # Initialize attention mask if not present.
+        if "attention_mask" not in encoded_inputs:
+            encoded_inputs["attention_mask"] = [1] * seq_length
+        if "position_ids" not in encoded_inputs:
+            encoded_inputs["position_ids"] = list(range(seq_length))
+        if needs_to_be_padded:
+            difference = max_length - len(required_input)
+            if "attention_mask" in encoded_inputs:
+                encoded_inputs["attention_mask"] = [0] * difference + encoded_inputs["attention_mask"]
+            if "position_ids" in encoded_inputs:
+                encoded_inputs["position_ids"] = [0] * difference + encoded_inputs["position_ids"]
+            encoded_inputs[self.model_input_names[0]] = [self.pad_token_id] * difference + required_input
+        return encoded_inputs
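
The padding logic in `_pad` above can be isolated into a small standalone function. This is a simplified sketch, not the tokenizer's own code: it drops `pad_to_multiple_of` and the padding-strategy handling, and the default `pad_id` of 0 is an assumption (the tokenizer uses the SentencePiece unknown-token ID, whatever that is for the model file). The example token ID 30910 is arbitrary.

```python
from typing import Dict, List


def left_pad(input_ids: List[int], max_length: int, pad_id: int = 0) -> Dict[str, List[int]]:
    """Left-pad one sequence and build the matching attention mask and position ids."""
    seq_length = len(input_ids)
    difference = max(max_length - seq_length, 0)
    return {
        # Padding goes on the left so generated tokens directly follow the prompt.
        "input_ids": [pad_id] * difference + input_ids,
        # 0 marks padding positions, 1 marks real tokens.
        "attention_mask": [0] * difference + [1] * seq_length,
        # Padding positions get position 0; real tokens count from 0.
        "position_ids": [0] * difference + list(range(seq_length)),
    }


# [gMASK] and sop prefix IDs follow from the special-token numbering; 30910 is arbitrary.
batch = left_pad([64790, 64792, 30910], max_length=5)
print(batch)
```

Note that `_pad` asserts `self.padding_side == "left"`, while `tokenizer_config.json` in this checkpoint records `"padding_side": "right"`. If that value reaches the tokenizer at load time, padding would presumably fail on the assertion, so it may need to be overridden when loading.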

checkpoint-1200/tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e7dc4c393423b76e4373e5157ddc34803a0189ba96b21ddbb40269d31468a6f2
+size 1018370

checkpoint-1200/tokenizer_config.json ADDED

	@@ -0,0 +1,38 @@

+{
+  "added_tokens_decoder": {
+    "64795": {
+      "content": "<|user|>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    },
+    "64797": {
+      "content": "<|observation|>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|user|>",
+    "<|observation|>"
+  ],
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_chatglm.ChatGLMTokenizer",
+      null
+    ]
+  },
+  "clean_up_tokenization_spaces": false,
+  "do_lower_case": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "padding_side": "right",
+  "remove_space": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "ChatGLMTokenizer",
+  "tokenizer_file": null
+}

checkpoint-1200/trainer_state.json ADDED

	@@ -0,0 +1,739 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.8041547997989613,
+  "eval_steps": 500,
+  "global_step": 1200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.99993842168232e-05,
+      "loss": 1.2211,
+      "step": 10
+    },
+    {
+      "epoch": 0.01,
+      "learning_rate": 4.9997536897627915e-05,
+      "loss": 1.0276,
+      "step": 20
+    },
+    {
+      "epoch": 0.02,
+      "learning_rate": 4.9994458133418e-05,
+      "loss": 0.8587,
+      "step": 30
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.999014807586154e-05,
+      "loss": 0.7431,
+      "step": 40
+    },
+    {
+      "epoch": 0.03,
+      "learning_rate": 4.9984606937283405e-05,
+      "loss": 0.6841,
+      "step": 50
+    },
+    {
+      "epoch": 0.04,
+      "learning_rate": 4.9977834990654804e-05,
+      "loss": 0.6452,
+      "step": 60
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.99698325695798e-05,
+      "loss": 0.6347,
+      "step": 70
+    },
+    {
+      "epoch": 0.05,
+      "learning_rate": 4.9960600068278876e-05,
+      "loss": 0.6109,
+      "step": 80
+    },
+    {
+      "epoch": 0.06,
+      "learning_rate": 4.995013794156957e-05,
+      "loss": 0.5911,
+      "step": 90
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.993844670484401e-05,
+      "loss": 0.5803,
+      "step": 100
+    },
+    {
+      "epoch": 0.07,
+      "learning_rate": 4.992552693404354e-05,
+      "loss": 0.5902,
+      "step": 110
+    },
+    {
+      "epoch": 0.08,
+      "learning_rate": 4.991137926563036e-05,
+      "loss": 0.5745,
+      "step": 120
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.9896004396556176e-05,
+      "loss": 0.5538,
+      "step": 130
+    },
+    {
+      "epoch": 0.09,
+      "learning_rate": 4.987940308422783e-05,
+      "loss": 0.5495,
+      "step": 140
+    },
+    {
+      "epoch": 0.1,
+      "learning_rate": 4.986157614647005e-05,
+      "loss": 0.5433,
+      "step": 150
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.984252446148508e-05,
+      "loss": 0.548,
+      "step": 160
+    },
+    {
+      "epoch": 0.11,
+      "learning_rate": 4.98222489678095e-05,
+      "loss": 0.5361,
+      "step": 170
+    },
+    {
+      "epoch": 0.12,
+      "learning_rate": 4.980075066426796e-05,
+      "loss": 0.5331,
+      "step": 180
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.977803060992393e-05,
+      "loss": 0.53,
+      "step": 190
+    },
+    {
+      "epoch": 0.13,
+      "learning_rate": 4.97540899240276e-05,
+      "loss": 0.5135,
+      "step": 200
+    },
+    {
+      "epoch": 0.14,
+      "learning_rate": 4.972892978596069e-05,
+      "loss": 0.5101,
+      "step": 210
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.970255143517838e-05,
+      "loss": 0.5125,
+      "step": 220
+    },
+    {
+      "epoch": 0.15,
+      "learning_rate": 4.967495617114826e-05,
+      "loss": 0.4928,
+      "step": 230
+    },
+    {
+      "epoch": 0.16,
+      "learning_rate": 4.964614535328626e-05,
+      "loss": 0.4878,
+      "step": 240
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.961612040088973e-05,
+      "loss": 0.5017,
+      "step": 250
+    },
+    {
+      "epoch": 0.17,
+      "learning_rate": 4.9584882793067534e-05,
+      "loss": 0.4863,
+      "step": 260
+    },
+    {
+      "epoch": 0.18,
+      "learning_rate": 4.955243406866713e-05,
+      "loss": 0.4847,
+      "step": 270
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 4.951877582619881e-05,
+      "loss": 0.4868,
+      "step": 280
+    },
+    {
+      "epoch": 0.19,
+      "learning_rate": 4.948390972375694e-05,
+      "loss": 0.4748,
+      "step": 290
+    },
+    {
+      "epoch": 0.2,
+      "learning_rate": 4.944783747893825e-05,
+      "loss": 0.4764,
+      "step": 300
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 4.941056086875727e-05,
+      "loss": 0.4712,
+      "step": 310
+    },
+    {
+      "epoch": 0.21,
+      "learning_rate": 4.937208172955876e-05,
+      "loss": 0.4642,
+      "step": 320
+    },
+    {
+      "epoch": 0.22,
+      "learning_rate": 4.9332401956927224e-05,
+      "loss": 0.4642,
+      "step": 330
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 4.9291523505593604e-05,
+      "loss": 0.4709,
+      "step": 340
+    },
+    {
+      "epoch": 0.23,
+      "learning_rate": 4.9249448389338905e-05,
+      "loss": 0.461,
+      "step": 350
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 4.920617868089501e-05,
+      "loss": 0.4677,
+      "step": 360
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 4.9161716511842614e-05,
+      "loss": 0.4564,
+      "step": 370
+    },
+    {
+      "epoch": 0.25,
+      "learning_rate": 4.911606407250617e-05,
+      "loss": 0.4663,
+      "step": 380
+    },
+    {
+      "epoch": 0.26,
+      "learning_rate": 4.9069223611846014e-05,
+      "loss": 0.4682,
+      "step": 390
+    },
+    {
+      "epoch": 0.27,
+      "learning_rate": 4.9021197437347555e-05,
+      "loss": 0.4636,
+      "step": 400
+    },
+    {
+      "epoch": 0.27,
+      "learning_rate": 4.897198791490762e-05,
+      "loss": 0.4569,
+      "step": 410
+    },
+    {
+      "epoch": 0.28,
+      "learning_rate": 4.8921597468717887e-05,
+      "loss": 0.462,
+      "step": 420
+    },
+    {
+      "epoch": 0.29,
+      "learning_rate": 4.887002858114548e-05,
+      "loss": 0.4563,
+      "step": 430
+    },
+    {
+      "epoch": 0.29,
+      "learning_rate": 4.881728379261068e-05,
+      "loss": 0.4563,
+      "step": 440
+    },
+    {
+      "epoch": 0.3,
+      "learning_rate": 4.876336570146175e-05,
+      "loss": 0.4468,
+      "step": 450
+    },
+    {
+      "epoch": 0.31,
+      "learning_rate": 4.870827696384698e-05,
+      "loss": 0.4508,
+      "step": 460
+    },
+    {
+      "epoch": 0.31,
+      "learning_rate": 4.865202029358379e-05,
+      "loss": 0.4507,
+      "step": 470
+    },
+    {
+      "epoch": 0.32,
+      "learning_rate": 4.859459846202507e-05,
+      "loss": 0.4486,
+      "step": 480
+    },
+    {
+      "epoch": 0.33,
+      "learning_rate": 4.853601429792265e-05,
+      "loss": 0.4423,
+      "step": 490
+    },
+    {
+      "epoch": 0.34,
+      "learning_rate": 4.847627068728795e-05,
+      "loss": 0.4369,
+      "step": 500
+    },
+    {
+      "epoch": 0.34,
+      "learning_rate": 4.841537057324979e-05,
+      "loss": 0.4429,
+      "step": 510
+    },
+    {
+      "epoch": 0.35,
+      "learning_rate": 4.835331695590943e-05,
+      "loss": 0.4389,
+      "step": 520
+    },
+    {
+      "epoch": 0.36,
+      "learning_rate": 4.829011289219276e-05,
+      "loss": 0.44,
+      "step": 530
+    },
+    {
+      "epoch": 0.36,
+      "learning_rate": 4.82257614956997e-05,
+      "loss": 0.4476,
+      "step": 540
+    },
+    {
+      "epoch": 0.37,
+      "learning_rate": 4.816026593655085e-05,
+      "loss": 0.4367,
+      "step": 550
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 4.809362944123129e-05,
+      "loss": 0.4357,
+      "step": 560
+    },
+    {
+      "epoch": 0.38,
+      "learning_rate": 4.802585529243164e-05,
+      "loss": 0.4492,
+      "step": 570
+    },
+    {
+      "epoch": 0.39,
+      "learning_rate": 4.795694682888635e-05,
+      "loss": 0.4403,
+      "step": 580
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 4.7886907445209234e-05,
+      "loss": 0.4406,
+      "step": 590
+    },
+    {
+      "epoch": 0.4,
+      "learning_rate": 4.781574059172621e-05,
+      "loss": 0.4317,
+      "step": 600
+    },
+    {
+      "epoch": 0.41,
+      "learning_rate": 4.7743449774305386e-05,
+      "loss": 0.4379,
+      "step": 610
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 4.7670038554184296e-05,
+      "loss": 0.4324,
+      "step": 620
+    },
+    {
+      "epoch": 0.42,
+      "learning_rate": 4.7595510547794465e-05,
+      "loss": 0.4329,
+      "step": 630
+    },
+    {
+      "epoch": 0.43,
+      "learning_rate": 4.751986942658332e-05,
+      "loss": 0.4259,
+      "step": 640
+    },
+    {
+      "epoch": 0.44,
+      "learning_rate": 4.744311891683325e-05,
+      "loss": 0.4256,
+      "step": 650
+    },
+    {
+      "epoch": 0.44,
+      "learning_rate": 4.736526279947807e-05,
+      "loss": 0.4289,
+      "step": 660
+    },
+    {
+      "epoch": 0.45,
+      "learning_rate": 4.728630490991676e-05,
+      "loss": 0.4353,
+      "step": 670
+    },
+    {
+      "epoch": 0.46,
+      "learning_rate": 4.7206249137824535e-05,
+      "loss": 0.4413,
+      "step": 680
+    },
+    {
+      "epoch": 0.46,
+      "learning_rate": 4.7125099426961185e-05,
+      "loss": 0.4302,
+      "step": 690
+    },
+    {
+      "epoch": 0.47,
+      "learning_rate": 4.704285977497687e-05,
+      "loss": 0.4365,
+      "step": 700
+    },
+    {
+      "epoch": 0.48,
+      "learning_rate": 4.6959534233215116e-05,
+      "loss": 0.4238,
+      "step": 710
+    },
+    {
+      "epoch": 0.48,
+      "learning_rate": 4.687512690651328e-05,
+      "loss": 0.4284,
+      "step": 720
+    },
+    {
+      "epoch": 0.49,
+      "learning_rate": 4.678964195300028e-05,
+      "loss": 0.4193,
+      "step": 730
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 4.670308358389184e-05,
+      "loss": 0.4256,
+      "step": 740
+    },
+    {
+      "epoch": 0.5,
+      "learning_rate": 4.6615456063282944e-05,
+      "loss": 0.4288,
+      "step": 750
+    },
+    {
+      "epoch": 0.51,
+      "learning_rate": 4.652676370793784e-05,
+      "loss": 0.4335,
+      "step": 760
+    },
+    {
+      "epoch": 0.52,
+      "learning_rate": 4.643701088707736e-05,
+      "loss": 0.4271,
+      "step": 770
+    },
+    {
+      "epoch": 0.52,
+      "learning_rate": 4.634620202216366e-05,
+      "loss": 0.4304,
+      "step": 780
+    },
+    {
+      "epoch": 0.53,
+      "learning_rate": 4.625434158668246e-05,
+      "loss": 0.4249,
+      "step": 790
+    },
+    {
+      "epoch": 0.54,
+      "learning_rate": 4.6161434105922616e-05,
+      "loss": 0.4322,
+      "step": 800
+    },
+    {
+      "epoch": 0.54,
+      "learning_rate": 4.6067484156753234e-05,
+      "loss": 0.4229,
+      "step": 810
+    },
+    {
+      "epoch": 0.55,
+      "learning_rate": 4.597249636739815e-05,
+      "loss": 0.4252,
+      "step": 820
+    },
+    {
+      "epoch": 0.56,
+      "learning_rate": 4.5876475417207974e-05,
+      "loss": 0.413,
+      "step": 830
+    },
+    {
+      "epoch": 0.56,
+      "learning_rate": 4.577942603642959e-05,
+      "loss": 0.4186,
+      "step": 840
+    },
+    {
+      "epoch": 0.57,
+      "learning_rate": 4.568135300597306e-05,
+      "loss": 0.4233,
+      "step": 850
+    },
+    {
+      "epoch": 0.58,
+      "learning_rate": 4.5582261157176164e-05,
+      "loss": 0.4177,
+      "step": 860
+    },
+    {
+      "epoch": 0.58,
+      "learning_rate": 4.5482155371566384e-05,
+      "loss": 0.4236,
+      "step": 870
+    },
+    {
+      "epoch": 0.59,
+      "learning_rate": 4.538104058062042e-05,
+      "loss": 0.4228,
+      "step": 880
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 4.5278921765521234e-05,
+      "loss": 0.4181,
+      "step": 890
+    },
+    {
+      "epoch": 0.6,
+      "learning_rate": 4.51758039569127e-05,
+      "loss": 0.4261,
+      "step": 900
+    },
+    {
+      "epoch": 0.61,
+      "learning_rate": 4.5071692234651764e-05,
+      "loss": 0.4217,
+      "step": 910
+    },
+    {
+      "epoch": 0.62,
+      "learning_rate": 4.4966591727558184e-05,
+      "loss": 0.4191,
+      "step": 920
+    },
+    {
+      "epoch": 0.62,
+      "learning_rate": 4.48605076131619e-05,
+      "loss": 0.4247,
+      "step": 930
+    },
+    {
+      "epoch": 0.63,
+      "learning_rate": 4.475344511744794e-05,
+      "loss": 0.4236,
+      "step": 940
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 4.464540951459902e-05,
+      "loss": 0.4172,
+      "step": 950
+    },
+    {
+      "epoch": 0.64,
+      "learning_rate": 4.4536406126735664e-05,
+      "loss": 0.4209,
+      "step": 960
+    },
+    {
+      "epoch": 0.65,
+      "learning_rate": 4.442644032365407e-05,
+      "loss": 0.4179,
+      "step": 970
+    },
+    {
+      "epoch": 0.66,
+      "learning_rate": 4.431551752256155e-05,
+      "loss": 0.4166,
+      "step": 980
+    },
+    {
+      "epoch": 0.66,
+      "learning_rate": 4.420364318780973e-05,
+      "loss": 0.4173,
+      "step": 990
+    },
+    {
+      "epoch": 0.67,
+      "learning_rate": 4.4090822830625236e-05,
+      "loss": 0.4166,
+      "step": 1000
+    },
+    {
+      "epoch": 0.68,
+      "learning_rate": 4.3977062008838307e-05,
+      "loss": 0.4173,
+      "step": 1010
+    },
+    {
+      "epoch": 0.68,
+      "learning_rate": 4.3862366326608975e-05,
+      "loss": 0.4049,
+      "step": 1020
+    },
+    {
+      "epoch": 0.69,
+      "learning_rate": 4.374674143415096e-05,
+      "loss": 0.4143,
+      "step": 1030
+    },
+    {
+      "epoch": 0.7,
+      "learning_rate": 4.363019302745334e-05,
+      "loss": 0.4219,
+      "step": 1040
+    },
+    {
+      "epoch": 0.7,
+      "learning_rate": 4.3512726847999987e-05,
+      "loss": 0.4152,
+      "step": 1050
+    },
+    {
+      "epoch": 0.71,
+      "learning_rate": 4.339434868248665e-05,
+      "loss": 0.4153,
+      "step": 1060
+    },
+    {
+      "epoch": 0.72,
+      "learning_rate": 4.3275064362535966e-05,
+      "loss": 0.4148,
+      "step": 1070
+    },
+    {
+      "epoch": 0.72,
+      "learning_rate": 4.315487976441014e-05,
+      "loss": 0.4147,
+      "step": 1080
+    },
+    {
+      "epoch": 0.73,
+      "learning_rate": 4.303380080872145e-05,
+      "loss": 0.41,
+      "step": 1090
+    },
+    {
+      "epoch": 0.74,
+      "learning_rate": 4.291183346014063e-05,
+      "loss": 0.4119,
+      "step": 1100
+    },
+    {
+      "epoch": 0.74,
+      "learning_rate": 4.278898372710296e-05,
+      "loss": 0.4173,
+      "step": 1110
+    },
+    {
+      "epoch": 0.75,
+      "learning_rate": 4.266525766151238e-05,
+      "loss": 0.4119,
+      "step": 1120
+    },
+    {
+      "epoch": 0.76,
+      "learning_rate": 4.254066135844326e-05,
+      "loss": 0.4163,
+      "step": 1130
+    },
+    {
+      "epoch": 0.76,
+      "learning_rate": 4.2415200955840184e-05,
+      "loss": 0.4104,
+      "step": 1140
+    },
+    {
+      "epoch": 0.77,
+      "learning_rate": 4.228888263421557e-05,
+      "loss": 0.4045,
+      "step": 1150
+    },
+    {
+      "epoch": 0.78,
+      "learning_rate": 4.216171261634521e-05,
+      "loss": 0.413,
+      "step": 1160
+    },
+    {
+      "epoch": 0.78,
+      "learning_rate": 4.2033697166961716e-05,
+      "loss": 0.4112,
+      "step": 1170
+    },
+    {
+      "epoch": 0.79,
+      "learning_rate": 4.1904842592445906e-05,
+      "loss": 0.4018,
+      "step": 1180
+    },
+    {
+      "epoch": 0.8,
+      "learning_rate": 4.177515524051609e-05,
+      "loss": 0.4068,
+      "step": 1190
+    },
+    {
+      "epoch": 0.8,
+      "learning_rate": 4.1644641499915454e-05,
+      "loss": 0.4029,
+      "step": 1200
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 4476,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "total_flos": 1.0916468228205052e+19,
+  "trial_name": null,
+  "trial_params": null
+}
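
Note on the log above: the logged `learning_rate` values appear consistent with a cosine decay from a peak of 5e-5 to zero over `max_steps` (4476), with no warmup. Neither the peak nor the scheduler type is recorded in this file, so both are assumptions inferred from the numbers. The sketch below recomputes two logged entries under that assumption and also checks the `epoch` field against `max_steps / num_train_epochs`. The function names are illustrative.

```python
import math

# Assumed, not stated in trainer_state.json: peak learning rate and zero warmup.
PEAK_LR = 5e-5
MAX_STEPS = 4476        # "max_steps" from the file
NUM_TRAIN_EPOCHS = 3    # "num_train_epochs" from the file


def cosine_lr(step: int, peak: float = PEAK_LR, total: int = MAX_STEPS) -> float:
    """Cosine decay from `peak` at step 0 to 0 at step `total`, no warmup."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))


def epoch_at(step: int) -> float:
    """Fractional epoch, assuming steps are spread evenly over the epochs."""
    steps_per_epoch = MAX_STEPS / NUM_TRAIN_EPOCHS  # 1492 steps per epoch
    return step / steps_per_epoch


# (step, logged learning_rate, logged epoch) copied from the log above
logged = [
    (160, 4.984252446148508e-05, 0.11),
    (1200, 4.1644641499915454e-05, 0.8),
]

for step, lr, epoch in logged:
    # Print the recomputed value next to the logged one for comparison.
    print(step, cosine_lr(step), lr, round(epoch_at(step), 2), epoch)
```

If the recomputed and logged values agree to several significant digits, the cosine assumption is a reasonable reading of this run. It does not prove which scheduler was actually configured.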

checkpoint-1200/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:729bf37ae27da0051469b6c2d9a7528c72ecfe49e138964d0506deffbecbf5dd
+size 4283
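
The three added lines are a Git LFS pointer rather than the binary itself: a spec version, an object ID (hash algorithm and digest), and the size of the real file in bytes. Below is a minimal sketch of reading such a pointer, assuming the leading `+` diff markers have already been stripped. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the diff above, without the "+" markers.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:729bf37ae27da0051469b6c2d9a7528c72ecfe49e138964d0506deffbecbf5dd
size 4283
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```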

checkpoint-1400/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+library_name: peft
+base_model: /home/hz/projects/chatglm3-6b-32k
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Data Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+## Training procedure
+### Framework versions
+- PEFT 0.6.1

checkpoint-1400/adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/hz/projects/chatglm3-6b-32k",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 32.0,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value"
+  ],
+  "task_type": "CAUSAL_LM"
+}
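
Note on the config above: with `r` = 8 and `lora_alpha` = 32.0, the standard LoRA scaling factor is `lora_alpha / r` = 4. The sketch below also estimates the adapter's parameter count for the single target module `query_key_value`. The base-model dimensions (hidden size 4096, fused QKV output width 4608, 28 layers) are assumptions about ChatGLM3-6B and do not appear anywhere in this diff.

```python
# Values taken from adapter_config.json
R = 8
LORA_ALPHA = 32.0

# Assumed ChatGLM3-6B dimensions (NOT stated in this diff)
HIDDEN_SIZE = 4096   # input width of query_key_value
QKV_OUT = 4608       # fused Q, K, V output width under multi-query attention
NUM_LAYERS = 28

# Standard LoRA scaling applied to the low-rank update B @ A
scaling = LORA_ALPHA / R

# Each adapted layer adds A (R x HIDDEN_SIZE) and B (QKV_OUT x R)
params_per_layer = R * (HIDDEN_SIZE + QKV_OUT)
total_params = NUM_LAYERS * params_per_layer

# Size of the weights alone if stored as 32-bit floats
fp32_bytes = total_params * 4

print(scaling, total_params, fp32_bytes)
```

Under these assumptions the weights come to roughly 7.8 MB in fp32. That is close to the 7820185-byte `adapter_model.bin` pointer in this commit, with the remainder plausibly serialization overhead. Treat this as a sanity check, not a confirmed breakdown of the file.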

checkpoint-1400/adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2adba2088f9c10df3366820a3640d294362738b7833f646fb0aada9e022509b4
+size 7820185

checkpoint-1400/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|observation|>": 64797,
+  "<|user|>": 64795
+}