Upload NLIScorer


Files changed (7)

README.md +199 -0
config.json +64 -0
model.safetensors +3 -0
pipeline.py +388 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +947 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "_name_or_path": "param-bharat/ModernBERT-base-nli-clf",
+  "architectures": [
+    "ModernBertForSequenceClassification"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": true,
+  "classifier_dropout": 0.3,
+  "classifier_pooling": "cls",
+  "cls_token_id": 50281,
+  "custom_pipelines": {
+    "nli-scorer": {
+      "impl": "pipeline.NLIScorer",
+      "pt": [
+        "AutoModelForSequenceClassification"
+      ],
+      "tf": []
+    }
+  },
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
+  "global_rope_theta": 160000.0,
+  "gradient_checkpointing": false,
+  "hidden_activation": "gelu",
+  "hidden_size": 768,
+  "id2label": {
+    "0": "False",
+    "1": "True"
+  },
+  "initializer_cutoff_factor": 2.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "label2id": {
+    "False": 0,
+    "True": 1
+  },
+  "layer_norm_eps": 1e-05,
+  "local_attention": 128,
+  "local_rope_theta": 10000.0,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "reference_compile": true,
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.48.0.dev0",
+  "vocab_size": 50368
+}
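
The attention settings in `config.json` (`global_attn_every_n_layers`, `local_attention`, and the two RoPE thetas) are easier to read as a per-layer table. The sketch below assumes ModernBERT's convention, as implemented in 🤗 Transformers, that layer `i` uses global attention when `i % global_attn_every_n_layers == 0` and a sliding window of `local_attention` tokens (half on each side) otherwise; `attention_layout` is a hypothetical helper, not a library function.

```python
def attention_layout(
    num_hidden_layers: int,
    global_every: int,
    local_window: int,
    global_theta: float,
    local_theta: float,
) -> list[dict]:
    """Describe each encoder layer as global or sliding-window attention (assumed ModernBERT rule)."""
    layout = []
    for i in range(num_hidden_layers):
        if i % global_every == 0:
            # Global layers attend over the whole sequence and use the global RoPE theta.
            layout.append({"layer": i, "kind": "global", "window": None, "rope_theta": global_theta})
        else:
            # Local layers see local_window tokens in total, half on each side.
            half = local_window // 2
            layout.append({"layer": i, "kind": "local", "window": (half, half), "rope_theta": local_theta})
    return layout


# Values copied from config.json above.
layout = attention_layout(
    num_hidden_layers=22,
    global_every=3,
    local_window=128,
    global_theta=160000.0,
    local_theta=10000.0,
)
global_layers = [entry["layer"] for entry in layout if entry["kind"] == "global"]
print(global_layers)  # [0, 3, 6, 9, 12, 15, 18, 21]
print(len(layout) - len(global_layers))  # 14
```

With 22 layers and a global layer every third layer, this works out to 8 global and 14 local layers.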

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66b2bc760eeba85d11fd61534cccedfaac37d5d7dc3f421df4d11642163e04e1
+size 598442928
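
The LFS pointer records only the file size, but that is enough for a rough parameter count. Assuming every tensor is stored as float32 (4 bytes each, matching `"torch_dtype": "float32"` in `config.json`) and ignoring the small JSON header that safetensors files carry, the estimate is:

```python
# Rough parameter count from the LFS pointer's byte size.
# Assumption: all weights are float32; the safetensors header is ignored,
# so this slightly overestimates the true count.
SIZE_BYTES = 598_442_928
BYTES_PER_PARAM = 4

approx_params = SIZE_BYTES // BYTES_PER_PARAM
print(f"{approx_params:,} parameters (~{approx_params / 1e6:.1f}M)")
```

That is about 149.6M parameters, in line with the roughly 149M usually quoted for ModernBERT-base.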

pipeline.py ADDED

	@@ -0,0 +1,388 @@

+from pydantic import BaseModel, ConfigDict
+from transformers import (
+    BatchEncoding,
+    Pipeline,
+    PreTrainedTokenizer,
+    PreTrainedTokenizerFast,
+)
+class NLIInstruction(BaseModel):
+    # AutoTokenizer is only a factory; loaded tokenizers are instances of these classes.
+    tokenizer: PreTrainedTokenizerFast | PreTrainedTokenizer
+    instruction: str
+    hypothesis: str
+    Prompt: str | None = None
+    Completion: str | None = None
+    Context: str | None = None
+    ChatHistory: list[dict[str, str]] | None = None
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+    def format_chat_history(self, chat_history: list[dict[str, str]]) -> str:
+        return "\n".join(
+            [
+                f"### Background\n{message['role']}: {message['content']}"
+                for message in chat_history
+            ]
+        )
+    @property
+    def premise(self) -> str:
+        base_template = "## Premise\n"
+        if self.Context:
+            base_template += f"### Context\n{self.Context}\n"
+        if self.ChatHistory:
+            base_template += self.format_chat_history(self.ChatHistory)
+        if self.Prompt:
+            base_template += f"### Prompt\n{self.Prompt}\n"
+        if self.Completion:
+            base_template += f"### Completion\n{self.Completion}\n"
+        return base_template
+    @property
+    def as_str(self):
+        return f"{self.instruction}\n{self.premise}\n{self.hypothesis}"
+    @property
+    def as_model_inputs(self) -> BatchEncoding:
+        instruction_ids = self.tokenizer(
+            self.instruction, add_special_tokens=False
+        ).input_ids
+        premise_ids = self.tokenizer(self.premise, add_special_tokens=False).input_ids
+        hypothesis_ids = self.tokenizer(
+            self.hypothesis, add_special_tokens=False
+        ).input_ids
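+        # Reserve room for [CLS] plus three [SEP] tokens, then give the premise what is left.
+        num_special_tokens = 4
+        premise_length = max(
+            0,
+            self.tokenizer.model_max_length
+            - len(instruction_ids)
+            - len(hypothesis_ids)
+            - num_special_tokens,
+        )
+        premise_ids = premise_ids[:premise_length]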
+        input_ids = (
+            [self.tokenizer.cls_token_id]
+            + instruction_ids
+            + [self.tokenizer.sep_token_id]
+            + premise_ids
+            + [self.tokenizer.sep_token_id]
+            + hypothesis_ids
+            + [self.tokenizer.sep_token_id]
+        )
+        attention_mask = [1] * len(input_ids)
+        return BatchEncoding(
+            data={"input_ids": input_ids, "attention_mask": attention_mask},
+            tensor_type="pt",
+            prepend_batch_axis=True,
+        )
+class DetectionResponseFactualConsistency(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is factually consistent with the context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is factually consistent with the context."""
+    )
+class DetectionContextPromptInjection(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the prompt contains a prompt injection attack."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe prompt contains a prompt injection attack."""
+    )
+class DetectionContextSourceCode(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the given context contains any source code or code snippets."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe given context contains source code or code blocks."""
+    )
+class DetectionContextToxicity(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any toxic content related to one of the following: disability, gender, identity hate, insult, obscene, race, religion, toxicity, violence."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains some toxic content related to one of the following: disability, gender, identity hate, insult, obscene, race, religion, toxicity, violence."""
+    )
+class DetectionContextToxicityDisability(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any toxic content related to disability."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains some toxic content related to disability."""
+    )
+class DetectionContextToxicityGender(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any toxic content related to gender."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains some toxic content related to gender."""
+    )
+class DetectionContextToxicityIdentityHate(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any toxic content related to identity hate."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains some toxic content related to identity hate."""
+    )
+class DetectionContextToxicityInsult(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any insulting content."""
+    )
+    hypothesis: str = """## Hypothesis\nThe context contains some insulting content."""
+class DetectionContextToxicityObscene(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any obscene content."""
+    )
+    hypothesis: str = """## Hypothesis\nThe context contains some obscene content."""
+class DetectionContextToxicityRace(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any racist content."""
+    )
+    hypothesis: str = """## Hypothesis\nThe context contains some racist content."""
+class DetectionContextToxicityReligion(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any toxic content related to religion."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains some toxic content related to religion."""
+    )
+class DetectionContextToxicityViolence(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains any violent content."""
+    )
+    hypothesis: str = """## Hypothesis\nThe context contains some violent content."""
+class QualityContextDocumentRelevance(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains relevant information used by the completion to answer the question in the given prompt correctly."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains relevant information used by the completion to answer the question in the given prompt correctly."""
+    )
+class QualityContextDocumentUtilization(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context was utilized in the completion to answer the question in the given prompt correctly."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context was utilized in the completion to answer the question in the given prompt correctly."""
+    )
+class QualityContextSentenceRelevance(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the context contains relevant information used by the completion to answer the question in the given prompt correctly."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe context contains relevant information used by the completion to answer the question in the given prompt correctly."""
+    )
+    Sentence: str
+    @property
+    def premise(self) -> str:
+        return super().premise + f"\n### Sentence\n{self.Sentence}\n"
+class QualityContextSentenceUtilization(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the selected sentence was utilized in the completion to answer the question in the given prompt correctly."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe selected sentence was utilized in the completion to answer the question in the given prompt correctly."""
+    )
+    Sentence: str
+    @property
+    def premise(self) -> str:
+        return super().premise + f"\n### Sentence\n{self.Sentence}\n"
+class QualityResponseAdherence(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion adheres to the context when answering the question in the given prompt."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion adheres to the context when answering the question in the given prompt."""
+    )
+class QualityResponseAttribution(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion attributes the context when answering the question in the given prompt."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion attributes the context when answering the question in the given prompt."""
+    )
+class QualityResponseCoherence(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is coherent and for the given context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is coherent and for the given context."""
+    )
+class QualityResponseComplexity(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is complex and contains multiple steps to answer the question."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is complex and contains multiple steps to answer the question."""
+    )
+class QualityResponseCorrectness(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is correct with respect to the given prompt and context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is correct with respect to the given prompt and context."""
+    )
+class QualityResponseHelpfulness(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is helpful with respect to the given prompt and context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is helpful with respect to the given prompt and context."""
+    )
+class QualityResponseInstructionFollowing(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion follows the instructions provided in the given prompt."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion follows the instructions provided in the given prompt."""
+    )
+class QualityResponseRelevance(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is relevant to the given prompt and context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is relevant to the given prompt and context."""
+    )
+class QualityResponseVerbosity(NLIInstruction):
+    instruction: str = (
+        """## Task\nDetermine if the completion is too verbose with respect to the given prompt and context."""
+    )
+    hypothesis: str = (
+        """## Hypothesis\nThe completion is too verbose with respect to the given prompt and context."""
+    )
+TASK_CLASSES = {
+    "Detection/Hallucination/Factual Consistency": DetectionResponseFactualConsistency,
+    "Detection/Prompt Injection": DetectionContextPromptInjection,
+    "Detection/Source Code": DetectionContextSourceCode,
+    "Detection/Toxicity/Disability": DetectionContextToxicityDisability,
+    "Detection/Toxicity/Gender": DetectionContextToxicityGender,
+    "Detection/Toxicity/Identity Hate": DetectionContextToxicityIdentityHate,
+    "Detection/Toxicity/Insult": DetectionContextToxicityInsult,
+    "Detection/Toxicity/Obscene": DetectionContextToxicityObscene,
+    "Detection/Toxicity/Race": DetectionContextToxicityRace,
+    "Detection/Toxicity/Religion": DetectionContextToxicityReligion,
+    "Detection/Toxicity/Toxicity": DetectionContextToxicity,
+    "Detection/Toxicity/Toxic": DetectionContextToxicity,
+    "Detection/Toxicity/Violence": DetectionContextToxicityViolence,
+    "Quality/Context/Document Relevance": QualityContextDocumentRelevance,
+    "Quality/Context/Document Utilization": QualityContextDocumentUtilization,
+    "Quality/Context/Sentence Relevance": QualityContextSentenceRelevance,
+    "Quality/Context/Sentence Utilization": QualityContextSentenceUtilization,
+    "Quality/Response/Adherence": QualityResponseAdherence,
+    "Quality/Response/Attribution": QualityResponseAttribution,
+    "Quality/Response/Coherence": QualityResponseCoherence,
+    "Quality/Response/Complexity": QualityResponseComplexity,
+    "Quality/Response/Correctness": QualityResponseCorrectness,
+    "Quality/Response/Helpfulness": QualityResponseHelpfulness,
+    "Quality/Response/Instruction Following": QualityResponseInstructionFollowing,
+    "Quality/Response/Relevance": QualityResponseRelevance,
+    "Quality/Response/Verbosity": QualityResponseVerbosity,
+}
+TASK_THRESHOLDS = {
+    "Detection/Hallucination/Factual Consistency": 0.5895,
+    "Detection/Prompt Injection": 0.4147,
+    "Detection/Source Code": 0.4001,
+    "Detection/Toxicity/Disability": 0.5547,
+    "Detection/Toxicity/Gender": 0.4007,
+    "Detection/Toxicity/Identity Hate": 0.5502,
+    "Detection/Toxicity/Insult": 0.4913,
+    "Detection/Toxicity/Obscene": 0.448,
+    "Detection/Toxicity/Race": 0.5983,
+    "Detection/Toxicity/Religion": 0.4594,
+    "Detection/Toxicity/Toxicity": 0.5034,  # same task class as "Toxic" in TASK_CLASSES
+    "Detection/Toxicity/Toxic": 0.5034,
+    "Detection/Toxicity/Violence": 0.4031,
+    "Quality/Context/Document Relevance": 0.5809,
+    "Quality/Context/Document Utilization": 0.4005,
+    "Quality/Context/Sentence Relevance": 0.6003,
+    "Quality/Context/Sentence Utilization": 0.5417,
+    "Quality/Response/Adherence": 0.59,
+    "Quality/Response/Attribution": 0.5304,
+    "Quality/Response/Coherence": 0.6891,
+    "Quality/Response/Complexity": 0.7235,
+    "Quality/Response/Correctness": 0.6535,
+    "Quality/Response/Helpfulness": 0.4445,
+    "Quality/Response/Instruction Following": 0.5323,
+    "Quality/Response/Relevance": 0.4011,
+    "Quality/Response/Verbosity": 0.4243,
+}
+class NLIScorer(Pipeline):
+    def _sanitize_parameters(self, **kwargs):
+        preprocess_kwargs = {}
+        postprocess_kwargs = {}
+        if "task_type" in kwargs:
+            preprocess_kwargs["task_type"] = kwargs["task_type"]
+            postprocess_kwargs["task_type"] = kwargs["task_type"]
+        return preprocess_kwargs, {}, postprocess_kwargs
+    def preprocess(self, inputs, task_type):
+        TaskClass = TASK_CLASSES[task_type]
+        task_class = TaskClass(tokenizer=self.tokenizer, **inputs)
+        return task_class.as_model_inputs
+    def _forward(self, model_inputs):
+        outputs = self.model(**model_inputs)
+        return outputs
+    def postprocess(self, model_outputs, task_type):
+        threshold = TASK_THRESHOLDS[task_type]
+        pos_scores = model_outputs["logits"].softmax(-1)[0][1]
+        best_class = int(pos_scores > threshold)
+        if best_class == 1:
+            score = pos_scores
+        else:
+            score = 1 - pos_scores
+        return {"score": score.item(), "label": best_class}
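
To make the data flow easier to follow, here is a dependency-free sketch that mirrors `NLIInstruction.premise`, the token layout built in `as_model_inputs`, and the decision rule in `NLIScorer.postprocess`. It works on plain lists of integer ids instead of a real tokenizer, hardcodes the `[CLS]`/`[SEP]` ids from `config.json`, and assumes four positions must be reserved for special tokens when truncating the premise. The helper names `render_premise`, `build_input_ids`, and `decide` are hypothetical.

```python
import math

# Token ids taken from config.json / tokenizer_config.json in this commit.
CLS_ID = 50281
SEP_ID = 50282
NUM_SPECIAL_TOKENS = 4  # one [CLS] plus three [SEP]


def render_premise(context=None, chat_history=None, prompt=None, completion=None) -> str:
    """Rebuild the '## Premise' block the same way NLIInstruction.premise does."""
    text = "## Premise\n"
    if context:
        text += f"### Context\n{context}\n"
    if chat_history:
        text += "\n".join(
            f"### Background\n{m['role']}: {m['content']}" for m in chat_history
        )
    if prompt:
        text += f"### Prompt\n{prompt}\n"
    if completion:
        text += f"### Completion\n{completion}\n"
    return text


def build_input_ids(instruction_ids, premise_ids, hypothesis_ids, max_length):
    """[CLS] instruction [SEP] premise [SEP] hypothesis [SEP]; only the premise is truncated."""
    budget = max(
        0, max_length - len(instruction_ids) - len(hypothesis_ids) - NUM_SPECIAL_TOKENS
    )
    return (
        [CLS_ID] + instruction_ids + [SEP_ID]
        + premise_ids[:budget] + [SEP_ID]
        + hypothesis_ids + [SEP_ID]
    )


def decide(logits, threshold):
    """Softmax over the [False, True] logits, then compare P(True) to the task threshold."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift by max for numerical stability
    pos = exps[1] / sum(exps)
    label = int(pos > threshold)
    # The score is the probability of the returned label, not always P(True).
    score = pos if label == 1 else 1 - pos
    return {"score": score, "label": label}


premise = render_premise(context="Paris is in France.", completion="Paris is in Spain.")
ids = build_input_ids([1, 2], list(range(100, 120)), [3], max_length=10)
print(ids)  # [50281, 1, 2, 50282, 100, 101, 102, 50282, 3, 50282]
print(decide([0.0, 1.0], threshold=0.5895))  # label 1, score ≈ 0.731
```

Two consequences are worth keeping in mind. The returned `score` is the probability of the predicted label rather than P(True), so `label` 0 with `score` 0.9 means P(True) is about 0.1. And, as `format_chat_history` is written, no newline follows the last chat message, so a subsequent `### Prompt` header is appended directly to it.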

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,947 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "|||IP_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "1": {
+      "content": "<|padding|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50254": {
+      "content": "                        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50255": {
+      "content": "                       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50256": {
+      "content": "                      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50257": {
+      "content": "                     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50258": {
+      "content": "                    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50259": {
+      "content": "                   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50260": {
+      "content": "                  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50261": {
+      "content": "                 ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50262": {
+      "content": "                ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50263": {
+      "content": "               ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50264": {
+      "content": "              ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50265": {
+      "content": "             ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50266": {
+      "content": "            ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50267": {
+      "content": "           ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50268": {
+      "content": "          ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50269": {
+      "content": "         ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50270": {
+      "content": "        ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50271": {
+      "content": "       ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50272": {
+      "content": "      ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50273": {
+      "content": "     ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50274": {
+      "content": "    ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50275": {
+      "content": "   ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50276": {
+      "content": "  ",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50277": {
+      "content": "|||EMAIL_ADDRESS|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50278": {
+      "content": "|||PHONE_NUMBER|||",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50279": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50280": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50281": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50282": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50283": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50284": {
+      "content": "[MASK]",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50285": {
+      "content": "[unused0]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50286": {
+      "content": "[unused1]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50287": {
+      "content": "[unused2]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50288": {
+      "content": "[unused3]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50289": {
+      "content": "[unused4]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50290": {
+      "content": "[unused5]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50291": {
+      "content": "[unused6]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50292": {
+      "content": "[unused7]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50293": {
+      "content": "[unused8]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50294": {
+      "content": "[unused9]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50295": {
+      "content": "[unused10]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50296": {
+      "content": "[unused11]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50297": {
+      "content": "[unused12]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50298": {
+      "content": "[unused13]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50299": {
+      "content": "[unused14]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50300": {
+      "content": "[unused15]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50301": {
+      "content": "[unused16]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50302": {
+      "content": "[unused17]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50303": {
+      "content": "[unused18]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50304": {
+      "content": "[unused19]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50305": {
+      "content": "[unused20]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50306": {
+      "content": "[unused21]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50307": {
+      "content": "[unused22]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50308": {
+      "content": "[unused23]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50309": {
+      "content": "[unused24]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50310": {
+      "content": "[unused25]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50311": {
+      "content": "[unused26]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50312": {
+      "content": "[unused27]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50313": {
+      "content": "[unused28]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50314": {
+      "content": "[unused29]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50315": {
+      "content": "[unused30]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50316": {
+      "content": "[unused31]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50317": {
+      "content": "[unused32]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50318": {
+      "content": "[unused33]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50319": {
+      "content": "[unused34]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50320": {
+      "content": "[unused35]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50321": {
+      "content": "[unused36]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50322": {
+      "content": "[unused37]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50323": {
+      "content": "[unused38]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50324": {
+      "content": "[unused39]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50325": {
+      "content": "[unused40]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50326": {
+      "content": "[unused41]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50327": {
+      "content": "[unused42]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50328": {
+      "content": "[unused43]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50329": {
+      "content": "[unused44]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50330": {
+      "content": "[unused45]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50331": {
+      "content": "[unused46]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50332": {
+      "content": "[unused47]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50333": {
+      "content": "[unused48]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50334": {
+      "content": "[unused49]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50335": {
+      "content": "[unused50]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50336": {
+      "content": "[unused51]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50337": {
+      "content": "[unused52]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50338": {
+      "content": "[unused53]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50339": {
+      "content": "[unused54]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50340": {
+      "content": "[unused55]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50341": {
+      "content": "[unused56]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50342": {
+      "content": "[unused57]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50343": {
+      "content": "[unused58]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50344": {
+      "content": "[unused59]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50345": {
+      "content": "[unused60]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50346": {
+      "content": "[unused61]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50347": {
+      "content": "[unused62]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50348": {
+      "content": "[unused63]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50349": {
+      "content": "[unused64]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50350": {
+      "content": "[unused65]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50351": {
+      "content": "[unused66]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50352": {
+      "content": "[unused67]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50353": {
+      "content": "[unused68]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50354": {
+      "content": "[unused69]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50355": {
+      "content": "[unused70]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50356": {
+      "content": "[unused71]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50357": {
+      "content": "[unused72]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50358": {
+      "content": "[unused73]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50359": {
+      "content": "[unused74]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50360": {
+      "content": "[unused75]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50361": {
+      "content": "[unused76]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50362": {
+      "content": "[unused77]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50363": {
+      "content": "[unused78]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50364": {
+      "content": "[unused79]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50365": {
+      "content": "[unused80]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50366": {
+      "content": "[unused81]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "50367": {
+      "content": "[unused82]",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 2048,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 2048,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "truncation": "longest_first",
+  "unk_token": "[UNK]"
+}