BashitAli committed on
Commit
1b2930c
1 Parent(s): 9f3a24a

Upload 9 files

README.md CHANGED
@@ -1,3 +1,187 @@
 ---
 license: mit
+ language:
+ - en
+ library_name: transformers
+ inference: false
+ datasets:
+ - databricks/databricks-dolly-15k
 ---
+ # dolly-v2-3b Model Card
+ ## Summary
+
+ Databricks' `dolly-v2-3b` is an instruction-following large language model trained on the Databricks machine learning platform
+ that is licensed for commercial use. Based on `pythia-2.8b`, Dolly is trained on ~15k instruction/response fine-tuning records
+ [`databricks-dolly-15k`](https://github.com/databrickslabs/dolly/tree/master/data) generated
+ by Databricks employees in capability domains from the InstructGPT paper, including brainstorming, classification, closed QA, generation,
+ information extraction, open QA and summarization. `dolly-v2-3b` is not a state-of-the-art model, but it does exhibit surprisingly
+ high-quality instruction-following behavior not characteristic of the foundation model on which it is based.
+
+ Dolly v2 is also available in these larger model sizes:
+
+ * [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b), a 12 billion parameter model based on `pythia-12b`
+ * [dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b), a 6.9 billion parameter model based on `pythia-6.9b`
+
+ Please refer to the [dolly GitHub repo](https://github.com/databrickslabs/dolly#getting-started-with-response-generation) for tips on
+ running inference for various GPU configurations.
+
+ **Owner**: Databricks, Inc.
+
+ ## Model Overview
+ `dolly-v2-3b` is a 2.8 billion parameter causal language model created by [Databricks](https://databricks.com/) that is derived from
+ [EleutherAI's](https://www.eleuther.ai/) [Pythia-2.8b](https://huggingface.co/EleutherAI/pythia-2.8b) and fine-tuned
+ on a [~15K record instruction corpus](https://github.com/databrickslabs/dolly/tree/master/data) generated by Databricks employees and released under a permissive license (CC-BY-SA).
+
+ ## Usage
+
+ To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` and `accelerate` libraries installed.
+ In a Databricks notebook you could run:
+
+ ```python
+ %pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
+ ```
+
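+ Outside of a notebook, the same dependencies can be installed from a shell; this is simply the equivalent of the cell above:
+
+ ```bash
+ pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2"
+ ```
+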
+ The instruction following pipeline can be loaded using the `pipeline` function as shown below. This loads a custom `InstructionTextGenerationPipeline`
+ found in the model repo [here](https://huggingface.co/databricks/dolly-v2-3b/blob/main/instruct_pipeline.py), which is why `trust_remote_code=True` is required.
+ Including `torch_dtype=torch.bfloat16` is generally recommended if this type is supported, in order to reduce memory usage. It does not appear to impact output quality.
+ It is also fine to remove it if there is sufficient memory.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
+ ```
+
+ You can then use the pipeline to answer instructions:
+
+ ```python
+ res = generate_text("Explain to me the difference between nuclear fission and fusion.")
+ print(res[0]["generated_text"])
+ ```
+
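+ The custom pipeline also accepts the usual generation settings (`max_new_tokens`, `top_p`, `top_k`, `do_sample`) and forwards them to `model.generate()`, so they can typically be overridden per call. A small illustrative sketch (the parameter values here are arbitrary, not tuned):
+
+ ```python
+ # Override generation kwargs for a single call, e.g. to cap the response length.
+ res = generate_text(
+     "Give me three tips for writing clear documentation.",
+     max_new_tokens=128,
+     top_p=0.9,
+ )
+ print(res[0]["generated_text"])
+ ```
+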
+ Alternatively, if you prefer not to use `trust_remote_code=True`, you can download [instruct_pipeline.py](https://huggingface.co/databricks/dolly-v2-3b/blob/main/instruct_pipeline.py),
+ store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer:
+
+ ```python
+ import torch
+ from instruct_pipeline import InstructionTextGenerationPipeline
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")
+ model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", device_map="auto", torch_dtype=torch.bfloat16)
+
+ generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
+ ```
+
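+ Under the hood, the pipeline wraps your instruction in the prompt format used during fine-tuning (defined in `instruct_pipeline.py` in this repo) before calling `generate`, roughly:
+
+ ```
+ Below is an instruction that describes a task. Write a response that appropriately completes the request.
+
+ ### Instruction:
+ {instruction}
+
+ ### Response:
+ ```
+
+ The model is trained to end its answer with a special `### End` token, which the pipeline uses (together with the `### Response:` token) to cut the completion out of the generated text, returning only the response by default.
+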
+ ### LangChain Usage
+
+ To use the pipeline with LangChain, you must set `return_full_text=True`, as LangChain expects the full text to be returned
+ and the default for the pipeline is to only return the new text.
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
+                          trust_remote_code=True, device_map="auto", return_full_text=True)
+ ```
+
+ You can create a prompt that either has only an instruction or has an instruction with context:
+
+ ```python
+ from langchain import PromptTemplate, LLMChain
+ from langchain.llms import HuggingFacePipeline
+
+ # template for an instruction with no input
+ prompt = PromptTemplate(
+     input_variables=["instruction"],
+     template="{instruction}")
+
+ # template for an instruction with input
+ prompt_with_context = PromptTemplate(
+     input_variables=["instruction", "context"],
+     template="{instruction}\n\nInput:\n{context}")
+
+ hf_pipeline = HuggingFacePipeline(pipeline=generate_text)
+
+ llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
+ llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)
+ ```
+
+ Example predicting using a simple instruction:
+
+ ```python
+ print(llm_chain.predict(instruction="Explain to me the difference between nuclear fission and fusion.").lstrip())
+ ```
+
+ Example predicting using an instruction with context:
+
+ ```python
+ context = """George Washington (February 22, 1732[b] - December 14, 1799) was an American military officer, statesman,
+ and Founding Father who served as the first president of the United States from 1789 to 1797."""
+
+ print(llm_context_chain.predict(instruction="When was George Washington president?", context=context).lstrip())
+ ```
+
+ ## Known Limitations
+
+ ### Performance Limitations
+ **`dolly-v2-3b` is not a state-of-the-art generative language model** and, though quantitative benchmarking is ongoing, is not designed to perform
+ competitively with more modern model architectures or models subject to larger pretraining corpora.
+
+ The Dolly model family is under active development, and so any list of shortcomings is unlikely to be exhaustive, but we include known limitations and misfires here as a means to document and share our preliminary findings with the community.
+ In particular, `dolly-v2-3b` struggles with: syntactically complex prompts, programming problems, mathematical operations, factual errors,
+ dates and times, open-ended question answering, hallucination, enumerating lists of specific length, stylistic mimicry, having a sense of humor, etc.
+ Moreover, we find that `dolly-v2-3b` does not have some capabilities, such as well-formatted letter writing, that are present in the original model.
+
+ ### Dataset Limitations
+ Like all language models, `dolly-v2-3b` reflects the content and limitations of its training corpora.
+
+ - **The Pile**: Pythia's pre-training corpus contains content mostly collected from the public internet, and like most web-scale datasets,
+ it contains content many users would find objectionable. As such, the model is likely to reflect these shortcomings, potentially overtly
+ in the case it is explicitly asked to produce objectionable content, and sometimes subtly, as in the case of biased or harmful implicit
+ associations.
+
+ - **`databricks-dolly-15k`**: The training data on which `dolly-v2-3b` is instruction tuned represents natural language instructions generated
+ by Databricks employees during a period spanning March and April 2023 and includes passages from Wikipedia as reference passages
+ for instruction categories like closed QA and summarization. To our knowledge it does not contain obscenity, intellectual property or
+ personally identifying information about non-public figures, but it may contain typos and factual errors.
+ The dataset may also reflect biases found in Wikipedia. Finally, the dataset likely reflects
+ the interests and semantic choices of Databricks employees, a demographic which is not representative of the global population at large.
+
+ Databricks is committed to ongoing research and development efforts to develop helpful, honest and harmless AI technologies that
+ maximize the potential of all individuals and organizations.
+
+ ### Benchmark Metrics
+
+ Below you'll find various models' benchmark performance on the [EleutherAI LLM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness);
+ model results are sorted by geometric mean to produce an intelligible ordering. As outlined above, these results demonstrate that `dolly-v2-3b` is not state of the art.
+ It underperforms `dolly-v1-6b` on these evaluation benchmarks, which is not surprising considering it has roughly half the number of parameters.
+
+ | model                   | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa     | boolq    | gmean    |
+ | ----------------------- | ---------- | -------- | ---------- | --------- | ------------- | -------- | -------- | -------- |
+ | EleutherAI/pythia-2.8b  | 0.348      | 0.585859 | 0.589582   | 0.591217  | 0.323379      | 0.73395  | 0.638226 | 0.523431 |
+ | EleutherAI/pythia-6.9b  | 0.368      | 0.604798 | 0.608524   | 0.631548  | 0.343857      | 0.761153 | 0.6263   | 0.543567 |
+ | databricks/dolly-v2-3b  | 0.384      | 0.611532 | 0.589582   | 0.650767  | 0.370307      | 0.742655 | 0.575535 | 0.544886 |
+ | EleutherAI/pythia-12b   | 0.364      | 0.627104 | 0.636148   | 0.668094  | 0.346416      | 0.760065 | 0.673394 | 0.559676 |
+ | EleutherAI/gpt-j-6B     | 0.382      | 0.621633 | 0.651144   | 0.662617  | 0.363481      | 0.761153 | 0.655963 | 0.565936 |
+ | databricks/dolly-v2-12b | 0.408      | 0.63931  | 0.616417   | 0.707927  | 0.388225      | 0.757889 | 0.568196 | 0.56781  |
+ | databricks/dolly-v2-7b  | 0.392      | 0.633838 | 0.607735   | 0.686517  | 0.406997      | 0.750816 | 0.644037 | 0.573487 |
+ | databricks/dolly-v1-6b  | 0.41       | 0.62963  | 0.643252   | 0.676758  | 0.384812      | 0.773667 | 0.687768 | 0.583431 |
+ | EleutherAI/gpt-neox-20b | 0.402      | 0.683923 | 0.656669   | 0.7142    | 0.408703      | 0.784004 | 0.695413 | 0.602236 |
+
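+ For reference, the `gmean` column is simply the geometric mean of the seven per-task scores. A minimal sketch reproducing it for one row, using the values copied from the table above:
+
+ ```python
+ import math
+
+ # Per-task scores for databricks/dolly-v2-3b, taken from the table above
+ scores = [0.384, 0.611532, 0.589582, 0.650767, 0.370307, 0.742655, 0.575535]
+
+ # Geometric mean = exp of the mean of the logs
+ gmean = math.exp(sum(math.log(s) for s in scores) / len(scores))
+ print(round(gmean, 4))  # ~0.5449, matching the gmean column
+ ```
+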
+ # Citation
+
+ ```
+ @online{DatabricksBlog2023DollyV2,
+     author  = {Mike Conover and Matt Hayes and Ankit Mathur and Jianwei Xie and Jun Wan and Sam Shah and Ali Ghodsi and Patrick Wendell and Matei Zaharia and Reynold Xin},
+     title   = {Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM},
+     year    = {2023},
+     url     = {https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm},
+     urldate = {2023-06-30}
+ }
+ ```
+
+ # Happy Hacking!
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "EleutherAI/pythia-2.8b",
+   "architectures": [
+     "GPTNeoXForCausalLM"
+   ],
+   "custom_pipelines": {
+     "text-generation": {
+       "impl": "instruct_pipeline.InstructionTextGenerationPipeline",
+       "pt": "AutoModelForCausalLM",
+       "tf": "TFAutoModelForCausalLM"
+     }
+   },
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "hidden_act": "gelu",
+   "hidden_size": 2560,
+   "initializer_range": 0.02,
+   "intermediate_size": 10240,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 2048,
+   "model_type": "gpt_neox",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "rotary_emb_base": 10000,
+   "rotary_pct": 0.25,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.25.1",
+   "use_cache": true,
+   "use_parallel_residual": true,
+   "vocab_size": 50280
+ }
gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
instruct_pipeline.py ADDED
@@ -0,0 +1,212 @@
+ import logging
+ import re
+ from typing import List
+
+ import numpy as np
+ from transformers import Pipeline, PreTrainedTokenizer
+
+ from transformers.utils import is_tf_available
+
+ if is_tf_available():
+     import tensorflow as tf
+
+ logger = logging.getLogger(__name__)
+
+ INSTRUCTION_KEY = "### Instruction:"
+ RESPONSE_KEY = "### Response:"
+ END_KEY = "### End"
+ INTRO_BLURB = (
+     "Below is an instruction that describes a task. Write a response that appropriately completes the request."
+ )
+
+ # This is the prompt that is used for generating responses using an already trained model. It ends with the response
+ # key, where the job of the model is to provide the completion that follows it (i.e. the response itself).
+ PROMPT_FOR_GENERATION_FORMAT = """{intro}
+
+ {instruction_key}
+ {instruction}
+
+ {response_key}
+ """.format(
+     intro=INTRO_BLURB,
+     instruction_key=INSTRUCTION_KEY,
+     instruction="{instruction}",
+     response_key=RESPONSE_KEY,
+ )
+
+
+ def get_special_token_id(tokenizer: PreTrainedTokenizer, key: str) -> int:
+     """Gets the token ID for a given string that has been added to the tokenizer as a special token.
+
+     When training, we configure the tokenizer so that sequences like "### Instruction:" and "### End" are
+     treated specially and converted to a single, new token. This retrieves the token ID each of these keys map to.
+
+     Args:
+         tokenizer (PreTrainedTokenizer): the tokenizer
+         key (str): the key to convert to a single token
+
+     Raises:
+         ValueError: if more than one ID was generated
+
+     Returns:
+         int: the token ID for the given key
+     """
+     token_ids = tokenizer.encode(key)
+     if len(token_ids) > 1:
+         raise ValueError(f"Expected only a single token for '{key}' but found {token_ids}")
+     return token_ids[0]
+
+
+ class InstructionTextGenerationPipeline(Pipeline):
+     def __init__(
+         self, *args, do_sample: bool = True, max_new_tokens: int = 256, top_p: float = 0.92, top_k: int = 0, **kwargs
+     ):
+         """Initialize the pipeline
+
+         Args:
+             do_sample (bool, optional): Whether or not to use sampling. Defaults to True.
+             max_new_tokens (int, optional): Max new tokens after the prompt to generate. Defaults to 256.
+             top_p (float, optional): If set to float < 1, only the smallest set of most probable tokens with
+                 probabilities that add up to top_p or higher are kept for generation. Defaults to 0.92.
+             top_k (int, optional): The number of highest probability vocabulary tokens to keep for top-k-filtering.
+                 Defaults to 0.
+         """
+         super().__init__(*args, do_sample=do_sample, max_new_tokens=max_new_tokens, top_p=top_p, top_k=top_k,
+                          **kwargs)
+
+     def _sanitize_parameters(self,
+                              return_full_text: bool = None,
+                              **generate_kwargs):
+         preprocess_params = {}
+
+         # Newer versions of the tokenizer configure the response key as a special token, and some may also
+         # append a newline to yield a single token. Find whatever token is configured for the response key.
+         tokenizer_response_key = next(
+             (token for token in self.tokenizer.additional_special_tokens if token.startswith(RESPONSE_KEY)), None
+         )
+
+         response_key_token_id = None
+         end_key_token_id = None
+         if tokenizer_response_key:
+             try:
+                 response_key_token_id = get_special_token_id(self.tokenizer, tokenizer_response_key)
+                 end_key_token_id = get_special_token_id(self.tokenizer, END_KEY)
+
+                 # Ensure generation stops once it generates "### End"
+                 generate_kwargs["eos_token_id"] = end_key_token_id
+             except ValueError:
+                 pass
+
+         forward_params = generate_kwargs
+         postprocess_params = {
+             "response_key_token_id": response_key_token_id,
+             "end_key_token_id": end_key_token_id
+         }
+
+         if return_full_text is not None:
+             postprocess_params["return_full_text"] = return_full_text
+
+         return preprocess_params, forward_params, postprocess_params
+
+     def preprocess(self, instruction_text, **generate_kwargs):
+         prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
+         inputs = self.tokenizer(
+             prompt_text,
+             return_tensors="pt",
+         )
+         inputs["prompt_text"] = prompt_text
+         inputs["instruction_text"] = instruction_text
+         return inputs
+
+     def _forward(self, model_inputs, **generate_kwargs):
+         input_ids = model_inputs["input_ids"]
+         attention_mask = model_inputs.get("attention_mask", None)
+
+         if input_ids.shape[1] == 0:
+             input_ids = None
+             attention_mask = None
+             in_b = 1
+         else:
+             in_b = input_ids.shape[0]
+
+         generated_sequence = self.model.generate(
+             input_ids=input_ids.to(self.model.device),
+             attention_mask=attention_mask.to(self.model.device) if attention_mask is not None else None,
+             pad_token_id=self.tokenizer.pad_token_id,
+             **generate_kwargs,
+         )
+
+         out_b = generated_sequence.shape[0]
+         if self.framework == "pt":
+             generated_sequence = generated_sequence.reshape(in_b, out_b // in_b, *generated_sequence.shape[1:])
+         elif self.framework == "tf":
+             generated_sequence = tf.reshape(generated_sequence, (in_b, out_b // in_b, *generated_sequence.shape[1:]))
+
+         instruction_text = model_inputs.pop("instruction_text")
+         return {"generated_sequence": generated_sequence, "input_ids": input_ids, "instruction_text": instruction_text}
+
+     def postprocess(self, model_outputs, response_key_token_id, end_key_token_id, return_full_text: bool = False):
+
+         generated_sequence = model_outputs["generated_sequence"][0]
+         instruction_text = model_outputs["instruction_text"]
+
+         generated_sequence: List[List[int]] = generated_sequence.numpy().tolist()
+         records = []
+         for sequence in generated_sequence:
+
+             # The response will be set to this variable if we can identify it.
+             decoded = None
+
+             # If we have token IDs for the response and end, then we can find the tokens and only decode between them.
+             if response_key_token_id and end_key_token_id:
+                 # Find where "### Response:" is first found in the generated tokens. Considering this is part of the
+                 # prompt, we should definitely find it. We will return the tokens found after this token.
+                 try:
+                     response_pos = sequence.index(response_key_token_id)
+                 except ValueError:
+                     logger.warning(f"Could not find response key {response_key_token_id} in: {sequence}")
+                     response_pos = None
+
+                 if response_pos:
+                     # Next find where "### End" is located. The model has been trained to end its responses with this
+                     # sequence (or actually, the token ID it maps to, since it is a special token). We may not find
+                     # this token, as the response could be truncated. If we don't find it then just return everything
+                     # to the end. Note that even though we set eos_token_id, we still see this token at the end.
+                     try:
+                         end_pos = sequence.index(end_key_token_id)
+                     except ValueError:
+                         end_pos = None
+
+                     decoded = self.tokenizer.decode(sequence[response_pos + 1 : end_pos]).strip()
+
+             if not decoded:
+                 # Otherwise we'll decode everything and use a regex to find the response and end.
+
+                 fully_decoded = self.tokenizer.decode(sequence)
+
+                 # The response appears after "### Response:". The model has been trained to append "### End" at the
+                 # end.
+                 m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", fully_decoded, flags=re.DOTALL)
+
+                 if m:
+                     decoded = m.group(1).strip()
+                 else:
+                     # The model might not generate the "### End" sequence before reaching the max tokens. In this case,
+                     # return everything after "### Response:".
+                     m = re.search(r"#+\s*Response:\s*(.+)", fully_decoded, flags=re.DOTALL)
+                     if m:
+                         decoded = m.group(1).strip()
+                     else:
+                         logger.warning(f"Failed to find response in:\n{fully_decoded}")
+
+             # If the full text is requested, then append the decoded text to the original instruction.
+             # This technically isn't the full text, as we format the instruction in the prompt the model has been
+             # trained on, but to the client it will appear to be the full text.
+             if return_full_text:
+                 decoded = f"{instruction_text}\n{decoded}"
+
+             rec = {"generated_text": decoded}
+
+             records.append(rec)
+
+         return records
llama-2-7b-chat.ggmlv3.q5_K_M.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9011251750b26faab388d9fa05422cb47e2ced6676d3da188c5805072a0d4654
+ size 4782867072
special_tokens_map (1).json ADDED
@@ -0,0 +1,11 @@
+ {
+   "additional_special_tokens": [
+     "### End",
+     "### Instruction:",
+     "### Response:"
+   ],
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "additional_special_tokens": [
+     "### End",
+     "### Instruction:",
+     "### Response:"
+   ],
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "model_max_length": 1000000000000000019884624838656,
+   "name_or_path": "EleutherAI/pythia-2.8b",
+   "special_tokens_map_file": "/admin/home-hailey/.cache/huggingface/hub/models--EleutherAI--gpt-neox-20b/snapshots/4e49eadb5d14bd22f314ec3f45b69a87b88c7691/special_tokens_map.json",
+   "tokenizer_class": "GPTNeoXTokenizer",
+   "unk_token": "<|endoftext|>"
+ }