Tags: Text Generation, Transformers, PyTorch, English, llama, text-generation-inference, Inference Endpoints
bleysg committed
Commit 776fa74
1 Parent(s): bf8c918

Update README.md

Files changed (1):
  1. README.md +32 -15
README.md CHANGED
@@ -22,7 +22,8 @@ This dataset is our attempt to reproduce the dataset generated for Microsoft Res
  This second preview release is trained on a curated filtered subset of most of our GPT4 augmented data.
 
  This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
- As well, this is done with ~1/3rd the compute requirement and using <20% of the dataset size from the original Orca paper.
 
  We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4ALL Leaderboard for 13B models.
 
@@ -58,7 +59,7 @@ Average for AGIEval: 0.441
  In the Orca paper, they measured their score relative to Vicuna on these evals.
  We've done the same and have found that our scores average >103% of the total improvement shown in the Orca paper, using the same evaluation methods as outlined in the paper.
 
- So we are surpassing Orca performance with <20% of the dataset size and ~1/3rd the training budget!
 
  ## BigBench-Hard Performance
 
@@ -82,6 +83,7 @@ We place #1 for all open models and come within comparison of text-davinci-003,
 
  ![OpenOrca Preview2 GPT4ALL Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_AGIEval.png "GPT4ALL Performance")
 
  # Dataset
 
  We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
@@ -90,23 +92,36 @@ Further details of our curation practices will be forthcoming with our full mode
 
  # Training
 
- We trained with 8x A100-80G GPUs for 170 hours, completing 5 epochs of full fine tuning on our dataset.
  This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
- Our compute requirement was ~1/3rd that of the original Orca.
- Commodity cost was ~$2,300.
 
  Please await our full releases for further training details.
 
 
  # Prompt Template
 
- We use our own prompt template which we call "``"
 
 
  # Serving
 
  This model is most easily served with [OpenChat's](https://github.com/imoneoi/openchat) customized vLLM OpenAI-compatible API server.
- We also illustrate setup of Oobabooga/text-generation-webui below.
 
 
  ## Serving with OpenChat
@@ -128,16 +143,16 @@ You may then connect to the OpenAI-compatible API endpoint with tools such as [B
  ## Serving with Oobabooga / text-generation-webui
 
  The model may also be loaded via [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/) in a similar manner to other models.
- See the requirements below.
 
  ### Oobabooga Key Requirements
 
- * You will first need to download the model as you normally do to the "`models/`" folder of your text-generation-webui installation.
  * To use the unquantized model presented here, select "`Transformers`" in the "`Model loader`" dropdown of the webui's "`Model`" tab.
- * You will likely want to tick "`auto-devices`". The model will require >30GB VRAM after loading in context for inference.
  * The model was trained in bf16, so tick the "`bf16`" box for best performance.
  * It will run safely on single GPUs with VRAM >=48GB (e.g. A6000)
- * If using consumer GPUs, e.g. 2x RTX3090 24GB, you will likely want to enter "18,17" under tensor_split to split the model across both GPUs
  * The model will perform significantly better if you use the appropriate prompting template
  * We will submit a PR to include our prompting template into text-generation-webui soon
  * For now, manually enter the settings described in the following sections:
@@ -176,17 +191,19 @@ In the "`Text generation`" tab, select "`instruct`" as the mode:
  It should look as below:
  <img src="https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaLlama2OobaboogaInstructMode.png" style="width: 40%">
 
 
  # Citation
 
  ```bibtex
- @software{OpenOrca_Preview2,
- title = {OpenOrca_Preview2: A Llama2-13B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset},
- author = {Wing Lian and Bleys Goodson and Guan Wang and Eugene Pentland and Austin Cook and Chanvichet Vong` and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
- howpublished = {\url{https://https://huggingface.co/Open-Orca/OpenOrca-Preview2-13B},
  }
  @software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},
 
  This second preview release is trained on a curated filtered subset of most of our GPT4 augmented data.
 
  This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
+ We measured this on BigBench-Hard and AGIEval using the same methods as the Orca paper, finding ~103% of the original Orca's performance on average.
+ As well, this is done with ~1/10th the compute requirement and using <20% of the dataset size from the original Orca paper.
 
  We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4ALL Leaderboard for 13B models.
 
  In the Orca paper, they measured their score relative to Vicuna on these evals.
  We've done the same and have found that our scores average >103% of the total improvement shown in the Orca paper, using the same evaluation methods as outlined in the paper.
 
+ So we are surpassing Orca performance with <20% of the dataset size and ~1/10th the training budget!
 
  ## BigBench-Hard Performance
 
 
  ![OpenOrca Preview2 GPT4ALL Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_AGIEval.png "GPT4ALL Performance")
 
+
  # Dataset
 
  We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.
 
  # Training
 
+ We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine-tuning on our dataset.
  This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
+ Our compute requirement was <1/10th that of the original Orca (roughly 8 x 46 = 368 GPU-hours vs. 20 x 200 = 4,000 GPU-hours).
+ Commodity cost was ~$600.
 
  Please await our full releases for further training details.
 
 
  # Prompt Template
 
+ We use our own prompt template which we call "`OpenChat Llama2 V1`"
+
+
+ Examples:
+ ```
+ # Single-turn V1 Llama 2
+ tokenize("User: Hello<|end_of_turn|>Assistant:")
+ # Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
+
+ # Multi-turn V1 Llama 2
+ tokenize("User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
+ # Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
+ ```
 
 
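As a rough, unofficial illustration of the template above, the model can be prompted with the Hugging Face `transformers` library as sketched below. The dtype, device placement, generation settings, and the use of `<|end_of_turn|>` as a stop token are our assumptions, not settings from this card:

```python
# Minimal sketch: prompt the model with the "OpenChat Llama2 V1" format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single-turn prompt, matching the first example above.
prompt = "User: How are you today?<|end_of_turn|>Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stop generating once the model emits the <|end_of_turn|> special token (id 32000).
end_of_turn_id = tokenizer.convert_tokens_to_ids("<|end_of_turn|>")
output = model.generate(**inputs, max_new_tokens=256, eos_token_id=end_of_turn_id)
reply_ids = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(reply_ids, skip_special_tokens=True))
```

Multi-turn prompts follow the same pattern: append each completed turn, terminated by `<|end_of_turn|>`, before the trailing "Assistant:".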
  # Serving
 
  This model is most easily served with [OpenChat's](https://github.com/imoneoi/openchat) customized vLLM OpenAI-compatible API server.
+ This is highly recommended as it is by far the fastest in terms of inference speed and is a quick and easy option for setup.
+ We also illustrate setup of Oobabooga/text-generation-webui below. The settings outlined there will also apply to other uses of `Transformers` (a minimal loading sketch is included at the end of the Oobabooga section).
 
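For orientation, once the OpenChat server described in the next section is running, a request to its OpenAI-compatible endpoint can be sketched as below. The host, port, and model name here are placeholder assumptions, not values from this card; substitute whatever your server reports:

```python
# Minimal sketch of querying an OpenAI-compatible chat completions endpoint.
import requests

response = requests.post(
    "http://localhost:18888/v1/chat/completions",   # placeholder host/port
    json={
        "model": "OpenOrcaxOpenChat-Preview2-13B",  # placeholder model name
        "messages": [{"role": "user", "content": "How are you today?"}],
        "max_tokens": 256,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```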
  ## Serving with OpenChat
 
  ## Serving with Oobabooga / text-generation-webui
 
  The model may also be loaded via [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/) in a similar manner to other models.
+ See the requirements below. Note that inference with Transformers is significantly slower than using the recommended OpenChat vLLM server.
 
  ### Oobabooga Key Requirements
 
+ * You will first need to download the model as you normally do to the "`models/`" folder of your `text-generation-webui` installation.
  * To use the unquantized model presented here, select "`Transformers`" in the "`Model loader`" dropdown of the webui's "`Model`" tab.
+ * You will likely want to tick "`auto-devices`". The model will require >40GB of VRAM once context is loaded for inference.
  * The model was trained in bf16, so tick the "`bf16`" box for best performance.
  * It will run safely on single GPUs with VRAM >=48GB (e.g. A6000)
+ * If using consumer GPUs, e.g. 2x RTX3090 24GB, you will likely want to enter "18,17" under "`tensor_split`" to split the model across both GPUs
  * The model will perform significantly better if you use the appropriate prompting template
  * We will submit a PR to include our prompting template into text-generation-webui soon
  * For now, manually enter the settings described in the following sections:
 
  It should look as below:
  <img src="https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaLlama2OobaboogaInstructMode.png" style="width: 40%">
 
+ Then you should be ready to generate!
+
 
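If you prefer to skip the webui and load the model directly with the `transformers` library, the key requirements above translate roughly to the sketch below. The exact memory split for 2x RTX3090 24GB GPUs is an assumption mirroring the "18,17" tensor_split suggestion; adjust it to your hardware:

```python
# Minimal sketch: direct Transformers loading with bf16 and automatic device placement.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,            # the model was trained in bf16
    device_map="auto",                     # analogous to ticking "auto-devices"
    max_memory={0: "18GiB", 1: "17GiB"},   # rough analogue of tensor_split "18,17"
)
```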
  # Citation
 
  ```bibtex
+ @software{OpenOrcaxOpenChatPreview2,
+ title = {OpenOrcaxOpenChatPreview2: Llama2-13B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset},
+ author = {Guan Wang and Bleys Goodson and Wing Lian and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
+ howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B}},
  }
  @software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},