Update README.md
README.md
@@ -22,7 +22,8 @@ This dataset is our attempt to reproduce the dataset generated for Microsoft Res
This second preview release is trained on a curated, filtered subset of most of our GPT4 augmented data.

This release highlights that our dataset and training methods have surpassed performance parity with the Orca paper.
We measured this with BigBench-Hard and AGIEval results, using the same evaluation methods as the Orca paper, and find ~103% of the original Orca's performance on average.
Moreover, this is achieved with ~1/10th the compute requirement and <20% of the dataset size of the original Orca paper.

We have run extensive evaluations internally and expect this model to place number 1 on both the HuggingFaceH4 Open LLM Leaderboard and the GPT4ALL Leaderboard for 13B models.

@@ -58,7 +59,7 @@ Average for AGIEval: 0.441

In the Orca paper, they measured their score relative to Vicuna on these evals.
We've done the same and find that our scores average >103% of the total improvement shown in the Orca paper, using the same evaluation methods as outlined there.

So we are surpassing Orca performance with <20% of the dataset size and ~1/10th the training budget!
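
To make the scoring method concrete: the relative score is the fraction of Orca's improvement over the Vicuna baseline that our model recovers. A minimal sketch with placeholder numbers (not our reported results):

```python
# Relative scoring as described in the Orca paper: what fraction of
# Orca's improvement over the Vicuna baseline does a model recover?
# All three numbers below are placeholders, not reported results.
vicuna_score = 0.300
orca_score   = 0.400
our_score    = 0.403

relative = (our_score - vicuna_score) / (orca_score - vicuna_score)
print(f"{relative:.1%}")  # 103.0% -- anything >100% surpasses Orca
```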

## BigBench-Hard Performance

@@ -82,6 +83,7 @@ We place #1 for all open models and come within comparison of text-davinci-003,

![OpenOrca Preview2 GPT4ALL Performance](https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/OO_Preview2_AGIEval.png "GPT4ALL Performance")

# Dataset

We used a curated, filtered selection of most of the GPT-4 augmented data from our OpenOrca dataset, which aims to reproduce the Orca Research Paper dataset.

@@ -90,23 +92,36 @@ Further details of our curation practices will be forthcoming with our full mode

# Training

We trained with 8x A100-80G GPUs for 46 hours, completing 5 epochs of full fine-tuning on our dataset.
This contrasts with the 20x A100-80G GPUs for 200 hours used in the Orca paper, for only 3 epochs.
Our compute requirement was <1/10th that of the original Orca.
Commodity cost was ~$600.
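
A quick back-of-the-envelope check on these figures (the per-GPU-hour rate below is our illustrative assumption, not a number from the release):

```python
# Sanity-check the compute comparison: this release vs. the Orca paper.
ours = 8 * 46       # 368 A100-80G GPU-hours (8 GPUs x 46 hours)
orca = 20 * 200     # 4,000 GPU-hours (20 GPUs x 200 hours)
print(ours / orca)  # 0.092 -> <1/10th the compute

rate = 1.60                   # assumed commodity $/GPU-hour (illustrative)
print(f"${ours * rate:.0f}")  # $589, consistent with the ~$600 above
```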

Please await our full releases for further training details.


# Prompt Template

We use our own prompt template, which we call "`OpenChat Llama2 V1`".

Examples:
```
# Single-turn V1 Llama 2
tokenize("User: Hello<|end_of_turn|>Assistant:")
# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]

# Multi-turn V1 Llama 2
tokenize("User: Hello<|end_of_turn|>Assistant: Hi<|end_of_turn|>User: How are you today?<|end_of_turn|>Assistant:")
# Result: [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901, 6324, 32000, 4911, 29901, 1128, 526, 366, 9826, 29973, 32000, 4007, 22137, 29901]
```
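
For reference, here is a minimal sketch of producing this format programmatically. The `build_prompt` helper is hypothetical (not part of the release), and the expected IDs assume the repository's tokenizer registers `<|end_of_turn|>` as special token 32000, as the examples above show:

```python
# Hypothetical helper: assemble an "OpenChat Llama2 V1" prompt from
# (role, text) turns, ending with "Assistant:" so the model responds.
from transformers import AutoTokenizer

def build_prompt(turns):
    prompt = "".join(f"{role}: {text}<|end_of_turn|>" for role, text in turns)
    return prompt + "Assistant:"

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")

prompt = build_prompt([("User", "Hello")])
print(tokenizer(prompt).input_ids)
# Expected to match the single-turn example above:
# [1, 4911, 29901, 15043, 32000, 4007, 22137, 29901]
```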


# Serving

This model is most easily served with [OpenChat's](https://github.com/imoneoi/openchat) customized vLLM OpenAI-compatible API server.
This is highly recommended, as it is by far the fastest option for inference and is quick and easy to set up.
We also illustrate setup of Oobabooga/text-generation-webui below. The settings outlined there will also apply to other uses of `Transformers`.
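
Once the OpenChat vLLM server (or any OpenAI-compatible server) is up, any standard client can talk to it. A minimal sketch with Python `requests`; the port (18888) and model name are placeholder assumptions, so substitute whatever your server actually uses:

```python
# Query an OpenAI-compatible chat-completions endpoint.
# Port and model name are assumptions -- adjust to your server.
import requests

response = requests.post(
    "http://localhost:18888/v1/chat/completions",
    json={
        "model": "openchat_llama2",  # illustrative identifier
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```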


## Serving with OpenChat

@@ -128,16 +143,16 @@ You may then connect to the OpenAI-compatible API endpoint with tools such as [B

## Serving with Oobabooga / text-generation-webui

The model may also be loaded via [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui/) in a similar manner to other models.
See the requirements below. Note that inference with Transformers is significantly slower than using the recommended OpenChat vLLM server.

### Oobabooga Key Requirements

* You will first need to download the model, as you normally do, to the "`models/`" folder of your `text-generation-webui` installation.
* To use the unquantized model presented here, select "`Transformers`" in the "`Model loader`" dropdown of the webui's "`Model`" tab.
* You will likely want to tick "`auto-devices`". The model will require >40GB of VRAM after loading in context for inference.
* The model was trained in bf16, so tick the "`bf16`" box for best performance; the sketch after this list shows the equivalent settings for plain Transformers use.
* It will run safely on single GPUs with VRAM >=48GB (e.g. an A6000).
* If using consumer GPUs, e.g. 2x RTX 3090 24GB, you will likely want to enter "18,17" under "`tensor_split`" to split the model across both GPUs.
* The model will perform significantly better if you use the appropriate prompting template.
* We will submit a PR to include our prompting template in text-generation-webui soon.
* For now, manually enter the settings described in the following sections:
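
For reference (as noted in the list above), a minimal sketch of the equivalent settings when loading the model with plain Transformers, assuming the standard `from_pretrained` API and the `accelerate` package for device mapping:

```python
# Equivalent of the webui settings for direct Transformers use:
# "auto-devices" -> device_map="auto", "bf16" -> torch_dtype=torch.bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # spread layers across available GPUs
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
)
```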

@@ -176,17 +191,19 @@ In the "`Text generation`" tab, select "`instruct`" as the mode:

It should look as below:
<img src="https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B/resolve/main/Images/OpenOrcaLlama2OobaboogaInstructMode.png" style="width: 40%">

Then you should be ready to generate!


# Citation

```bibtex
@software{OpenOrcaxOpenChatPreview2,
  title = {OpenOrcaxOpenChatPreview2: Llama2-13B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset},
  author = {Guan Wang and Bleys Goodson and Wing Lian and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B}},
}
@software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},