Commit 0d03e09 (1 parent: 2d8805f)
Update README.md
README.md (CHANGED)
@@ -235,19 +235,14 @@ model-index:
 
 
 
-#
+# Table of Contents
 
-
+1. [Model Summary](#model-summary)
+2. [Use](#use)
+3. [Training](#training)
+4. [Citation](#citation)
 
-
-
-1. [Model Summary](##model-summary)
-2. [Use](##use)
-3. [Training](##training)
-5. [License](##license)
-6. [Citation](##citation)
-
-## Model Summary
+# Model Summary
 
 OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning [CodeGeeX2](https://huggingface.co/THUDM/codegeex2-6b) on [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) & [OASST](https://huggingface.co/datasets/bigcode/oasst-octopack) as described in the OctoPack paper.
 
@@ -284,15 +279,15 @@ OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning
 </table>
 
 
-
+# Use
 
-
+## Intended use
 
 The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
 
 **Feel free to share your generations in the Community tab!**
 
-
+## Generation
 ```python
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -308,16 +303,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-
+# Training
 
-
+## Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-
+## Hardware
 
 - **Pretraining:**
 - **GPUs:** 512 Tesla A100
@@ -326,17 +321,17 @@ print(tokenizer.decode(outputs[0]))
 - **GPUs:** 8 Tesla A100
 - **Training time:** 4 hours
 
-
+## Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-
+# License
 
 本仓库的代码依照 [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) 协议开源,模型的权重的使用则需要遵循 [Model License](MODEL_LICENSE)。
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-
+# Citation
 
 TODO
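The Generation snippet in the card is split across two hunks because its middle lines are unchanged by this commit. For reference, a minimal runnable sketch of the full example: the checkpoint id `bigcode/octogeex`, the `trust_remote_code=True` flag, the device placement, and `max_new_tokens` do not appear in this diff and are assumptions, not the card's exact code.

```python
# Minimal sketch of the card's generation example. The checkpoint id,
# trust_remote_code flag, device choice, and max_new_tokens are assumptions
# not shown in the diff above.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octogeex"   # assumed repository id
device = "cuda"                   # use "cpu" if no GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Prompt format recommended by the card: prefix with "Question: ", end with "Answer:".
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

Note that the decoded output repeats the prompt; slice it off if only the generated answer is needed.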