Muennighoff committed
Commit 0d03e09 · 1 Parent(s): 2d8805f

Update README.md

Files changed (1): README.md (+15 -20)
README.md CHANGED
@@ -235,19 +235,14 @@ model-index:
 
 ![Octopack](https://github.com/bigcode-project/octopack/blob/31f3320f098703c7910e43492c39366eeea68d83/banner.png?raw=true)
 
-# OctoGeeX
+# Table of Contents
 
-Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
+1. [Model Summary](#model-summary)
+2. [Use](#use)
+3. [Training](#training)
+4. [Citation](#citation)
 
-## Table of Contents
-
-1. [Model Summary](##model-summary)
-2. [Use](##use)
-3. [Training](##training)
-5. [License](##license)
-6. [Citation](##citation)
-
-## Model Summary
+# Model Summary
 
 OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning [CodeGeeX2](https://huggingface.co/THUDM/codegeex2-6b) on [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) & [OASST](https://huggingface.co/datasets/bigcode/oasst-octopack) as described in the OctoPack paper.
 
@@ -284,15 +279,15 @@ OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning
 </table>
 
 
-## Use
+# Use
 
-### Intended use
+## Intended use
 
 The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
 
 **Feel free to share your generations in the Community tab!**
 
-### Generation
+## Generation
 ```python
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -308,16 +303,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-## Training
+# Training
 
-### Model
+## Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-### Hardware
+## Hardware
 
 - **Pretraining:**
   - **GPUs:** 512 Tesla A100
@@ -326,17 +321,17 @@ print(tokenizer.decode(outputs[0]))
   - **GPUs:** 8 Tesla A100
   - **Training time:** 4 hours
 
-### Software
+## Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-## License
+# License
 
 The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license; use of the model weights must comply with the [Model License](MODEL_LICENSE).
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-## Citation
+# Citation
 
 TODO
 
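The Generation snippet is only partially visible between the diff hunks above. As a reference, here is a minimal, self-contained sketch of the "Question: ... Answer:" prompt format recommended in the Intended use section. The checkpoint id `bigcode/octogeex`, the `trust_remote_code=True` flag, and the generation settings are illustrative assumptions, not taken from the card, and may need adjusting.

```python
# pip install -q transformers
# Minimal sketch only; checkpoint id, trust_remote_code, and device handling are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octogeex"  # assumed Hub id for OctoGeeX
device = "cuda"                  # use "cpu" if no GPU is available

# trust_remote_code=True is assumed because OctoGeeX is fine-tuned from CodeGeeX2,
# which relies on custom modeling code hosted alongside the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Prompt format recommended in the "Intended use" section of the card.
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"

inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

The model's reply is expected after the `Answer:` marker; `max_new_tokens=256` is an arbitrary cap for the example.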