rpasunuru committed
Commit 9c01e3f
1 Parent(s): d82edc2

Create README.md

 
---
inference: false
tags:
- text-generation
- opt

license: other
commercial: false
---
# OPT-IML

## Model Description

OPT-IML models are instruction-tuned versions of OPT. They are fine-tuned on 2000 NLP tasks drawn from 8 existing public benchmarks.
OPT-IML models are significantly better than baseline OPT and demonstrate improved generalization on four
evaluation benchmarks with diverse tasks and input formats: PromptSource, FLAN, Super-NaturalInstructions, and UnifiedSKG.

### How to use
For large OPT models, such as this one, it is not recommended to use the `text-generation` pipeline, because
the model should be loaded in half-precision to accelerate generation and optimize memory consumption on the GPU.
It is recommended to call the [`generate`](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate)
method directly, as follows:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> import torch

>>> model = AutoModelForCausalLM.from_pretrained("facebook/opt-iml-30b", torch_dtype=torch.float16).cuda()

>>> # the fast tokenizer currently does not work correctly
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-30b", use_fast=False)

>>> prompt = "What is the color of a carrot?\nA:"

>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

>>> generated_ids = model.generate(input_ids)

>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```
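
By default, `generate` uses greedy decoding. As a minimal sketch (the sampling settings below are illustrative, not recommendations from the OPT-IML authors), you can also sample from the model for more varied completions:

```python
>>> # illustrative settings: enable top-k sampling and cap the number of new tokens
>>> generated_ids = model.generate(input_ids, do_sample=True, top_k=50, max_new_tokens=32)

>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
```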

### Limitations and bias

While OPT-IML models outperform baseline OPT on an extensive set of evaluations, they are nevertheless
susceptible to the various risks associated with large language models,
including issues of factual correctness, generation of toxic language, and reinforcement of stereotypes. While we release our
OPT-IML models to promote future work on instruction-tuning and to improve the availability
of large instruction-tuned causal LMs, the use of these models should be
accompanied by responsible best practices.

## Training data
OPT-IML models are trained on OPT-IML Bench, a large benchmark for Instruction MetaLearning (IML) of 2000 NLP tasks consolidated into task categories from 8 existing benchmarks, including Super-NaturalInstructions, FLAN, and PromptSource.

## Training procedure
The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for Unicode characters) with a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

The 30B model was fine-tuned on 64 40GB A100 GPUs. During fine-tuning, the models saw approximately 2 billion tokens, which is only 0.6% of the pre-training budget of OPT.
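
As an illustrative sketch (not part of the original card), the figures above can be checked against the checkpoint's configuration and tokenizer via the standard `transformers` API; the expected values are noted in comments and should be verified against the actual checkpoint:

```python
>>> from transformers import AutoConfig, AutoTokenizer

>>> # inspect the model configuration without downloading the weights
>>> config = AutoConfig.from_pretrained("facebook/opt-iml-30b")
>>> config.vocab_size, config.max_position_embeddings  # expected: (50272, 2048)

>>> # the slow GPT2-style byte-level BPE tokenizer used in the example above
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/opt-iml-30b", use_fast=False)
>>> tokenizer.tokenize("What is the color of a carrot?")  # byte-level BPE subword pieces
```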

### BibTeX entry and citation info
```bibtex
@misc{iyer2022opt,
      title={OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization},
      author={Iyer, Srinivasan and Lin, Xi Victoria and Pasunuru, Ramakanth and Mihaylov, Todor and Simig, D{\'a}niel and Yu, Ping and Shuster, Kurt and Wang, Tianlu and Liu, Qing and Koura, Punit Singh and others},
      year={2022},
      eprint={2212.12017},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```