Commit 6bd9544 (verified) by shanearora · 1 parent: 98eae16

Update README.md

Files changed (1)
  1. README.md +7 -16
README.md CHANGED
@@ -42,8 +42,9 @@ In particular, we focus on four revisions of the 7B models:
 
  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```bash
- import hf_olmo # pip install ai2-olmo
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-Twin-2T", revision="step1000-tokens4B")
  ```
 
  All revisions/branches are listed in the file `revisions.txt`.
@@ -93,11 +94,10 @@ pip install ai2-olmo
  ```
  Now, proceed as usual with HuggingFace:
  ```python
- import hf_olmo
 
- from transformers import AutoModelForCausalLM, AutoTokenizer
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
@@ -107,17 +107,8 @@ response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50,
  print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
  >> 'Language modeling is the first step to build natural language generation...'
  ```
- Alternatively, with the pipeline abstraction:
- ```python
- import hf_olmo
-
- from transformers import pipeline
- olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B")
- print(olmo_pipe("Language modeling is "))
- >> 'Language modeling is a branch of natural language processing that aims to...'
- ```
 
- Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
  The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
 
  Note, you may see the following error if `ai2-olmo` is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.
 
 
  To load a specific model revision with HuggingFace, simply add the argument `revision`:
  ```bash
+ from hf_olmo import OLMoForCausalLM # pip install ai2-olmo
+
+ olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Twin-2T", revision="step1000-tokens4B")
  ```
 
  All revisions/branches are listed in the file `revisions.txt`.
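
Revision names such as `step1000-tokens4B` appear to encode a training step and a token count. The following is a minimal sketch for ordering such names by step, so that one can be chosen and passed as `revision=`. It assumes every relevant line of `revisions.txt` follows the `step<N>-tokens<M>B` pattern, which is inferred from the single example above and not confirmed by this diff. `parse_revision`, `sort_revisions`, and the sample list are hypothetical and for illustration only.

```python
import re

# Assumed pattern, inferred only from the single example "step1000-tokens4B".
_REVISION_RE = re.compile(r"^step(\d+)-tokens(\d+)B$")


def parse_revision(name):
    """Return (step, tokens_in_billions), or None if the name does not match."""
    match = _REVISION_RE.match(name.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def sort_revisions(lines):
    """Keep names matching the assumed pattern, ordered by training step."""
    parsed = [(parse_revision(line), line.strip()) for line in lines]
    matching = [pair for pair in parsed if pair[0] is not None]
    return [name for _key, name in sorted(matching)]


# Hypothetical file contents, for illustration only.
sample = ["step2000-tokens9B", "main", "step1000-tokens4B"]
print(sort_revisions(sample))
```

Names that do not match the assumed pattern (for example a `main` branch) are skipped, not rejected, so the helper can be run over the whole file.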
 
  ```
  Now, proceed as usual with HuggingFace:
  ```python
+ from hf_olmo import OLMoForCausalLM, OLMoTokenizerFast
 
+ olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-7B-Twin-2T")
+ tokenizer = OLMoTokenizerFast.from_pretrained("allenai/OLMo-7B-Twin-2T")
  message = ["Language modeling is "]
  inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
  # optional verifying cuda
 
  print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
  >> 'Language modeling is the first step to build natural language generation...'
  ```
 
+ You can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
  The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
 
  Note, you may see the following error if `ai2-olmo` is not installed correctly, which is caused by internal Python check naming. We'll update the code soon to make this error clearer.
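
Until that error message is clearer, one way to catch a missing install early is to check for the module before loading anything. This is a minimal sketch using the standard library's `importlib.util.find_spec`. `is_installed` is a hypothetical helper, and the `hf_olmo` module name and the `pip install ai2-olmo` hint are taken from the snippets above. It only detects a package that cannot be found, not other kinds of broken installs.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Check for the module provided by the ai2-olmo package before loading a model.
if not is_installed("hf_olmo"):
    print("hf_olmo not found; try: pip install ai2-olmo")
```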