Model Overview
An OPT decoder network.
This class implements a Transformer-based decoder model as described in
"OPT: Open Pre-trained Transformer Language Models".
The default constructor gives a fully customizable, randomly initialized OPT
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the from_preset() constructor.
Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available here.
Links
- OPT Quickstart Notebook
- OPT API Documentation
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
Installation
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
Preset name | Parameters | Description |
---|---|---|
opt_125m_en | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
opt_1.3b_en | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
opt_2.7b_en | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
opt_6.7b_en | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
Arguments
- vocabulary_size: int. The size of the token vocabulary.
- num_layers: int. The number of transformer decoder layers.
- num_heads: int. The number of attention heads for each transformer decoder layer. The hidden size must be divisible by the number of attention heads.
- hidden_dim: int. The hidden size of the transformer decoder layers.
- intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer decoder layer.
- dropout: float. Dropout probability for the Transformer decoder.
- max_sequence_length: int. The maximum sequence length that this decoder can consume. If None, max_sequence_length uses the value from the sequence length. This determines the variable shape for positional embeddings.
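The divisibility constraint between hidden_dim and num_heads can be checked before a model is built. The following is a minimal sketch in plain Python. The helper check_opt_config is hypothetical (it is not part of KerasHub), and the values passed to it are illustrative only.

```python
def check_opt_config(hidden_dim, num_heads):
    """Return the per-head dimension, or raise if the sizes are incompatible.

    Hypothetical helper for illustration; not a KerasHub API.
    """
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim ({hidden_dim}) must be divisible by "
            f"num_heads ({num_heads})."
        )
    # Each attention head works on an equal slice of the hidden size.
    return hidden_dim // num_heads


# Illustrative values only.
head_dim = check_opt_config(hidden_dim=768, num_heads=12)
print(head_dim)
```

Running a check like this first turns a mismatched pair of sizes into a clear error message rather than a shape error raised deeper in model construction.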
Example Usage
import keras
import keras_hub
import numpy as np
Use generate() to do text generation.
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.generate("I want to say", max_length=30)
# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
Compile the generate() function with a custom sampler.
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)
opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
Use generate() without preprocessing.
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
"token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
"padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}
opt_lm = keras_hub.models.OPTCausalLM.from_preset(
"opt_1.3b_en",
preprocessor=None,
)
opt_lm.generate(prompt)
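The prompt dictionary above pairs each row of token ids with a mask that marks which positions hold prompt tokens. A minimal sketch of building such rows from a list of token ids is shown below. The helper pad_prompt is hypothetical (not a KerasHub API), and pad_id=0 is an assumption that mirrors the placeholder zeros used in the example.

```python
def pad_prompt(prompt_ids, sequence_length, batch_size=1, pad_id=0):
    """Build "token_ids" and "padding_mask" rows for a tokenized prompt.

    Hypothetical helper for illustration; not a KerasHub API.
    """
    if len(prompt_ids) > sequence_length:
        raise ValueError("Prompt is longer than sequence_length.")
    num_pad = sequence_length - len(prompt_ids)
    token_ids = list(prompt_ids) + [pad_id] * num_pad
    # 1 marks prompt tokens to keep; 0 marks slots the model may fill in.
    padding_mask = [1] * len(prompt_ids) + [0] * num_pad
    return {
        "token_ids": [list(token_ids) for _ in range(batch_size)],
        "padding_mask": [list(padding_mask) for _ in range(batch_size)],
    }


prompt = pad_prompt([5338, 318], sequence_length=5, batch_size=2)
print(prompt["token_ids"])
print(prompt["padding_mask"])
```

The nested lists can be wrapped with np.array(...) to obtain arrays of the same shape as the ones passed to generate() above.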
Call fit() on a single batch.
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.fit(x=features, batch_size=2)
Call fit() without preprocessing.
x = {
"token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
"padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)
opt_lm = keras_hub.models.OPTCausalLM.from_preset(
"opt_1.3b_en",
preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
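In the example above, y is x shifted left by one position, which is the usual next-token training target for a causal language model. A minimal sketch of deriving x, y, and sample weights from one padded sequence follows. The helper make_causal_lm_batch is hypothetical (not a KerasHub API). Unlike the example above, which weights every position, this sketch gives padded label positions zero weight; that masking choice is an assumption, not something the example prescribes.

```python
def make_causal_lm_batch(token_ids, padding_mask):
    """Split padded sequences into inputs, next-token labels, and weights.

    Hypothetical helper for illustration; not a KerasHub API.
    """
    x = {
        # Inputs drop the last position of each row.
        "token_ids": [row[:-1] for row in token_ids],
        "padding_mask": [row[:-1] for row in padding_mask],
    }
    # Labels are the same rows shifted left by one position.
    y = [row[1:] for row in token_ids]
    # Weight a label position only where the label is a real token.
    sample_weight = [row[1:] for row in padding_mask]
    return x, y, sample_weight


token_ids = [[1, 2, 3, 4, 5, 0]]
padding_mask = [[1, 1, 1, 1, 1, 0]]
x, y, sw = make_causal_lm_batch(token_ids, padding_mask)
print(x["token_ids"])  # [[1, 2, 3, 4, 5]]
print(y)               # [[2, 3, 4, 5, 0]]
print(sw)              # [[1, 1, 1, 1, 0]]
```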
Example Usage with Hugging Face URI
import keras
import keras_hub
import numpy as np
Use generate() to do text generation.
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_1.3b_en")
opt_lm.generate("I want to say", max_length=30)
# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)
Compile the generate() function with a custom sampler.
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_1.3b_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)
opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
Use generate() without preprocessing.
# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
"token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
"padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}
opt_lm = keras_hub.models.OPTCausalLM.from_preset(
"hf://keras/opt_1.3b_en",
preprocessor=None,
)
opt_lm.generate(prompt)
Call fit() on a single batch.
features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_1.3b_en")
opt_lm.fit(x=features, batch_size=2)
Call fit() without preprocessing.
x = {
"token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
"padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)
opt_lm = keras_hub.models.OPTCausalLM.from_preset(
"hf://keras/opt_1.3b_en",
preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)