keras/opt_1.3b_en · Hugging Face

Model Overview

An OPT decoder network.

This class implements a Transformer-based decoder model as described in "OPT: Open Pre-trained Transformer Language Models". The default constructor gives a fully customizable, randomly initialized OPT model with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the from_preset() constructor.

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available here.

Installation

Keras and KerasHub can be installed with:

pip install -U -q keras-hub
pip install -U -q keras

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
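
Keras reads its backend from the `KERAS_BACKEND` environment variable, which has to be set before `keras` is imported for the first time. A minimal sketch, assuming JAX is the backend you want (the choice of backend is an assumption, not something this model card prescribes):

```python
import os

# Must run before the first `import keras`.
# Valid values are "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

# import keras      # import only after the variable is set
# import keras_hub
```

Changing the variable after Keras has been imported has no effect on the running process, so this belongs at the very top of a script or notebook.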

Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

Preset name	Parameters	Description
opt_125m_en	125.24M	12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_1.3b_en	1.32B	24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_2.7b_en	2.70B	32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_6.7b_en	6.70B	32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.

Arguments

vocabulary_size: int. The size of the token vocabulary.
num_layers: int. The number of transformer decoder layers.
num_heads: int. The number of attention heads in each transformer decoder layer. The hidden size must be divisible by the number of attention heads.
hidden_dim: int. The hidden size of the transformer decoder layers.
intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each transformer decoder layer.
dropout: float. Dropout probability for the Transformer decoder.
max_sequence_length: int. The maximum sequence length that this decoder can consume. If None, max_sequence_length uses the value from sequence length. This determines the variable shape for positional embeddings.
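
The divisibility constraint between `hidden_dim` and `num_heads` can be checked before building a model. The sketch below is only an illustration: `head_dim` is a hypothetical helper, not part of KerasHub, and the numbers are arbitrary rather than taken from any preset.

```python
def head_dim(hidden_dim: int, num_heads: int) -> int:
    """Return the per-head width, or raise if the split is uneven."""
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )
    return hidden_dim // num_heads


# Illustrative values only; not taken from any preset.
print(head_dim(hidden_dim=256, num_heads=4))  # 64
```

A configuration that passes this check can then be supplied as keyword arguments to the default constructor described above.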

Example Usage

import keras
import keras_hub
import numpy as np

Use generate() to do text generation.

opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
opt_lm.generate(["This is a", "Where are you"], max_length=30)

Compile the generate() function with a custom sampler.

opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.compile(sampler="greedy")
opt_lm.generate("I want to say", max_length=30)

opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
opt_lm.generate("I want to say", max_length=30)
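
As a rough intuition for what the "greedy" sampler does at each step, it picks the single most probable next token. The toy sketch below works on a plain list of made-up logits; `softmax` and `greedy_pick` are hypothetical helpers written for this illustration and are not KerasHub's implementation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Return the index of the most probable token."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores over a three-token vocabulary.
next_token = greedy_pick([0.1, 2.5, -1.0])
```

Beam search differs in that it keeps several candidate sequences alive at once (two, with `num_beams=2`) instead of committing to one token per step.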

Use generate() without preprocessing.

# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_1.3b_en",
    preprocessor=None,
)
opt_lm.generate(prompt)
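
When preprocessing is disabled, the prompt dictionary has to be assembled by hand. The sketch below shows one way to right-pad token id lists and build the matching mask. `pad_prompts` is a hypothetical helper, and using `0` as the padding id is an assumption that simply mirrors the zeros in the example above.

```python
def pad_prompts(prompts, length, pad_id=0):
    """Right-pad token id lists to `length` and build matching masks."""
    token_ids, padding_mask = [], []
    for ids in prompts:
        ids = list(ids)[:length]  # truncate prompts longer than `length`
        n_pad = length - len(ids)
        token_ids.append(ids + [pad_id] * n_pad)
        # 1 marks prompt tokens to keep, 0 marks slots the model may fill.
        padding_mask.append([1] * len(ids) + [0] * n_pad)
    return {"token_ids": token_ids, "padding_mask": padding_mask}


batch = pad_prompts([[5338, 318], [5338, 318]], length=5)
```

Wrapping each value with `np.array(...)` gives the same structure as the hand-written `prompt` dictionary above.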

Call fit() on a single batch.

features = ["The quick brown fox jumped.", "I forgot my homework."]
opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_1.3b_en")
opt_lm.fit(x=features, batch_size=2)

Call fit() without preprocessing.

x = {
    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[2, 3, 4, 5, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

opt_lm = keras_hub.models.OPTCausalLM.from_preset(
    "opt_1.3b_en",
    preprocessor=None,
)
opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
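
The labels in the example above are the inputs shifted left by one position, which is the usual setup for next-token prediction. The sketch below derives `x`, `y`, and `sample_weight` from a single padded sequence. `shift_for_causal_lm` is a hypothetical helper written for this illustration; unlike the example above, it gives padded label positions a weight of 0 so they do not contribute to the loss.

```python
def shift_for_causal_lm(token_ids, padding_mask):
    """Split padded sequences into inputs, next-token labels, and weights."""
    x = {
        "token_ids": [row[:-1] for row in token_ids],
        "padding_mask": [row[:-1] for row in padding_mask],
    }
    y = [row[1:] for row in token_ids]                # label = next token
    sample_weight = [row[1:] for row in padding_mask]  # 0 where label is padding
    return x, y, sample_weight


# One extra position so that inputs and labels both have length 5.
tokens = [[1, 2, 3, 4, 5, 0]] * 2
mask = [[1, 1, 1, 1, 1, 0]] * 2
x, y, sw = shift_for_causal_lm(tokens, mask)
```

The results are plain Python lists; convert them with `np.array(...)` before passing them to `fit()`.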

Example Usage with Hugging Face URI

import keras
import keras_hub
import numpy as np