flan_base_multi / README.md
metadata
library_name: keras-hub
license: apache-2.0
tags:
  - text-classification
  - keras
pipeline_tag: text-generation

Model Overview

⚠️ T5 is currently only available via the keras-hub-nightly package. Use pip install keras-hub-nightly to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks, where each task is converted to a sequence-to-sequence format. T5 works well on a variety of tasks out of the box when a task-specific prefix is prepended to the input sequence, e.g., for translation: "translate English to German: ...", for summarization: "summarize: ...".
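
As a minimal illustration of this text-to-text convention, the sketch below builds prefixed inputs as plain strings. The helper name `with_task_prefix` and the sample sentences are hypothetical; only the two prefixes come from the description above.

```python
# Task prefixes quoted in the model description above.
TRANSLATE_EN_DE = "translate English to German: "
SUMMARIZE = "summarize: "


def with_task_prefix(prefix: str, text: str) -> str:
    """Prepend a task prefix to raw input text (hypothetical helper)."""
    return prefix + text.strip()


# Sample sentences are made up for illustration.
translation_input = with_task_prefix(TRANSLATE_EN_DE, "The house is wonderful.")
summary_input = with_task_prefix(SUMMARIZE, "  T5 casts every task as text-to-text.  ")

print(translation_input)
print(summary_input)
```

The resulting strings would then be tokenized and fed to the encoder; the task is selected purely by the prefix, not by a separate model head.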

T5 was introduced in "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".

The default constructor gives a fully customizable, randomly initialized T5 model with any number of layers, heads, and embedding dimensions. To load preset architectures and weights, use the from_preset constructor.
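
The two construction paths might look like the following sketch. It assumes `keras-hub-nightly` is installed and that the preset identifier matches this repository's name (`flan_base_multi`); the layer sizes in `custom_config` are illustrative values, not those of any released checkpoint. The `keras_hub` import is deferred into the helper functions so the configuration can be inspected without the package.

```python
# Illustrative sizes for a tiny, randomly initialized model (assumed values).
custom_config = {
    "vocabulary_size": 32000,
    "num_layers": 2,
    "num_heads": 4,
    "hidden_dim": 64,
    "intermediate_dim": 128,
}

# The hidden size must be divisible by the number of attention heads.
assert custom_config["hidden_dim"] % custom_config["num_heads"] == 0

# Per-head key/value size used when key_value_dim is left unset:
# hidden_dim / num_heads.
default_key_value_dim = custom_config["hidden_dim"] // custom_config["num_heads"]


def build_random_backbone(config):
    """Return a randomly initialized T5 backbone built from explicit arguments."""
    import keras_hub  # deferred import; requires keras-hub-nightly

    return keras_hub.models.T5Backbone(**config)


def load_preset_backbone(preset="flan_base_multi"):
    """Return a T5 backbone with a preset architecture and weights.

    The default preset name is an assumption taken from this repository's
    name; loading it downloads weights and needs network access.
    """
    import keras_hub  # deferred import; requires keras-hub-nightly

    return keras_hub.models.T5Backbone.from_preset(preset)
```

Calling `build_random_backbone(custom_config)` yields an untrained model suitable for experiments or training from scratch, while `load_preset_backbone()` is the path to use when the pretrained weights are wanted.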

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind.

Arguments

  • vocabulary_size: int. The size of the token vocabulary.
  • num_layers: int. The number of Transformer layers.
  • num_heads: int. The number of attention heads for each Transformer layer. hidden_dim must be divisible by num_heads.
  • hidden_dim: int. The hidden size of the Transformer layers.
  • intermediate_dim: int. The output dimension of the first Dense layer in a two-layer feedforward network for each Transformer layer.
  • key_value_dim: int. The dimension of each head of the key/value projections in the multi-head attention layers. Defaults to hidden_dim / num_heads.
  • dropout: float. Dropout probability for the Transformer layers.
  • activation: activation function (or activation string name). The activation to be used in the inner dense blocks of the Transformer layers. Defaults to "relu".
  • use_gated_activation: boolean. Whether to use activation gating in the inner dense blocks of the Transformer layers. The original T5 architecture didn't use gating, but more recent versions do. Defaults to True.
  • layer_norm_epsilon: float. Epsilon factor to be used in the layer normalization layers in the Transformer layers.
  • tie_embedding_weights: boolean. If True, the weights of the token embedding and the weights projecting language model outputs from hidden_dim