Model Card for EncT5

EncT5 is a variant of T5 that mainly uses the encoder for non-autoregressive (i.e., classification and regression) tasks. The model is from the paper Fine-tuning T5 Encoder for Non-autoregressive Tasks by Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, and Jing Li.

Model Details

Model Description

EncT5 uses the same base weights as T5, but must be fine-tuned before use. EncT5 has several special features:

  1. There are fewer decoder layers (a single decoder layer by default), so the model has fewer parameters and needs fewer resources than the standard T5.
  2. There is a separate decoder word embedding, with the decoder input ids being predefined constants. During fine-tuning, the decoder embedding learns to use these constants as "prompts" to the encoder for the corresponding classification/regression tasks.
  3. There is a classification head on top of the decoder output.
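To make the parameter savings from the first point concrete, here is a rough back-of-the-envelope count. It is only a sketch: it assumes the published t5-base hyperparameters (d_model=768, d_ff=3072, 12 heads of size 64, 12 encoder layers, vocabulary of 32128, 32 relative-attention buckets), and it leaves out EncT5's separate decoder embedding and classification head, which are small by comparison. The function name `t5_param_count` is made up for this illustration.

```python
def t5_param_count(
    num_decoder_layers: int,
    num_encoder_layers: int = 12,
    d_model: int = 768,
    d_ff: int = 3072,
    num_heads: int = 12,
    d_kv: int = 64,
    vocab_size: int = 32128,
    rel_buckets: int = 32,
) -> int:
    """Estimate the parameter count of a T5-style encoder-decoder.

    Assumes t5-base conventions: bias-free linear layers, scale-only
    layer norms, one shared token embedding, and a relative attention
    bias table in the first layer of each stack.
    """
    inner = num_heads * d_kv
    attention = 4 * d_model * inner      # q, k, v, and output projections
    feed_forward = 2 * d_model * d_ff    # input and output projections
    layer_norm = d_model                 # scale vector only
    rel_bias = rel_buckets * num_heads   # one table per stack
    embedding = vocab_size * d_model     # shared token embedding

    # Encoder layer: self-attention + feed-forward, each with a layer norm.
    encoder = (
        num_encoder_layers * (attention + feed_forward + 2 * layer_norm)
        + rel_bias
        + layer_norm                     # final layer norm of the stack
    )
    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder = (
        num_decoder_layers * (2 * attention + feed_forward + 3 * layer_norm)
        + rel_bias
        + layer_norm                     # final layer norm of the stack
    )
    return embedding + encoder + decoder


if __name__ == "__main__":
    print("12 decoder layers:", t5_param_count(12))
    print(" 1 decoder layer: ", t5_param_count(1))
```

Under these assumptions the estimate drops from about 223M parameters with twelve decoder layers to about 119M with a single decoder layer, so most of the saving comes from the removed decoder layers.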

Research has shown that this model can be more efficient and usable than T5 and BERT for non-autoregressive tasks such as classification and regression.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hackyon/enct5-base", trust_remote_code=True)
# Fine-tune the model before use.

See the GitHub repo for a more comprehensive guide.

Training Details

Training Data

The weights of this model are directly copied from t5-base.

Training Procedure

This model must be fine-tuned before use. The decoder word embedding and classification head are both untrained.
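As a rough illustration of what fine-tuning could look like, the sketch below runs a few gradient steps on a toy binary classification set. It is not the authors' training recipe, and several things in it are assumptions: that the t5-base tokenizer is the right one for these weights, that the repo's custom config honors `num_labels`, and that the model's forward pass accepts `labels` and returns an output with a `.loss` attribute, as standard Hugging Face sequence classification models do. The toy data, the helper `batched`, and the function `fine_tune_sketch` are hypothetical names introduced here.

```python
from typing import Iterator, List, Sequence, Tuple

# Toy data, purely illustrative (not from the model card).
TOY_EXAMPLES: List[Tuple[str, int]] = [
    ("a delightful, well-acted film", 1),
    ("tedious and far too long", 0),
    ("sharp writing and a great cast", 1),
    ("a dull, forgettable mess", 0),
]


def batched(items: Sequence, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fine_tune_sketch(epochs: int = 1, batch_size: int = 2, lr: float = 1e-4):
    """Run a few gradient steps on the toy data and return the model."""
    # Imported here so the helper above stays usable without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumption: the t5-base tokenizer matches the copied t5-base weights.
    tokenizer = AutoTokenizer.from_pretrained("t5-base")
    # Assumption: `num_labels` is honored by the repo's custom config.
    model = AutoModelForSequenceClassification.from_pretrained(
        "hackyon/enct5-base", trust_remote_code=True, num_labels=2
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    model.train()
    for _ in range(epochs):
        for batch in batched(TOY_EXAMPLES, batch_size):
            texts = [text for text, _ in batch]
            labels = torch.tensor([label for _, label in batch])
            inputs = tokenizer(
                texts, padding=True, truncation=True, return_tensors="pt"
            )
            # Assumption: forward accepts `labels` and returns `.loss`.
            loss = model(**inputs, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```

Calling `fine_tune_sketch()` downloads the weights and updates all parameters, including the untrained decoder word embedding and classification head. A real run would need a proper dataset, evaluation, and tuned hyperparameters.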

Model size: 119M params (Safetensors, tensor type F32)