GPT-2 Medium Multi-Exit

A pre-trained language model whose parameters are identical to gpt2-medium, with additional language modeling heads ("exits") attached to different layers of the model.

These 12 additional heads (in layers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) were trained on the English portion of CC-100 while keeping the original pre-trained model parameters frozen.

The model can be used for the Autocontrastive Decoding text generation approach described in Gera et al. 2023, for early-exiting approaches, or for other algorithms that consider the next-token predictions of different model layers.
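
To illustrate what an early-exiting approach might do with per-layer predictions, here is a minimal sketch in plain Python. It assumes the next-token logits of each exit are already available as a dictionary keyed by layer number; the function name `earliest_confident_exit`, the confidence threshold, and the toy logits are all hypothetical and are not part of the auto-contrastive-generation library.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def earliest_confident_exit(exit_logits, threshold=0.9):
    """Return (layer, token_id) for the shallowest exit whose top
    probability reaches `threshold`; fall back to the deepest exit.

    `exit_logits` is a hypothetical dict: layer index -> next-token logits.
    """
    for layer in sorted(exit_logits):
        probs = softmax(exit_logits[layer])
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return layer, best
    # No exit was confident enough: use the deepest available exit.
    last = max(exit_logits)
    probs = softmax(exit_logits[last])
    return last, max(range(len(probs)), key=probs.__getitem__)


# Toy logits over a 3-token vocabulary for three of the exits (made-up values).
toy_exit_logits = {
    12: [1.0, 1.2, 0.9],
    18: [0.5, 3.0, 0.2],
    24: [0.1, 6.0, 0.0],
}
layer, token = earliest_confident_exit(toy_exit_logits, threshold=0.8)
print(layer, token)
```

A real early-exit decoder would stop the forward pass at the chosen layer to save computation; this sketch only shows the selection rule.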

Usage

Harnessing the additional language modeling heads requires loading the model using the auto-contrastive-generation library (pip install autocontrastive-gen).

In a nutshell, the user creates a MultiExitConfiguration that determines the model's behavior at training and inference time, and then loads the model using the dedicated AutoMultiExitModel class. After that, the model can be used with the transformers API like any other model. See the GitHub repository for detailed usage instructions.

For example, the code below initializes the model to use Autocontrastive Decoding, and then performs text generation in this chosen setting:

from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False, 
                                           contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
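
To give some intuition for what contrasting two exits can mean, the following toy sketch scores each token by the difference between the log-probabilities of a later exit and an earlier exit, keeping only tokens the later exit finds plausible. This is a simplified illustration in plain Python, not the exact formulation from Gera et al. 2023 or the library's implementation; the function `contrast_scores`, the `alpha` cutoff, and the logits are assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Log-probabilities from logits (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrast_scores(late_logits, early_logits, alpha=0.1):
    """Hypothetical contrastive score: late log-prob minus early log-prob.

    Tokens whose late-exit probability is below `alpha` times the top
    late-exit probability are masked out with -inf.
    """
    late = log_softmax(late_logits)
    early = log_softmax(early_logits)
    cutoff = math.log(alpha) + max(late)
    return [
        l - e if l >= cutoff else float("-inf")
        for l, e in zip(late, early)
    ]


# Made-up logits over a 3-token vocabulary, standing in for layers 24 and 12.
late_logits = [4.0, 3.5, -2.0]
early_logits = [4.0, 1.0, 3.0]

scores = contrast_scores(late_logits, early_logits)
best_token = max(range(len(scores)), key=scores.__getitem__)
print(best_token)
```

In this toy case the later exit alone would pick token 0, while the contrast favors token 1, which the later exit rates much more highly than the earlier exit does.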

Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch. The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers. ACL 2023.

@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={july},
  address={Toronto, Canada},
  year={2023}
}