---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---

# GPT-Neo-125M Multi-Exit

A pre-trained language model with parameters identical to those of [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), but with additional language modeling heads ("exits") connected to different layers of the model.

These 6 additional heads (at layers 2, 4, 6, 8, 10, and 12) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.

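To make the idea of per-layer next-token predictions concrete, here is a minimal, illustrative sketch that probes the *base* [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) with the standard `transformers` API, projecting intermediate hidden states through the original LM head. This is only a rough stand-in for what this model's trained exit heads provide; the exit heads themselves are accessed through the dedicated library described under Usage below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model (not the multi-exit checkpoint); used here only to illustrate the concept
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
model.eval()

inputs = tokenizer("humpty dumpty sat on", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[1:] follow the 12 transformer layers
for layer_idx in (2, 6, 12):
    hidden = outputs.hidden_states[layer_idx][:, -1, :]  # hidden state at the last position
    if layer_idx < model.config.num_layers:
        hidden = model.transformer.ln_f(hidden)  # intermediate states are pre-final-LayerNorm
    logits = model.lm_head(hidden)  # project through the original LM head
    print(f"layer {layer_idx} top next token: {tokenizer.decode(logits.argmax(dim=-1))!r}")
```
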
## Usage

Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).

In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.

For example, the code below initializes the model to use _Autocontrastive Decoding_ and then performs text generation in this setting:

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize the pre-trained multi-exit model to auto-contrast the final layer (12)
# with an intermediate exit (layer 6)
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(12, 6))
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit",
                                           multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```

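Other decoding settings are configured in the same way. As a purely illustrative sketch of an _early-exit_ setup, the snippet below generates directly from a single intermediate exit head; note that the `output_layer_index` argument name is an assumption here, so check the [library documentation](https://github.com/IBM/auto-contrastive-generation) for the exact `MultiExitConfiguration` parameters.

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# Illustrative early-exit setup: generate from the exit head at layer 6 instead of
# contrasting two layers. NOTE: `output_layer_index` is an assumed parameter name;
# consult the auto-contrastive-generation README for the actual argument.
early_exit_config = MultiExitConfiguration(use_original_head=False,
                                           output_layer_index=6)
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit",
                                           multi_exit_config=early_exit_config)

tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
print(tokenizer.batch_decode(model.generate(**prompt, max_new_tokens=15)))
```
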
## Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch.
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.

```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={July},
  address={Toronto, Canada},
  year={2023}
}
```
|