---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---
# GPT-2 Medium Multi-Exit
A pre-trained language model with parameters identical to [gpt2-medium](https://huggingface.co/gpt2-medium), but with additional language modeling heads ("exits") connected to different layers of the model.
These 12 additional heads (in layers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.
The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.
## Usage
Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).
In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference time, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.
For example, the code below initializes the model to use _Autocontrastive Decoding_, and then performs text generation in this chosen setting:
```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```
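The additional heads can also be used for simple _early-exiting_, i.e., generating text directly from the predictions of a single intermediate exit head. The sketch below is a minimal illustration of this setting; the `output_layer_index` parameter name is an assumption here, so consult the [library documentation](https://github.com/IBM/auto-contrastive-generation) for the exact configuration options:
```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# configure the model to generate from a single intermediate exit (layer 12)
# instead of the original top-layer head; `output_layer_index` is assumed here
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           output_layer_index=12)
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)

# generation itself is unchanged
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```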
## Citation
Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch.
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.
```bibtex
@inproceedings{gera2023autocontrastive,
    title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
    author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
    booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    month=jul,
    year={2023},
    address={Toronto, Canada}
}
```