---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---

# GPT-Neo-125M Multi-Exit

A pre-trained language model with parameters identical to those of [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), but with additional language modeling heads ("exits") connected to different layers of the model.

These 6 additional heads (at layers 2, 4, 6, 8, 10, and 12) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.

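To make the idea of per-layer next-token predictions concrete, here is a minimal, illustrative sketch that probes the *base* [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) with the standard `transformers` API, projecting intermediate hidden states through the original LM head. This is only a rough stand-in for what this model's trained exit heads provide; the exit heads themselves are accessed through the dedicated library described under Usage below.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model (not the multi-exit checkpoint); used here only to illustrate the concept
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
model.eval()

inputs = tokenizer("humpty dumpty sat on", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[1:] follow the 12 transformer layers
for layer_idx in (2, 6, 12):
    hidden = outputs.hidden_states[layer_idx][:, -1, :]  # hidden state at the last position
    if layer_idx < model.config.num_layers:
        hidden = model.transformer.ln_f(hidden)  # intermediate states are pre-final-LayerNorm
    logits = model.lm_head(hidden)  # project through the original LM head
    print(f"layer {layer_idx} top next token: {tokenizer.decode(logits.argmax(dim=-1))!r}")
```
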
## Usage

Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).

In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.

For example, the code below initializes the model to use _Autocontrastive Decoding_ and then performs text generation in this setting:

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize the pre-trained multi-exit model to auto-contrast the final layer (12)
# with an intermediate exit (layer 6)
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(12, 6))
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit",
                                           multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```

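Other decoding settings are configured in the same way. As a purely illustrative sketch of an _early-exit_ setup, the snippet below generates directly from a single intermediate exit head; note that the `output_layer_index` argument name is an assumption here, so check the [library documentation](https://github.com/IBM/auto-contrastive-generation) for the exact `MultiExitConfiguration` parameters.

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# Illustrative early-exit setup: generate from the exit head at layer 6 instead of
# contrasting two layers. NOTE: `output_layer_index` is an assumed parameter name;
# consult the auto-contrastive-generation README for the actual argument.
early_exit_config = MultiExitConfiguration(use_original_head=False,
                                           output_layer_index=6)
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit",
                                           multi_exit_config=early_exit_config)

tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
print(tokenizer.batch_decode(model.generate(**prompt, max_new_tokens=15)))
```
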
## Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch.
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.

```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={July},
  address={Toronto, Canada},
  year={2023}
}
```
|