---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---

# GPT-Neo-125M Multi-Exit
A pre-trained language model with the same parameters as [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), but with additional language modeling heads ("exits") attached to intermediate layers of the model.

These 6 additional heads (in layers 2, 4, 6, 8, 10, 12) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.
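
Conceptually, the autocontrastive approach treats the final layer as the "expert" and a lower exit as the "amateur": their next-token distributions are contrasted, while the choice is restricted to tokens the expert already finds plausible. The snippet below is a minimal, self-contained sketch of that scoring idea only; the function name and the `alpha` plausibility threshold are illustrative and are not part of the library's API.

```python
import torch

def autocontrast_scores(expert_logits, amateur_logits, alpha=0.1):
    """Illustrative contrastive scoring between a top-layer exit and a lower exit."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    # plausibility constraint: keep only tokens whose expert probability is
    # within a factor alpha of the expert's most likely token
    cutoff = torch.log(torch.tensor(alpha)) + expert_logp.max(dim=-1, keepdim=True).values
    # contrast the two layers' predictions
    scores = expert_logp - amateur_logp
    return scores.masked_fill(expert_logp < cutoff, float("-inf"))

# toy usage with random logits over a small vocabulary
vocab_size = 10
expert = torch.randn(1, vocab_size)
amateur = torch.randn(1, vocab_size)
next_token = autocontrast_scores(expert, amateur).argmax(dim=-1)
```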

## Usage
Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).

In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.

For example, the code below initializes the model to use _Autocontrastive Decoding_, and then performs text generation in this chosen setting:

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize the pre-trained multi-exit model to use auto-contrast between
# the final layer (12) and an intermediate exit (here, layer 6)
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(12, 6))
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```
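
The same checkpoint can also be loaded so that generation uses only the original top-layer head. The sketch below assumes that passing `use_original_head=True` on its own, leaving the remaining configuration fields at their defaults, is sufficient for this:

```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# load the same checkpoint, but generate with the original (top-layer) LM head
# (assumes the other configuration fields have usable defaults)
vanilla_config = MultiExitConfiguration(use_original_head=True)
model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit",
                                           multi_exit_config=vanilla_config)

tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
print(tokenizer.batch_decode(model.generate(**prompt, max_new_tokens=15)))
```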

## Citation
Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch. 
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.

```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={july},
  address={Toronto, Canada},
  year={2023}
}
```