arielgera committed on
Commit
b273d6b
·
1 Parent(s): 28e712d

Update README.md

  1. README.md +51 -0
README.md CHANGED
@@ -1,3 +1,54 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ datasets:
4
+ - cc100
5
+ language:
6
+ - en
7
+ pipeline_tag: text-generation
8
  ---
9
+
10
+ # GPT-Neo-125M Multi-Exit
11
+ Pre-trained language model with identical parameters to [gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m), but with additional language modeling heads ("exits") connected to different layers of the model.
12
+
13
+ These 6 additional heads (in layers 2, 4, 6, 8, 10, 12) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.
14
+
15
+ The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.
16
+
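To make the idea of comparing next-token predictions across layers concrete, here is a minimal, simplified sketch of contrasting an upper exit against a lower exit. It is an illustration only and not the exact formulation from Gera et al. 2023: the function name `contrast_scores`, the `alpha` cutoff, the toy vocabulary, and the logits are all assumptions made up for this example, and the code does not use the auto-contrastive-generation library.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrast_scores(upper_probs, lower_probs, alpha=0.1):
    """Score each token by how strongly the upper exit prefers it over the lower exit.

    Hypothetical helper: tokens whose upper-exit probability falls below
    alpha * max(upper_probs) are excluded (score of -inf), so that an
    implausible token cannot win just because the lower exit dislikes it.
    """
    cutoff = alpha * max(upper_probs)
    scores = []
    for p_up, p_low in zip(upper_probs, lower_probs):
        if p_up >= cutoff:
            scores.append(math.log(p_up) - math.log(p_low))
        else:
            scores.append(float("-inf"))
    return scores


# Toy next-token logits over a 4-token vocabulary (made-up numbers).
vocab = ["wall", "the", "a", "fence"]
upper_probs = softmax([3.0, 2.5, 1.0, -4.0])  # e.g. the exit at layer 12
lower_probs = softmax([1.0, 2.5, 1.5, -4.0])  # e.g. the exit at layer 6

scores = contrast_scores(upper_probs, lower_probs)
best = vocab[scores.index(max(scores))]
print(best)
```

In this toy setting the lower exit favors the generic token "the", while the contrast favors the token whose probability grew most between the two exits.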
17
+ ## Usage
18
+ Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).
19
+
20
+ In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.
21
+
22
+ For example, the code below initializes the model to use _Autocontrastive Decoding_, and then performs text generation in this chosen setting:
23
+
24
+ ```python
25
+ from transformers import AutoTokenizer
26
+ from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
27
+ from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel
28
+
29
+ # initialize a pre-trained multi-exit model to use auto-contrast between layer 12 and layer 6
30
+ multi_exit_config = MultiExitConfiguration(use_original_head=False,
31
+ contrast_layer_indices=(12, 6))
32
+ model = AutoMultiExitModel.from_pretrained("IBM/gpt-neo-125m-multiexit", multi_exit_config=multi_exit_config)
33
+
34
+ # perform text generation as usual
35
+ tokenizer = AutoTokenizer.from_pretrained("IBM/gpt-neo-125m-multiexit")
36
+ prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
37
+ generated_ids = model.generate(**prompt, max_new_tokens=15)
38
+ print(tokenizer.batch_decode(generated_ids))
39
+ ```
40
+
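For the early-exiting use case mentioned earlier, the following is a rough sketch of one common strategy: take the prediction of the lowest exit whose top probability clears a confidence threshold. The function `first_confident_exit`, the threshold value, and the per-layer probabilities are hypothetical and chosen only for illustration; the sketch assumes the exit distributions have already been computed and does not call the auto-contrastive-generation library.

```python
def first_confident_exit(exit_probs, threshold=0.6):
    """Return (layer, token_index) for the lowest exit that is confident enough.

    exit_probs maps a layer number to that exit's next-token probabilities.
    If no exit reaches the threshold, fall back to the top-most exit.
    """
    layers = sorted(exit_probs)
    for layer in layers:
        probs = exit_probs[layer]
        top = max(probs)
        if top >= threshold:
            return layer, probs.index(top)
    last = layers[-1]
    probs = exit_probs[last]
    return last, probs.index(max(probs))


# Made-up next-token distributions over a 3-token vocabulary,
# one per exit layer (2, 4, 6, 8, 10, 12).
exit_probs = {
    2: [0.30, 0.30, 0.40],
    4: [0.25, 0.30, 0.45],
    6: [0.15, 0.15, 0.70],
    8: [0.10, 0.10, 0.80],
    10: [0.05, 0.05, 0.90],
    12: [0.03, 0.02, 0.95],
}

layer, token = first_confident_exit(exit_probs)
print(layer, token)
```

A real early-exit setup would stop the forward pass at the chosen layer to save computation; this sketch only shows the selection rule.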
41
+ ## Citation
42
+ Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch.
43
+ [The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.
44
+
45
+ ```bibtex
46
+ @inproceedings{gera2023autocontrastive,
47
+ title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
48
+ author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
49
+ booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
50
+ month={july},
51
+ address={Toronto, Canada},
52
+ year={2023}
53
+ }
54
+ ```