This repository provides a GPT-2 language model trained with SimCTG on the Wikitext-103 benchmark [(Merity et al., 2016)](https://arxiv.org/abs/1609.07843), as described in our paper [_A Contrastive Framework for Neural Text Generation_](https://arxiv.org/abs/2202.06417).

We provide a detailed tutorial on how to apply SimCTG and Contrastive Search in our [project repo](https://github.com/yxuansu/SimCTG#4-huggingface-style-tutorials-back-to-top). Below is a brief tutorial on how to use our approach to generate text.

## 1. Installation of SimCTG:
```bash
pip install simctg --upgrade
```

## 2. Initialize SimCTG Model:
```python
import torch
# load SimCTG language model
from simctg.simctggpt import SimCTGGPT
model_name = r'cambridgeltl/simctg_wikitext103'
model = SimCTGGPT(model_name)
model.eval()
tokenizer = model.tokenizer
```

## 3. Prepare the Text Prefix:
```python
# adjacent string literals are concatenated into a single prefix
prefix_text = (
    r"Butt criticized Donald 's controls in certain situations in the game , as well as the difficulty of some levels and puzzles . "
    r"Buchanan also criticized the controls , calling"
)
print('Prefix is: {}'.format(prefix_text))
# convert the prefix into token ids and add a batch dimension of size 1
tokens = tokenizer.tokenize(prefix_text)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.LongTensor(input_ids).view(1, -1)
```

## 4. Generate Text with Contrastive Search:
```python
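# beam_width: number of top-k candidate tokens considered at each step
# alpha: weight of the degeneration penalty; decoding_len: number of tokens to generate
beam_width, alpha, decoding_len = 8, 0.6, 128
output = model.fast_contrastive_search(input_ids=input_ids, beam_width=beam_width,
                                       alpha=alpha, decoding_len=decoding_len)
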
print("Output:\n" + 100 * '-')
print(tokenizer.decode(output))
'''
  Prefix is: Butt criticized Donald 's controls in certain situations in the game , as well as the difficulty of some levels and puzzles . 
             Buchanan also criticized the controls , calling
  Output:
  ----------------------------------------------------------------------------------------------------
  Butt criticized Donald's controls in certain situations in the game, as well as the difficulty of some levels and puzzles. Buchanan also 
  criticized the controls, calling them " unimpressive " and a " nightmare " of an experience to play with players unfamiliar with Tetris. 
  On the other hand, his opinion was shared by other reviewers, and some were critical of the game's technical design for the Wii version 
  of Tetris. In addition, Tintin's review included a quote from Roger Ebert, who said that Tetris was better than the original game due to 
  its simplicity and ease of play. Ebert's comments were included in the game's DVD commentary, released on March 22, 2010. It is unclear 
  if any of the video commentary was taken from the DVD
'''
```
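
To give some intuition for what `beam_width` and `alpha` control, here is a minimal, self-contained sketch of a single contrastive search selection step as described in our paper: each of the top-k candidates is scored by its model probability minus a penalty equal to its maximum cosine similarity to the representations of the tokens already in the context. This is not the code inside `fast_contrastive_search`. The helper names (`cosine`, `contrastive_step`) and the toy probabilities and 2-d vectors are illustrative assumptions; in practice the probabilities and hidden states come from the language model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_step(candidates, context_states, alpha):
    """Pick one token from the top-k candidates (hypothetical helper).

    candidates: list of (token, probability, hidden_state) tuples
    context_states: hidden states of the tokens generated so far
    alpha: weight of the degeneration penalty (alpha = 0 is greedy search)
    """
    best_token, best_score = None, float("-inf")
    for token, prob, hidden in candidates:
        # degeneration penalty: similarity to the closest context token
        penalty = max(cosine(hidden, h) for h in context_states)
        score = (1.0 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token, best_score

# toy example: two context tokens and k = 2 candidates
context_states = [[1.0, 0.0], [0.8, 0.6]]
candidates = [
    ("game", 0.5, [1.0, 0.0]),  # more probable, but identical to a context state
    ("them", 0.3, [0.0, 1.0]),  # less probable, but dissimilar to the context
]

print(contrastive_step(candidates, context_states, alpha=0.6)[0])  # them
print(contrastive_step(candidates, context_states, alpha=0.0)[0])  # game
```

With `alpha = 0` the penalty vanishes and the step reduces to greedy search over the candidates; larger values of `alpha` favor tokens whose representations differ from the existing context.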

For more details on our work, please refer to our main [project repo](https://github.com/yxuansu/SimCTG).

## 5. Citation:
If you find our paper and resources useful, please leave a star on the project repo and cite our paper. Thanks!

```bibtex
@article{su2022contrastive,
  title={A Contrastive Framework for Neural Text Generation},
  author={Su, Yixuan and Lan, Tian and Wang, Yan and Yogatama, Dani and Kong, Lingpeng and Collier, Nigel},
  journal={arXiv preprint arXiv:2202.06417},
  year={2022}
}
```