File size: 2,912 Bytes
2181606
 
056256b
 
 
 
 
2181606
056256b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
license: cc-by-nc-sa-4.0
datasets:
- NorGLM/NO-CNN-DailyMail
language:
- 'no'
pipeline_tag: summarization
---

# Model Card

NorGPT-3B-summarization-peft is trained on top of [NorGPT-3B](https://huggingface.co/NorGLM/NorGPT-3B) model using RLHF strategy on [NO-CNN-DailyMail](https://huggingface.co/datasets/NorGLM/NO-CNN-DailyMail) dataset.

Different from step 2 in the original RLHF, we trained the reward model by estimating the semantic similarity between the candidate generated text and the human annotated summary (golden summary) using the [NorBERT](https://huggingface.co/ltg/norbert) model. Generated summaries with higher cosine similarity to the golden summary will be ranked higher in the training of the reward model.

Prompt format:
```
Summarise the article:\\n{article} |||\\n{positive_sample}
```

Inference prompt:
```
Summarise the article:\\n{article} |||\\n
```

## Training Split
We split data to train on step 1-step 3 for RLHF:
|       | #samples |
|-------|---------------------|
| step 1 | 61181              |
| step 2 | 16798              |
| step 3 | 9758               | 

## Run the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "NorGLM/NorGPT-3B-rfhl-summarization"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16
)
```

## Inference on test set
Load the model to evaluate on the test set of NO-CNN-DailyMail dataset:
```python
def generate_texts(model, tokenizer, prompts, max_seq_length=200, do_sample=True, top_p=0.95, top_k=10):
    # prompts are a list of news articles
    results = []
    cnt = 0
    for prompt in prompts:
        cnt += 1
        pro_len = len(prompt.split())
        if pro_len>1024:
            results.append('')
            continue

        prompt = 'Summarise the article:\\n' + prompt + ' |||\\n'

        model_inputs = tokenizer(prompt, return_tensors='pt').to(torch_device)
        output = model.generate(**model_inputs, do_sample=False, max_new_tokens=max_seq_length)
        result = tokenizer.decode(output[0], skip_special_tokens=True)
        result = result.split("|||\\n")[-1]
        results.append(result)
    return results

print("--LOADING EVAL DATAS---")
eval_data = load_dataset("NorGLM/NO-CNN-DailyMail", data_files="test.csv")
prompts = eval_data['train']['article']
positive_samples = eval_data['train']['positive_sample']

print("--MAKING PREDICTIONS---")
model.eval()

output_file = <output file name>
with torch.no_grad():
    results = generate_texts(model, tokenizer, prompts)

df = pd.DataFrame({'article':prompts, 'generated_text':results, 'positive_sample':positive_samples})

print("Save results to csv file...")
df.to_csv(output_file)

```

## Note
More training details will be released soon!