# Longformer2Roberta Summarization with 🤗 EncoderDecoder Framework

This model is a Longformer2Roberta model fine-tuned on summarization.

Longformer2Roberta is an `EncoderDecoderModel`, meaning that the encoder is an `allenai/longformer-base-4096` model and the decoder is a `roberta-base` model. Leveraging the [EncoderDecoder framework](https://huggingface.co/transformers/model_doc/encoderdecoder.html#encoder-decoder-models), the
two pretrained models can simply be loaded into the framework via:

```python
from transformers import EncoderDecoderModel

longformer2roberta = EncoderDecoderModel.from_encoder_decoder_pretrained("allenai/longformer-base-4096", "roberta-base")
```

The decoder of an `EncoderDecoder` model needs cross-attention layers and usually makes use of causal
masking for auto-regressive generation.
Thus, `longformer2roberta` is consequently fine-tuned on the `CNN/Daily Mail` dataset and the resulting model
`longformer2roberta-cnn_dailymail-fp16` is uploaded here.

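As a quick sanity check, the decoder config of the warm-started model can be inspected to confirm that cross-attention layers were added and causal masking is enabled; a minimal sketch:

```python
from transformers import EncoderDecoderModel

# Warm-start the encoder-decoder model as above.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "allenai/longformer-base-4096", "roberta-base"
)

# The RoBERTa decoder is switched into decoder mode (causal masking)
# and cross-attention layers over the encoder outputs are added.
print(model.decoder.config.is_decoder)           # True
print(model.decoder.config.add_cross_attention)  # True
```
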
## Example

The model is by no means a state-of-the-art model, but nevertheless
produces reasonable summarization results. It was mainly fine-tuned
as a proof-of-concept for the 🤗 EncoderDecoder Framework.

The model can be used as follows:

```python
from transformers import LongformerTokenizer, EncoderDecoderModel

model = EncoderDecoderModel.from_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

article = """(CNN)James Holmes made his introduction to the world in a Colorado cinema filled with spectators watching a midnight showing of the new Batman movie, "The Dark Knight Rises," in June 2012. The moment became one of the deadliest shootings in U.S. history. Holmes is accused of opening fire on the crowd, killing 12 people and injuring or maiming 70 others in Aurora, a suburb of Denver. Holmes appeared like a comic book character: He resembled the Joker, with red-orange hair, similar to the late actor Heath Ledger\'s portrayal of the villain in an earlier Batman movie, authorities said. But Holmes was hardly a cartoon. Authorities said he wore body armor and carried several guns, including an AR-15 rifle, with lots of ammo. He also wore a gas mask. Holmes says he was insane at the time of the shootings, and that is his legal defense and court plea: not guilty by reason of insanity. Prosecutors aren\'t swayed and will seek the death penalty. Opening statements in his trial are scheduled to begin Monday. Holmes admits to the shootings but says he was suffering "a psychotic episode" at the time, according to court papers filed in July 2013 by the state public defenders, Daniel King and Tamara A. Brady. Evidence "revealed thus far in the case supports the defense\'s position that Mr. Holmes suffers from a severe mental illness and was in the throes of a psychotic episode when he committed the acts that resulted in the tragic loss of life and injuries sustained by moviegoers on July 20, 2012," the public defenders wrote. Holmes no longer looks like a dazed Joker, as he did in his first appearance before a judge in 2012. He appeared dramatically different in January when jury selection began for his trial: 9,000 potential jurors were summoned for duty, described as one of the nation\'s largest jury calls. Holmes now has a cleaner look, with a mustache, button-down shirt and khaki pants. In January, he had a beard and eyeglasses. If this new image sounds like one of an academician, it may be because Holmes, now 27, once was one. Just before the shooting, Holmes was a doctoral student in neuroscience, and he was studying how the brain works, with his schooling funded by a U.S. government grant. Yet for all his learning, Holmes apparently lacked the capacity to command his own mind, according to the case against him. A jury will ultimately decide Holmes\' fate. That panel is made up of 12 jurors and 12 alternates. They are 19 women and five men, and almost all are white and middle-aged. The trial could last until autumn. When jury summonses were issued in January, each potential juror stood a 0.2% chance of being selected, District Attorney George Brauchler told the final jury this month. He described the approaching trial as "four to five months of a horrible roller coaster through the worst haunted house you can imagine." The jury will have to render verdicts on each of the 165 counts against Holmes, including murder and attempted murder charges. Meanwhile, victims and their relatives are challenging all media outlets "to stop the gratuitous use of the name and likeness of mass killers, thereby depriving violent individuals the media celebrity and media spotlight they so crave," the No Notoriety group says. They are joined by victims from eight other mass shootings in recent U.S. history. 
Raised in central coastal California and in San Diego, James Eagan Holmes is the son of a mathematician father noted for his work at the FICO firm that provides credit scores and a registered nurse mother, according to the U-T San Diego newspaper. Holmes also has a sister, Chris, a musician, who\'s five years younger, the newspaper said. His childhood classmates remember him as a clean-cut, bespectacled boy with an "exemplary" character who "never gave any trouble, and never got in trouble himself," The Salinas Californian reported. His family then moved down the California coast, where Holmes grew up in the San Diego-area neighborhood of Rancho Peñasquitos, which a neighbor described as "kind of like Mayberry," the San Diego newspaper said. Holmes attended Westview High School, which says its school district sits in "a primarily middle- to upper-middle-income residential community." There, Holmes ran cross-country, played soccer and later worked at a biotechnology internship at the Salk Institute and Miramar College, which attracts academically talented students. By then, his peers described him as standoffish and a bit of a wiseacre, the San Diego newspaper said. Holmes attended college fairly close to home, in a neighboring area known as Southern California\'s "inland empire" because it\'s more than an hour\'s drive from the coast, in a warm, low-desert climate. He entered the University of California, Riverside, in 2006 as a scholarship student. In 2008 he was a summer camp counselor for disadvantaged children, age 7 to 14, at Camp Max Straus, run by Jewish Big Brothers Big Sisters of Los Angeles. He graduated from UC Riverside in 2010 with the highest honors and a bachelor\'s degree in neuroscience. "Academically, he was at the top of the top," Chancellor Timothy P. White said. He seemed destined for even higher achievement. By 2011, he had enrolled as a doctoral student in the neuroscience program at the University of Colorado Anschutz Medical Campus in Aurora, the largest academic health center in the Rocky Mountain region. The doctoral in neuroscience program attended by Holmes focuses on how the brain works, with an emphasis on processing of information, behavior, learning and memory. Holmes was one of six pre-thesis Ph.D. students in the program who were awarded a neuroscience training grant from the National Institutes of Health. The grant rewards outstanding neuroscientists who will make major contributions to neurobiology. A syllabus that listed Holmes as a student at the medical school shows he was to have delivered a presentation about microRNA biomarkers. But Holmes struggled, and his own mental health took an ominous turn. In March 2012, he told a classmate he wanted to kill people, and that he would do so "when his life was over," court documents said. Holmes was "denied access to the school after June 12, 2012, after he made threats to a professor," according to court documents. About that time, Holmes was a patient of University of Colorado psychiatrist Lynne Fenton. Fenton was so concerned about Holmes\' behavior that she mentioned it to her colleagues, saying he could be a danger to others, CNN affiliate KMGH-TV reported, citing sources with knowledge of the investigation. Fenton\'s concerns surfaced in early June, sources told the Denver station. Holmes began to fantasize about killing "a lot of people" in early June, nearly six weeks before the shootings, the station reported, citing unidentified sources familiar with the investigation. 
Holmes\' psychiatrist contacted several members of a "behavioral evaluation and threat assessment" team to say Holmes could be a danger to others, the station reported. At issue was whether to order Holmes held for 72 hours to be evaluated by mental health professionals, the station reported. "Fenton made initial phone calls about engaging the BETA team" in "the first 10 days" of June, but it "never came together" because in the period Fenton was having conversations with team members, Holmes began the process of dropping out of school, a source told KMGH. Defense attorneys have rejected the prosecution\'s assertions that Holmes was barred from campus. Citing statements from the university, Holmes\' attorneys have argued that his access was revoked because that\'s normal procedure when a student drops enrollment. What caused this turn for the worse for Holmes has yet to be clearly detailed. In the months before the shooting, he bought four weapons and more than 6,000 rounds of ammunition, authorities said. Police said he also booby-trapped his third-floor apartment with explosives, but police weren\'t fooled. After Holmes was caught in the cinema parking lot immediately after the shooting, bomb technicians went to the apartment and neutralized the explosives. No one was injured at the apartment building. Nine minutes before Holmes went into the movie theater, he called a University of Colorado switchboard, public defender Brady has said in court. The number he called can be used to get in contact with faculty members during off hours, Brady said. Court documents have also revealed that investigators have obtained text messages that Holmes exchanged with someone before the shooting. That person was not named, and the content of the texts has not been made public. According to The New York Times, Holmes sent a text message to a fellow graduate student, a woman, about two weeks before the shooting. She asked if he had left Aurora yet, reported the newspaper, which didn\'t identify her. No, he had two months left on his lease, Holmes wrote back, according to the Times. He asked if she had heard of "dysphoric mania," a form of bipolar disorder marked by the highs of mania and the dark and sometimes paranoid delusions of major depression. The woman asked if the disorder could be managed with treatment. "It was," Holmes wrote her, according to the Times. But he warned she should stay away from him "because I am bad news," the newspaper reported. It was her last contact with Holmes. After the shooting, Holmes\' family issued a brief statement: "Our hearts go out to those who were involved in this tragedy and to the families and friends of those involved," they said, without giving any information about their son. Since then, prosecutors have refused to offer a plea deal to Holmes. For Holmes, "justice is death," said Brauchler, the district attorney. In December, Holmes\' parents, who will be attending the trial, issued another statement: They asked that their son\'s life be spared and that he be sent to an institution for mentally ill people for the rest of his life, if he\'s found not guilty by reason of insanity. "He is not a monster," Robert and Arlene Holmes wrote, saying the death penalty is "morally wrong, especially when the condemned is mentally ill." "He is a human being gripped by a severe mental illness," the parents said. The matter will be settled by the jury. CNN\'s Ana Cabrera and Sara Weisfeldt contributed to this report from Denver."""

input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# should produce
# James Holmes, 27, is accused of opening fire on a Colorado theater.
# He was a doctoral student at University of Colorado.
# Holmes says he was suffering "a psychotic episode" at the time of the shooting.
# Prosecutors won't say whether Holmes was barred from campus.
```

Such an article is more than 2000 tokens long, which means that it cannot be handled correctly by BERT or RoBERTa encoders, which are limited to 512 tokens.

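As a rough check, the sketch below (reusing `tokenizer`, `model`, and `article` from the example above) counts the article's tokens and shows how a global attention mask, as in the evaluation script further down, could also be passed to `generate`:

```python
import torch

# Reuses `tokenizer`, `model`, and `article` from the example above (illustrative sketch).
print(len(tokenizer(article).input_ids))  # well above the 512-token limit of BERT/RoBERTa

# Mirror the evaluation script below: truncate to 2048 tokens and put
# global attention on the first 128 tokens.
inputs = tokenizer(article, truncation=True, max_length=2048, return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs.attention_mask)
global_attention_mask[:, :128] = 1

output_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    global_attention_mask=global_attention_mask,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
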
## Training script

**IMPORTANT**: In order for this code to work, make sure you check out the branch
[more_general_trainer_metric](https://github.com/huggingface/transformers/tree/more_general_trainer_metric), which slightly adapts
the `Trainer` for `EncoderDecoderModel`s according to this PR: https://github.com/huggingface/transformers/pull/5840.

The following code shows the complete training script that was used to fine-tune `longformer2roberta-cnn_dailymail-fp16` for reproducibility. The training lasted ~90h on a standard GPU.

```python
#!/usr/bin/env python3
import nlp
import logging
from transformers import LongformerTokenizer, EncoderDecoderModel, Trainer, TrainingArguments

logging.basicConfig(level=logging.INFO)

model = EncoderDecoderModel.from_encoder_decoder_pretrained("allenai/longformer-base-4096", "roberta-base")
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

# load train and validation data
train_dataset = nlp.load_dataset("cnn_dailymail", "3.0.0", split="train")
val_dataset = nlp.load_dataset("cnn_dailymail", "3.0.0", split="validation[:5%]")

# load rouge for validation
rouge = nlp.load_metric("rouge", experiment_id=0)

# enable gradient checkpointing for longformer encoder
model.encoder.config.gradient_checkpointing = True

# set decoding params
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.max_length = 142
model.config.min_length = 56
model.config.no_repeat_ngram_size = 3
model.config.early_stopping = True
model.config.length_penalty = 2.0
model.config.num_beams = 4

encoder_length = 2048
decoder_length = 128
batch_size = 16


# map data correctly
def map_to_encoder_decoder_inputs(batch):
    # tokenizer will automatically set [BOS] <text> [EOS]
    # cut off articles at Longformer's max length of 2048
    inputs = tokenizer(batch["article"], padding="max_length", truncation=True, max_length=encoder_length)
    # force summaries to be <= 128 tokens
    outputs = tokenizer(batch["highlights"], padding="max_length", truncation=True, max_length=decoder_length)

    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask

    # set global attention on the first 128 tokens
    batch["global_attention_mask"] = [[1 if i < 128 else 0 for i in range(encoder_length)] for _ in range(len(inputs.input_ids))]
    batch["decoder_input_ids"] = outputs.input_ids
    batch["labels"] = outputs.input_ids.copy()
    # mask loss for padding
    batch["labels"] = [
        [-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in batch["labels"]
    ]
    batch["decoder_attention_mask"] = outputs.attention_mask

    assert all([len(x) == encoder_length for x in inputs.input_ids])
    assert all([len(x) == decoder_length for x in outputs.input_ids])

    return batch


def compute_metrics(pred):
    labels_ids = pred.label_ids
    pred_ids = pred.predictions

    # all unnecessary tokens are removed
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    labels_ids[labels_ids == -100] = tokenizer.eos_token_id
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)

    rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"].mid

    return {
        "rouge2_precision": round(rouge_output.precision, 4),
        "rouge2_recall": round(rouge_output.recall, 4),
        "rouge2_fmeasure": round(rouge_output.fmeasure, 4),
    }


# make train dataset ready
train_dataset = train_dataset.map(
    map_to_encoder_decoder_inputs, batched=True, batch_size=batch_size, remove_columns=["article", "highlights"],
)
train_dataset.set_format(
    type="torch", columns=["input_ids", "attention_mask", "global_attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)

# same for validation dataset
val_dataset = val_dataset.map(
    map_to_encoder_decoder_inputs, batched=True, batch_size=batch_size, remove_columns=["article", "highlights"],
)
val_dataset.set_format(
    type="torch", columns=["input_ids", "global_attention_mask", "attention_mask", "decoder_input_ids", "decoder_attention_mask", "labels"],
)

# set training arguments - these params are not really tuned, feel free to change
training_args = TrainingArguments(
    output_dir="./",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_from_generate=True,
    evaluate_during_training=True,
    do_train=True,
    do_eval=True,
    logging_steps=1000,
    save_steps=1000,
    eval_steps=1000,
    overwrite_output_dir=True,
    warmup_steps=2000,
    save_total_limit=3,
    fp16=True,
)

# instantiate trainer
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# start training
trainer.train()
```

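With `output_dir="./"` and `save_steps=1000`, the `Trainer` writes intermediate checkpoints to disk; a minimal sketch of reloading one for generation (the checkpoint path is hypothetical):

```python
from transformers import LongformerTokenizer, EncoderDecoderModel

# Hypothetical checkpoint directory written by the Trainer above.
checkpoint_dir = "./checkpoint-1000"

model = EncoderDecoderModel.from_pretrained(checkpoint_dir)
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
```
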
## Evaluation

The following script evaluates the model on the test set of
CNN/Daily Mail.

```python
#!/usr/bin/env python3
import nlp
import torch
from transformers import LongformerTokenizer, EncoderDecoderModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = EncoderDecoderModel.from_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
model.to("cuda")

test_dataset = nlp.load_dataset("cnn_dailymail", "3.0.0", split="test")
batch_size = 32

encoder_length = 2048
decoder_length = 128


# map data correctly
def generate_summary(batch):
    # tokenizer will automatically set [BOS] <text> [EOS]
    # cut off articles at Longformer's max length of 2048, as during fine-tuning
    inputs = tokenizer(batch["article"], padding="max_length", truncation=True, max_length=encoder_length, return_tensors="pt")
    input_ids = inputs.input_ids.to("cuda")
    attention_mask = inputs.attention_mask.to("cuda")
    global_attention_mask = torch.zeros_like(attention_mask)
    global_attention_mask[:, :decoder_length] = 1

    outputs = model.generate(input_ids, attention_mask=attention_mask, global_attention_mask=global_attention_mask)

    # all special tokens will be removed
    output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    batch["pred"] = output_str

    return batch


results = test_dataset.map(generate_summary, batched=True, batch_size=batch_size, remove_columns=["article"])

# load rouge for validation
rouge = nlp.load_metric("rouge")

pred_str = results["pred"]
label_str = results["highlights"]

rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])["rouge2"].mid

print(rouge_output)
```

The obtained results should be:

| - | Rouge2 - mid - precision | Rouge2 - mid - recall | Rouge2 - mid - fmeasure |
|----------|:-------------:|:------:|:------:|
| **CNN/Daily Mail** | 12.39 | 15.05 | **13.21** |

**Note**: This model was trained to show how Longformer can be used as an encoder model in an EncoderDecoder setup.
Better results are obtained for datasets with much longer inputs.