File size: 3,163 Bytes
98bc749 e380049 98bc749 c03980e adb08df e380049 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
---
tags:
- longformer
language: multilingual
license: apache-2.0
datasets:
- wikitext
pipeline_tag: feature-extraction
---
## XLM-R Longformer Model / XLM-Long
XLM-R Longformer (or XLM-Long for short) is a XLM-R model that has been extended to allow sequence lengths up to 4096 tokens, instead of the regular 512. The model was pre-trained from the XLM-RoBERTa checkpoint using the Longformer [pre-training scheme](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) on the English WikiText-103 corpus.
The reason for this was to investigate methods for creating efficient Transformers for low-resource languages, such as Swedish, without the need to pre-train them on long-context datasets in each respecitve language. The trained model came as a result of a master thesis project at [Peltarion](https://peltarion.com/) and was fine-tuned on multilingual quesion-answering tasks, with code available [here](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer#xlm-r).
Since both XLM-R model and Longformer models are large models, it it recommended to run the models with NVIDIA Apex (16bit precision), large GPU and several gradient accumulation steps.
## How to Use
The model can be used as expected to fine-tune on a downstream task.
For instance for QA.
```python
import torch
from transformers import AutoModel, AutoTokenizer
MAX_SEQUENCE_LENGTH = 4096
MODEL_NAME_OR_PATH = "markussagen/xlm-roberta-longformer-base-4096"
tokenizer = AutoTokenizer.from_pretrained(
MODEL_NAME_OR_PATH,
max_length=MAX_SEQUENCE_LENGTH,
padding="max_length",
truncation=True,
)
model = AutoModelForQuestionAnswering.from_pretrained(
MODEL_NAME_OR_PATH,
max_length=MAX_SEQUENCE_LENGTH,
)
```
## Training Procedure
The model have been trained on the WikiText-103 corpus, using a **48GB** GPU with the following training script and parameters. The model was pre-trained for 6000 iterations and took ~5 days. See the full [training script](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer/blob/main/scripts/finetune_qa_models.py) and [Github repo](https://github.com/MarkusSagen/Master-Thesis-Multilingual-Longformer) for more information
```sh
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
unzip wikitext-103-raw-v1.zip
export DATA_DIR=./wikitext-103-raw
scripts/run_long_lm.py \
--model_name_or_path xlm-roberta-base \
--model_name xlm-roberta-to-longformer \
--output_dir ./output \
--logging_dir ./logs \
--val_file_path $DATA_DIR/wiki.valid.raw \
--train_file_path $DATA_DIR/wiki.train.raw \
--seed 42 \
--max_pos 4096 \
--adam_epsilon 1e-8 \
--warmup_steps 500 \
--learning_rate 3e-5 \
--weight_decay 0.01 \
--max_steps 6000 \
--evaluate_during_training \
--logging_steps 50 \
--eval_steps 50 \
--save_steps 6000 \
--max_grad_norm 1.0 \
--per_device_eval_batch_size 2 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 64 \
--overwrite_output_dir \
--fp16 \
--do_train \
--do_eval
``` |