---
license: cc-by-sa-3.0
language:
- de
library_name: flair
---
# Flair xLSTM Embeddings (German Wikipedia, Forward)
Research & development of Flair xLSTM Embeddings (Forward) trained on [German Wikipedia dump](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus).
The Flair team is currently working on the integration of xLSTM, covering both language model training and fine-tuning of models for downstream tasks.
Check out the `xlstm` [branch in the Flair repository](https://github.com/flairNLP/flair/tree/xlstm) - many thanks to [Patrick Haller](https://huggingface.co/PatrickHaller) for the work on it.
# Training
The current model was trained with commit `18ef331` from the [`xlstm` branch](https://github.com/flairNLP/flair/tree/xlstm). The `xlstm` [library](https://github.com/NX-AI/xlstm) needs to be installed manually; additionally, make sure that Ninja is installed (`pip3 install Ninja`), as it is needed to compile the sLSTM CUDA kernels.
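After installation, a quick import check helps catch missing build dependencies early. This is a minimal sketch, not part of the original setup; it only assumes that both packages import cleanly and that a CUDA device is available (which the `cuda` sLSTM backend used below requires):
```python3
# Minimal post-install sanity check (sketch, not part of the original setup)
import torch
import xlstm  # installed manually from https://github.com/NX-AI/xlstm
import flair  # xlstm branch, commit 18ef331

print(flair.__version__)
print(torch.cuda.is_available())  # the sLSTM `cuda` backend needs a GPU
```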
The German Wikipedia dump from [this repository](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus) is used, sharded into a Flair-compatible format (see the sketch after this list):
* `valid.txt` -> Validation corpus
* `test.txt` -> Test corpus
* `train` -> Folder with text files as training corpus
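The exact preprocessing script is not part of this card. The following is a minimal sketch of how a raw one-file dump could be split into this layout; the input file name `dewiki-20230701.txt`, the split sizes, and the shard naming are assumptions, not the original pipeline:
```python3
from itertools import islice
from pathlib import Path

corpus_dir = Path("/home/ubuntu/splitted_corpus")
(corpus_dir / "train").mkdir(parents=True, exist_ok=True)

shard_size = 1_000_000  # lines per training shard (assumption)

with open("dewiki-20230701.txt", encoding="utf-8") as f:
    # Hold out one slice each for validation and test
    (corpus_dir / "valid.txt").write_text("".join(islice(f, 100_000)), encoding="utf-8")
    (corpus_dir / "test.txt").write_text("".join(islice(f, 100_000)), encoding="utf-8")

    # Everything else becomes numbered shards in the train/ folder
    shard_id = 0
    while True:
        shard_lines = list(islice(f, shard_size))
        if not shard_lines:
            break
        out_file = corpus_dir / "train" / f"train_{shard_id}.txt"
        out_file.write_text("".join(shard_lines), encoding="utf-8")
        shard_id += 1
```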
The model was trained with the following parameters for 2 epochs:
```python3
import flair
import torch

from flair.data import SubTokenDictionary
from flair.models import xLSTMLanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

flair.device = torch.device("cuda:0")

is_forward_lm = True

# Subword vocabulary taken from the German Wikipedia BERT tokenizer
dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")

corpus = TextCorpus(
    "/home/ubuntu/splitted_corpus",
    dictionary,
    is_forward_lm,
    character_level=False,
    random_case_flip=True,
)

# xLSTM configuration: 7 blocks in total, with an sLSTM block at position 1
# and mLSTM blocks everywhere else
xlstm_ablation_1 = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 2
    qkv_proj_blocksize: 2
    num_heads: 2
slstm_block:
  slstm:
    backend: cuda
    num_heads: 2
    conv1d_kernel_size: 2
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

language_model = xLSTMLanguageModel(
    dictionary, xlstm_cfg=xlstm_ablation_1, is_forward_lm=True
)
print(language_model)

trainer = LanguageModelTrainer(language_model, corpus)

trainer.train(
    "xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
    sequence_length=256,
    mini_batch_size=64,
    learning_rate=5,
    patience=50,
    max_epochs=2,
    checkpoint=False,
    num_workers=4,
)
```
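After training, the checkpoint should be usable as contextual embeddings in downstream Flair pipelines. The following sketch assumes that the `xlstm` branch exposes the trained model through the standard `FlairEmbeddings` interface and that Flair's usual `best-lm.pt` checkpoint name applies; both are assumptions, as the downstream API is still under development:
```python3
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# Assumption: the xLSTM checkpoint loads through the standard
# FlairEmbeddings interface; "best-lm.pt" is Flair's usual file name.
embeddings = FlairEmbeddings(
    "xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2/best-lm.pt"
)

sentence = Sentence("Berlin ist die Hauptstadt von Deutschland .")
embeddings.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)
```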
# Caveats
Notice: this model integration is heavily under development, and we are still in the process of finding good hyperparameters. Downstream experiments are coming soon.