---
license: cc-by-sa-3.0
language:
- de
library_name: flair
---

# Flair xLSTM Embeddings (German Wikipedia, Forward)

Research & development of Flair xLSTM Embeddings (Forward), trained on a [German Wikipedia dump](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus).

The Flair team is currently working on the integration of xLSTM (both language model training and fine-tuning models for downstream tasks). Check out the `xlstm` [branch in the Flair repository](https://github.com/flairNLP/flair/tree/xlstm) - many thanks to [Patrick Haller](https://huggingface.co/PatrickHaller) for the work on it!

# Training

The current model was trained with commit `18ef331` from the [`xlstm` branch](https://github.com/flairNLP/flair/tree/xlstm). The `xlstm` [library](https://github.com/NX-AI/xlstm) needs to be installed manually; additionally, make sure that Ninja is installed (`pip3 install Ninja`).

The German Wikipedia dump from [this repository](https://huggingface.co/datasets/gwlms/dewiki-20230701-flair-corpus) is used, sharded into a Flair-compatible corpus format:

* `valid.txt` -> Validation corpus
* `test.txt` -> Test corpus
* `train` -> Folder with text files as training corpus

The model was trained for 2 epochs with the following parameters:

```python3
import flair
import torch

from flair.data import SubTokenDictionary
from flair.models import xLSTMLanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

flair.device = torch.device('cuda:0')

is_forward_lm = True

# Subtoken dictionary, derived from the tokenizer vocab of a German BERT model
dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")

# Sharded Wikipedia corpus: train folder plus valid.txt and test.txt
corpus = TextCorpus(
    "/home/ubuntu/splitted_corpus",
    dictionary,
    is_forward_lm,
    character_level=False,
    random_case_flip=True,
)

# xLSTM architecture configuration (YAML), passed through to the xlstm library
xlstm_ablation_1 = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 2
    qkv_proj_blocksize: 2
    num_heads: 2
slstm_block:
  slstm:
    backend: cuda
    num_heads: 2
    conv1d_kernel_size: 2
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

language_model = xLSTMLanguageModel(
    dictionary, xlstm_cfg=xlstm_ablation_1, is_forward_lm=is_forward_lm
)
print(language_model)

trainer = LanguageModelTrainer(language_model, corpus)

trainer.train(
    "xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
    sequence_length=256,
    mini_batch_size=64,
    learning_rate=5,
    patience=50,
    max_epochs=2,
    checkpoint=False,
    num_workers=4,
)
```

# Caveats

Notice: this model integration is heavily under development, and we are still searching for good hyper-parameters. Downstream experiments will follow soon.
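
# Usage

Once a training run has finished, the resulting checkpoint should be usable like any other Flair language model. The snippet below is a minimal sketch rather than a verified recipe: it assumes that the `xlstm` branch writes the usual `best-lm.pt` checkpoint into the training folder and that the checkpoint can be wrapped by `FlairEmbeddings` - both details may still change while the integration is under development.

```python3
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# Assumption: the trainer above wrote a "best-lm.pt" checkpoint and the
# xlstm branch lets FlairEmbeddings load it; the final API may differ.
embeddings = FlairEmbeddings(
    "xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2/best-lm.pt"
)

sentence = Sentence("Berlin ist die Hauptstadt von Deutschland .")
embeddings.embed(sentence)

for token in sentence:
    print(token, token.embedding.shape)
```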