---
language:
- en
tags:
- NLP
license: mit
datasets:
- TristanBehrens/bach_garland_2024-100K
base_model: None
---

# bach_garland_phariaplus - An xLSTM Model

![Trained with Helibrunna](banner.jpg)

Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).

## Configuration

```
training:
  model_name: bach_garland_phariaplus
  batch_size: 22
  lr: 0.001
  lr_warmup_steps: 1818
  lr_decay_until_steps: 18181
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 8
  output_dir: output/bach_garland_phariaplus
  save_every_step: 500
  log_every_step: 10
  wandb_project: bach_garland
  torch_compile: false
model:
  type: pharia
  attention_bias: true
  attention_dropout: 0.0
  eos_token_id: 0
  bos_token_id: 127179
  pad_token_id: 1
  hidden_act: gelu
  hidden_size: 132
  initializer_range: 0.02
  intermediate_size: 264
  max_position_embeddings: 2048
  mlp_bias: true
  num_attention_heads: 6
  num_hidden_layers: 6
  num_key_value_heads: 6
  rope_scaling: null
  rope_theta: 1000000
  tie_word_embeddings: false
  use_cache: true
  context_length: 2048
  vocab_size: 178
dataset:
  hugging_face_id: TristanBehrens/bach_garland_2024-100K
tokenizer:
  type: whitespace
  fill_token: '[EOS]'
```
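
The learning-rate settings in the configuration can be easier to read as a curve. The sketch below is not taken from Helibrunna; it assumes a linear warmup over `lr_warmup_steps`, followed by a cosine decay that reaches `lr * lr_decay_factor` at `lr_decay_until_steps` and stays there. The function name `learning_rate` is hypothetical. It also derives the per-head dimension under the common convention `hidden_size / num_attention_heads`, which is an assumption about how the `pharia` model type splits its heads.

```python
import math

# Values copied from the configuration above.
LR = 0.001
WARMUP_STEPS = 1818
DECAY_UNTIL_STEPS = 18181
DECAY_FACTOR = 0.001
HIDDEN_SIZE = 132
NUM_ATTENTION_HEADS = 6


def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to LR * DECAY_FACTOR.

    This is an illustrative guess at the schedule shape, not the trainer's code.
    """
    final_lr = LR * DECAY_FACTOR
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LR * step / WARMUP_STEPS
    if step >= DECAY_UNTIL_STEPS:
        # After the decay window the rate is held at its floor.
        return final_lr
    # Cosine interpolation between the peak and the floor.
    progress = (step - WARMUP_STEPS) / (DECAY_UNTIL_STEPS - WARMUP_STEPS)
    return final_lr + 0.5 * (LR - final_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed convention: each attention head gets an equal slice of hidden_size.
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS

if __name__ == "__main__":
    for step in (0, 909, 1818, 10000, 18181):
        print(step, f"{learning_rate(step):.6g}")
    print("head_dim:", head_dim)
```

Under these assumptions the rate peaks at `0.001` after 1818 steps and bottoms out at `0.000001` (`lr * lr_decay_factor`) from step 18181 onward, and each of the 6 heads has dimension 22.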