---
language:
- en
tags:
- NLP
license: mit
datasets:
- TristanBehrens/bach_garland_2024-100K
base_model: None
---
# bach_garland_phariaplus - A Pharia Model
![Trained with Helibrunna](banner.jpg)

Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).
## Configuration

```yaml
training:
  model_name: bach_garland_phariaplus
  batch_size: 22
  lr: 0.001
  lr_warmup_steps: 1818
  lr_decay_until_steps: 18181
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 8
  output_dir: output/bach_garland_phariaplus
  save_every_step: 500
  log_every_step: 10
  wandb_project: bach_garland
  torch_compile: false
model:
  type: pharia
  attention_bias: true
  attention_dropout: 0.0
  eos_token_id: 0
  bos_token_id: 127179
  pad_token_id: 1
  hidden_act: gelu
  hidden_size: 132
  initializer_range: 0.02
  intermediate_size: 264
  max_position_embeddings: 2048
  mlp_bias: true
  num_attention_heads: 6
  num_hidden_layers: 6
  num_key_value_heads: 6
  rope_scaling: null
  rope_theta: 1000000
  tie_word_embeddings: false
  use_cache: true
  context_length: 2048
  vocab_size: 178
dataset:
  hugging_face_id: TristanBehrens/bach_garland_2024-100K
tokenizer:
  type: whitespace
  fill_token: '[EOS]'
```
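The training block specifies a peak learning rate of 0.001, warmed up over 1,818 steps and then decayed until step 18,181 toward a floor of `lr * lr_decay_factor`. The exact curve is defined by the Helibrunna trainer; the sketch below assumes linear warmup followed by cosine decay, a common pairing, purely to illustrate how the three parameters interact.

```python
import math

def lr_at_step(step, lr=1e-3, warmup_steps=1818, decay_until=18181, decay_factor=1e-3):
    """Illustrative schedule: linear warmup, then cosine decay to lr * decay_factor.

    Mirrors the training parameters above; the actual curve used by the
    Helibrunna trainer is an assumption here.
    """
    min_lr = lr * decay_factor
    if step < warmup_steps:
        return lr * step / warmup_steps  # linear ramp up to the peak
    if step >= decay_until:
        return min_lr  # constant floor after the decay window
    # cosine interpolation from lr down to min_lr
    progress = (step - warmup_steps) / (decay_until - warmup_steps)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

for s in (0, 1818, 10000, 18181):
    print(s, f"{lr_at_step(s):.6f}")
```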
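The tokenizer splits input on whitespace and maps each token to an id from a fixed 178-entry vocabulary, with `'[EOS]'` (id 0, per `eos_token_id`) as the fill token. A minimal sketch, assuming the fill token doubles as the out-of-vocabulary fallback; the vocabulary entries below are invented for illustration, not taken from the real vocabulary.

```python
# Hypothetical whitespace tokenization: split on spaces, look each token up
# in a fixed vocabulary, and fall back to '[EOS]' for unknown tokens.
vocab = {"[EOS]": 0, "[PAD]": 1, "NOTE_ON=60": 2}  # toy vocabulary, not the real 178 entries

def encode(text: str) -> list[int]:
    return [vocab.get(token, vocab["[EOS]"]) for token in text.split()]

print(encode("NOTE_ON=60 [EOS]"))  # -> [2, 0]
```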
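Once published, the checkpoint should load through the standard `transformers` auto classes. A minimal sketch, assuming a hypothetical repo id and that the exported Pharia-style model may require `trust_remote_code=True`; prompts are space-separated tokens from the Garland vocabulary.

```python
# Hypothetical usage sketch; the repo id and prompt token are assumptions,
# not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TristanBehrens/bach_garland_phariaplus"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "GARLAND_START"  # illustrative prompt token
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.9,
)
print(tokenizer.decode(outputs[0]))
```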