9B - query_pre_attn_scalar = 256 not 224
#22 opened by danielhanchen
See https://github.com/google/gemma_pytorch/commit/03e657582d17cb5a8617ebf333c1c16f3694670e
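Gemma 9B should use query_pre_attn_scalar = 256, not 224. The value 224 is what you get from self.config.hidden_size // self.config.num_attention_heads, which is not the right scalar for this model.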
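For anyone wondering how much this matters numerically, here is a rough sketch of the arithmetic. The config values (hidden_size = 3584, num_attention_heads = 16) are my assumption for the 9B model, chosen because they reproduce the 224 above, and I am assuming the scalar is applied to the attention scores as `scalar ** -0.5`, as in standard scaled dot-product attention. The helper name `attn_scale` is made up for illustration.

```python
import math

# Assumed 9B config values (not stated in this PR; they reproduce the 224).
hidden_size = 3584
num_attention_heads = 16

# What the old derivation produces.
derived_scalar = hidden_size // num_attention_heads  # 224

# What this PR sets instead.
query_pre_attn_scalar = 256


def attn_scale(scalar: int) -> float:
    """Hypothetical helper: scale factor applied to attention scores,
    assuming the usual scalar ** -0.5 convention."""
    return scalar ** -0.5


old_scale = attn_scale(derived_scalar)
new_scale = attn_scale(query_pre_attn_scalar)

# Ratio of old to new scale, i.e. how much larger the logits were before the fix.
ratio = old_scale / new_scale

print(f"derived scalar: {derived_scalar}")
print(f"old scale: {old_scale:.5f}, new scale: {new_scale:.5f}")
print(f"old/new ratio: {ratio:.4f}")
```

Under those assumptions the old value inflates the attention logits by a factor of sqrt(256 / 224), roughly 7%, before the softmax.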
osanseviero changed pull request status to merged