Adds support for MQA/GQA and attention mask during training / fine-tuning. 371fd51 gugarosa committed on Oct 30, 2023
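The commit message above doesn't show the diff, but the core of grouped-query attention (GQA, with MQA as the single-KV-head special case) is that several query heads share one key/value head, so the KV heads are repeated to match the query head count before the attention product. A minimal sketch of that expansion; `repeat_kv`, the shapes, and the head counts are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    # Hypothetical helper: expand key/value heads so each KV head
    # serves n_rep query heads, as in grouped-query attention.
    # kv has shape (batch, n_kv_heads, seq_len, head_dim).
    return np.repeat(kv, n_rep, axis=1)

# Example: 8 query heads sharing 2 KV heads -> repeat each KV head 4x.
kv = np.zeros((1, 2, 5, 16))
expanded = repeat_kv(kv, n_rep=4)
assert expanded.shape == (1, 8, 5, 16)
```

With `n_kv_heads=1` this reduces to multi-query attention (MQA), where every query head attends over the same single KV head.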
fix(phi-1): Checks length of `attention_mask` if it is passed as a direct tensor. 1f890f7 gugarosa committed on Sep 26, 2023
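The fix above guards against a user-supplied mask tensor whose length disagrees with the input sequence. A minimal sketch of such a check; `validate_attention_mask` and its error message are hypothetical, not the commit's actual implementation:

```python
import numpy as np

def validate_attention_mask(attention_mask, seq_len: int):
    # Hypothetical check: when the mask is passed as a raw tensor
    # (rather than built internally), its last dimension must match
    # the input sequence length.
    if attention_mask is not None and attention_mask.shape[-1] != seq_len:
        raise ValueError(
            f"attention_mask length {attention_mask.shape[-1]} "
            f"does not match sequence length {seq_len}"
        )
    return attention_mask

mask = np.ones((1, 10), dtype=np.int64)
validate_attention_mask(mask, seq_len=10)  # matching lengths pass silently
```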