The GluMLP does not work without flash attention because the tensors are passed in a different shape. This PR fixes the issue. I also verified that the embeddings produced with and without flash attention are identical.
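
For context, here is a minimal sketch of the shape concern, assuming the GluMLP follows the common gated-activation pattern; the class name, `gated_layers`, and the dimensions below are illustrative, not the repository's actual modeling code. Splitting on the last dimension keeps the block agnostic to whether the input is the padded `(batch, seq, hidden)` tensor used without flash attention or the unpadded `(total_tokens, hidden)` tensor used with it:

```python
import torch
import torch.nn as nn

class GluMLP(nn.Module):
    # Gated MLP: one projection produces the gate and value halves,
    # which are split, gated, and projected back to hidden size.
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gated_layers = nn.Linear(hidden_size, 2 * intermediate_size, bias=False)
        self.act = nn.GELU()
        self.wo = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # chunk on dim=-1 works for both 2-D (unpadded) and 3-D (padded)
        # inputs; hard-coding a fixed axis would break one of the two paths.
        gate, value = self.gated_layers(x).chunk(2, dim=-1)
        return self.wo(self.act(gate) * value)

# Quick equivalence check in the spirit of the PR's test: the same tokens,
# fed padded and unpadded, should yield matching outputs.
mlp = GluMLP(hidden_size=64, intermediate_size=128).eval()
x_padded = torch.randn(2, 8, 64)          # (batch, seq, hidden), no flash attention
x_unpadded = x_padded.reshape(-1, 64)     # (total_tokens, hidden), flash attention
assert torch.allclose(mlp(x_padded).reshape(-1, 64), mlp(x_unpadded), atol=1e-6)
```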

michael-guenther changed pull request status to open
LGTM!

michael-guenther changed pull request status to merged
