Bidirectional or Causal?
```python
from transformers import MistralConfig, MistralModel

# BidirectionalMistralConfig is the custom config class defined alongside
# this model in NV-Embed's modeling code.
class BidirectionalMistralModel(MistralModel):
    config_class = BidirectionalMistralConfig

    def __init__(self, config: MistralConfig):
        super().__init__(config)
        # Mark every layer's attention non-causal and force the eager path.
        for layer in self.layers:
            layer.self_attn.is_causal = False
        self._attn_implementation = "eager"
```
However, MistralAttention (the eager implementation) doesn't use `is_causal`. MistralFlashAttention2 does use `layer.self_attn.is_causal`, but MistralSdpaAttention doesn't. Since the class above forces eager mode, does setting `is_causal = False` actually make attention bidirectional?
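A quick empirical check is to flip `is_causal` under eager mode and see whether the hidden states change at all. This is a minimal sketch using a tiny randomly initialized Mistral so no weights are downloaded; the config sizes are arbitrary:

```python
import torch
from transformers import MistralConfig, MistralModel

# Tiny random-weight Mistral (sizes arbitrary, chosen for speed).
config = MistralConfig(
    hidden_size=64, intermediate_size=128, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=128,
)
config._attn_implementation = "eager"

torch.manual_seed(0)
model = MistralModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (1, 8))

with torch.no_grad():
    causal = model(input_ids).last_hidden_state
    for layer in model.layers:
        layer.self_attn.is_causal = False  # same flip as the snippet above
    flipped = model(input_ids).last_hidden_state

# If eager attention ignores is_causal, this prints True: the flag alone
# did not make the model bidirectional.
print(torch.allclose(causal, flipped))
```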
Hi @AlignLearner, please refer to this line, where SDPA attention does use `is_causal`: https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/mistral/modeling_mistral.py#L692
@nada5 However, in v4.44.2, SDPA attention doesn't use `is_causal`:
https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/mistral/modeling_mistral.py#L475C9-L484C10
And following https://huggingface.co/nvidia/NV-Embed-v2#2-required-packages, which specifies transformers v4.42.4, SDPA attention doesn't use `is_causal` in that version either:
https://github.com/huggingface/transformers/blob/v4.42.4/src/transformers/models/mistral/modeling_mistral.py#L645C19-L655C1
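Since this is version dependent, it can help to print which attention class the installed transformers actually instantiates; a minimal sketch (the class name varies by release, and recent versions fold all backends into a single MistralAttention):

```python
import transformers
from transformers import MistralConfig, MistralModel

config = MistralConfig(
    hidden_size=64, intermediate_size=128, num_hidden_layers=1,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=128,
)
config._attn_implementation = "sdpa"  # requires torch >= 2.1.1

model = MistralModel(config)
print(transformers.__version__)
# e.g. "MistralSdpaAttention" on v4.44.2; newer releases may print
# "MistralAttention" regardless of the backend selected.
print(type(model.layers[0].self_attn).__name__)
```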
Hi @AlignLearner. In fact, NV-Embed adopts eager mode, which does not use SDPA attention.
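For what it's worth, under eager mode causality comes from the additive attention mask rather than from `is_causal`, so one way to get genuinely bidirectional attention is to pass a full 4-D mask yourself. This is a minimal sketch, not NV-Embed's actual implementation, and it assumes v4.42+ mask semantics, where a custom 4-D mask is taken as an already-inverted additive mask with max 0:

```python
import torch
from transformers import MistralConfig, MistralModel

config = MistralConfig(
    hidden_size=64, intermediate_size=128, num_hidden_layers=2,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=128,
)
config._attn_implementation = "eager"
model = MistralModel(config).eval()

batch, seq_len = 1, 8
input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
# All-zeros additive mask of shape (batch, 1, query_len, key_len):
# nothing is added to the attention scores, so no position is masked
# and every token attends to every other token.
full_mask = torch.zeros(batch, 1, seq_len, seq_len)

with torch.no_grad():
    out = model(input_ids, attention_mask=full_mask).last_hidden_state
```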