mirror of
https://github.com/huggingface/pytorch-image-models.git
synced 2025-06-03 15:01:08 +08:00
eva.py: fixed bug in applying attention mask
The mask should be applied before the softmax.
parent 7160af4a24
commit 4cca568bd8
```diff
@@ -134,10 +134,12 @@ class EvaAttention(nn.Module):
         else:
             q = q * self.scale
             attn = (q @ k.transpose(-2, -1))
-            attn = attn.softmax(dim=-1)
+
             if attn_mask is not None:
                 attn_mask = attn_mask.to(torch.bool)
                 attn = attn.masked_fill(~attn_mask[:, None, None, :], float("-inf"))
+            attn = attn.softmax(dim=-1)
+
             attn = self.attn_drop(attn)
             x = attn @ v

```
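For context, a minimal sketch (not part of the commit) of why the ordering matters: applying the mask after softmax injects `-inf` into rows that are already normalized, so the surviving weights no longer sum to 1 and the `-inf` entries poison the subsequent `attn @ v`; masking the raw logits first lets softmax renormalize over just the unmasked keys. Tensor shapes here are illustrative, not taken from eva.py.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 1, 1, 4)                     # (batch, heads, query, key) attention scores
attn_mask = torch.tensor([True, True, False, True])  # False = key position to mask out

# Buggy order: softmax first, then mask.
buggy = logits.softmax(dim=-1)
buggy = buggy.masked_fill(~attn_mask[None, None, None, :], float("-inf"))
print(buggy)           # contains -inf; the row no longer sums to 1

# Fixed order: mask the raw logits, then softmax.
fixed = logits.masked_fill(~attn_mask[None, None, None, :], float("-inf"))
fixed = fixed.softmax(dim=-1)
print(fixed)           # masked position is exactly 0
print(fixed.sum(-1))   # tensor([[[1.]]]) — a valid probability distribution
```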