eva.py: fixed bug in applying attention mask

The mask should be applied before the softmax.
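A minimal sketch (not part of the commit; the tensor shapes and the True = keep mask layout are assumed from the patched code) of why the order matters: applying masked_fill with -inf after the softmax writes -inf into already-normalized weights, which then propagates inf/NaN through attn @ v, whereas masking the logits first gives masked keys exactly zero weight and keeps each row summing to 1.

import torch

scores = torch.randn(1, 1, 1, 4)                    # (batch, heads, queries, keys) attention logits
keep = torch.tensor([True, True, False, False])     # key padding mask, True = attend

# Pre-fix order: softmax, then mask. The -inf survives into the weighted sum.
buggy = scores.softmax(dim=-1).masked_fill(~keep[None, None, None, :], float("-inf"))

# Post-fix order: mask the logits, then softmax. Masked keys get zero weight.
fixed = scores.masked_fill(~keep[None, None, None, :], float("-inf")).softmax(dim=-1)

print(buggy)                  # masked entries are -inf, so a later attn @ v becomes inf/nan
print(fixed, fixed.sum(-1))   # masked entries are exactly 0.0 and each row sums to 1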
pull/2236/head
Feraidoon Mehri 2024-07-19 15:12:04 +03:30 committed by GitHub
parent 7160af4a24
commit 4cca568bd8
1 changed file with 3 additions and 1 deletion

@@ -134,10 +134,12 @@ class EvaAttention(nn.Module):
         else:
             q = q * self.scale
             attn = (q @ k.transpose(-2, -1))
-            attn = attn.softmax(dim=-1)
             if attn_mask is not None:
                 attn_mask = attn_mask.to(torch.bool)
                 attn = attn.masked_fill(~attn_mask[:, None, None, :], float("-inf"))
+            attn = attn.softmax(dim=-1)
             attn = self.attn_drop(attn)
             x = attn @ v
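As a sanity check (an illustrative sketch, not part of the commit; the shapes and mask layout below are assumptions): with the corrected ordering, the manual path should agree with PyTorch's fused torch.nn.functional.scaled_dot_product_attention, which likewise applies a boolean mask to the logits before the softmax.

import torch
import torch.nn.functional as F

B, H, N, D = 2, 4, 8, 16
q, k, v = (torch.randn(B, H, N, D) for _ in range(3))
attn_mask = torch.rand(B, N) > 0.2      # boolean key padding mask, True = attend
attn_mask[:, 0] = True                  # keep at least one key per row to avoid all-masked NaNs
scale = D ** -0.5

# Manual path in the patched order: scale, logits, mask, softmax, weighted sum.
attn = (q * scale) @ k.transpose(-2, -1)
attn = attn.masked_fill(~attn_mask[:, None, None, :], float("-inf"))
attn = attn.softmax(dim=-1)
x_manual = attn @ v

# Fused path: a boolean attn_mask is applied to the logits before the softmax.
x_fused = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask[:, None, None, :])

print(torch.allclose(x_manual, x_fused, atol=1e-5))  # expected: True (up to float tolerance)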