Update on "fixing image_encoder to work with cuda_graphs"

Summary: get_rel_pos built its coordinate tensors with torch.arange on the default device and only then moved them to rel_pos.device. This combination of tensors on multiple devices was preventing CUDA graphs from optimizing the image encoder correctly.

[ghstack-poisoned]
2025-06-03 14:59:27 +08:00 · 2023-05-30 20:37:47 +00:00 · 2023-05-30 20:37:47 +00:00 · 51bc7a2e0b
commit 51bc7a2e0b
parent 9aacb82524
1 changed file with 2 additions and 2 deletions
--- a/segment_anything/modeling/image_encoder.py
+++ b/segment_anything/modeling/image_encoder.py
@@ -315,8 +315,8 @@ def get_rel_pos(q_size: int, k_size: int, rel_pos: torch.Tensor) -> torch.Tensor
        rel_pos_resized = rel_pos

    # Scale the coords with short length if shapes for q and k are different.
-    q_coords = (torch.arange(q_size).to(rel_pos.device)[:, None] * max(k_size / q_size, 1.0))
-    k_coords = (torch.arange(k_size).to(rel_pos.device)[None, :] * max(q_size / k_size, 1.0))
+    q_coords = (torch.arange(q_size, device=rel_pos.device)[:, None] * max(k_size / q_size, 1.0))
+    k_coords = (torch.arange(k_size, device=rel_pos.device)[None, :] * max(q_size / k_size, 1.0))
    relative_coords = (q_coords - k_coords) + (k_size - 1) * max(q_size / k_size, 1.0)

    return rel_pos_resized[relative_coords.long()]
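
The two allocation patterns touched by this diff can be compared in isolation. The following is a minimal sketch, not code from the repository: the helper names `arange_then_move` and `arange_on_device` and the size 14 are hypothetical, chosen only for illustration. It assumes the default device is the CPU, in which case the old form allocates on the host and then copies across devices, which is the multi-device mix the summary describes.

```python
import torch


def arange_then_move(n: int, device: torch.device) -> torch.Tensor:
    # Old pattern: allocate on the default device (assumed to be the CPU),
    # then copy the result over to the target device.
    return torch.arange(n).to(device)


def arange_on_device(n: int, device: torch.device) -> torch.Tensor:
    # New pattern: allocate directly on the target device, so no
    # cross-device copy is needed.
    return torch.arange(n, device=device)


# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

old = arange_then_move(14, device)  # 14 is an arbitrary illustrative size
new = arange_on_device(14, device)

# Both forms end up with the same values on the same device;
# only the allocation path differs.
same_values = torch.equal(old, new)
same_device = old.device == new.device
```

Because the results are identical, the change should not alter what get_rel_pos returns. It only changes where the intermediate index tensors are first allocated.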