## CUDA ops

We implement common CUDA ops used in detection, segmentation, etc.

- AssignScoreWithK
- BallQuery
- BBoxOverlaps
- CARAFE
- CrissCrossAttention
- ContextBlock
- CornerPool
- Deformable Convolution v1/v2
- Deformable RoIPool
- GatherPoints
- FurthestPointSample
- FurthestPointSampleWithDist
- GeneralizedAttention
- KNN
- MaskedConv
- NMS
- PSAMask
- RoIPool
- RoIAlign
- SimpleRoIAlign
- SigmoidFocalLoss
- SoftmaxFocalLoss
- SoftNMS
- Synchronized BatchNorm
- ThreeInterpolate
- ThreeNN
- Weight standardization
- Correlation
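
To give a rough idea of what one of the listed ops computes, here is a minimal pure-Python sketch of greedy NMS (non-maximum suppression). It is only a reference for the logic, not the CUDA implementation provided here; the function names `iou` and `nms_reference`, the `(x1, y1, x2, y2)` box layout, and the "suppress when IoU is strictly greater than the threshold" rule are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_reference(boxes, scores, iou_threshold):
    """Greedy NMS (hypothetical helper): indices of kept boxes, best score first."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_reference(boxes, scores, iou_threshold=0.5))  # prints [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5, while the third box does not overlap anything and is kept. A CUDA op would typically compute the pairwise overlaps in parallel rather than in a Python loop like this one.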