mmcv/docs/tensorrt_custom_ops.md

# TensorRT Custom Ops

<!-- TOC -->

- [TensorRT Custom Ops](#tensorrt-custom-ops)
  - [MMCVRoIAlign](#mmcvroialign)
    - [Description](#description)
    - [Parameters](#parameters)
    - [Inputs](#inputs)
    - [Outputs](#outputs)
    - [Type Constraints](#type-constraints)
  - [ScatterND](#scatternd)
    - [Description](#description-1)
    - [Parameters](#parameters-1)
    - [Inputs](#inputs-1)
    - [Outputs](#outputs-1)
    - [Type Constraints](#type-constraints-1)
  - [NonMaxSuppression](#nonmaxsuppression)
    - [Description](#description-2)
    - [Parameters](#parameters-2)
    - [Inputs](#inputs-2)
    - [Outputs](#outputs-2)
    - [Type Constraints](#type-constraints-2)
  - [MMCVDeformConv2d](#mmcvdeformconv2d)
    - [Description](#description-3)
    - [Parameters](#parameters-3)
    - [Inputs](#inputs-3)
    - [Outputs](#outputs-3)
    - [Type Constraints](#type-constraints-3)
  - [grid_sampler](#grid_sampler)
    - [Description](#description-4)
    - [Parameters](#parameters-4)
    - [Inputs](#inputs-4)
    - [Outputs](#outputs-4)
    - [Type Constraints](#type-constraints-4)
  - [cummax](#cummax)
    - [Description](#description-5)
    - [Parameters](#parameters-5)
    - [Inputs](#inputs-5)
    - [Outputs](#outputs-5)
    - [Type Constraints](#type-constraints-5)
  - [cummin](#cummin)
    - [Description](#description-6)
    - [Parameters](#parameters-6)
    - [Inputs](#inputs-6)
    - [Outputs](#outputs-6)
    - [Type Constraints](#type-constraints-6)
  - [MMCVInstanceNormalization](#mmcvinstancenormalization)
    - [Description](#description-7)
    - [Parameters](#parameters-7)
    - [Inputs](#inputs-7)
    - [Outputs](#outputs-7)
    - [Type Constraints](#type-constraints-7)
  - [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)
    - [Description](#description-8)
    - [Parameters](#parameters-8)
    - [Inputs](#inputs-8)
    - [Outputs](#outputs-8)
    - [Type Constraints](#type-constraints-8)

<!-- TOC -->

## MMCVRoIAlign

### Description

Perform RoIAlign on output feature, used in bbox_head of most two stage
detectors.

### Parameters

| Type    | Parameter        | Description                                                                                                   |
| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
| `int`   | `output_height`  | height of output roi                                                                                          |
| `int`   | `output_width`   | width of output roi                                                                                           |
| `float` | `spatial_scale`  | used to scale the input boxes                                                                                 |
| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].<dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## ScatterND

### Description

ScatterND takes three inputs `data` tensor of rank r >= 1, `indices` tensor of rank q >= 1, and `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its value to values specified by updates at specific index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index-location is not supported.

The `output` is calculated via the following equation:

```python
  output = np.copy(data)
  update_indices = indices.shape[:-1]
  for idx in np.ndindex(update_indices):
      output[indices[idx]] = updates[idx]
```

### Parameters

None

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Tensor of rank r>=1.</dd>

<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
<dd>Tensor of rank q>=1.</dd>

<dt><tt>inputs[2]</tt>: T</dt>
<dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Tensor of rank r >= 1.</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear), tensor(int32, Linear)

## NonMaxSuppression

### Description

Filter out boxes has high IoU overlap with previously selected boxes or low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.

### Parameters

| Type    | Parameter                    | Description                                                                                                                          |
| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `int`   | `center_point_box`           | 0 - the box data is supplied as [y1, x1, y2, x2], 1-the box data is supplied as [x_center, y_center, width, height].                 |
| `int`   | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. |
| `float` | `iou_threshold`              | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0.                     |
| `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
| `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
<dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>
<dd>num_selected_indices=num_batches* num_classes* min(max_output_boxes_per_class, spatial_dimension).</dd>
<dd>All invalid indices will be filled with -1.</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## MMCVDeformConv2d

### Description

Perform Deformable Convolution on input feature, read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for detail.

### Parameters

| Type           | Parameter          | Description                                                                                                                       |
| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                                                                     |
| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                                                                 |
| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                                                                     |
| `int`          | `deformable_group` | Groups of deformable offset.                                                                                                      |
| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
| `int`          | `im2col_step`      | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## grid_sampler

### Description

Perform sample from `input` with pixel locations from `grid`.

### Parameters

| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`)                                                                                                                                                                                                                   |
| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW is the height and width of offset and output. </dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## cummax

### Description

Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maximum of elements of `input` in the dimension `dim`. And `indices` is the index location of each maximum value found in the dimension `dim`.

### Parameters

| Type  | Parameter | Description                             |
| ----- | --------- | --------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over. |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output values.</dd>
<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>
<dd>Output indices.</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## cummin

### Description

Returns a namedtuple (`values`, `indices`) where `values` is the cumulative minimum of elements of `input` in the dimension `dim`. And `indices` is the index location of each minimum value found in the dimension `dim`.

### Parameters

| Type  | Parameter | Description                             |
| ----- | --------- | --------------------------------------- |
| `int` | `dim`     | The dimension to do the operation over. |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>The input tensor.</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output values.</dd>
<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>
<dd>Output indices.</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## MMCVInstanceNormalization

### Description

Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.

y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.

### Parameters

| Type    | Parameter | Description                                                          |
| ------- | --------- | -------------------------------------------------------------------- |
| `float` | `epsilon` | The epsilon value to use to avoid division by zero. Default is 1e-05 |

### Inputs

<dl>
<dt><tt>input</tt>: T</dt>
<dd>Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.</dd>
<dt><tt>scale</tt>: T</dt>
<dd>The input 1-dimensional scale tensor of size C.</dd>
<dt><tt>B</tt>: T</dt>
<dd>The input 1-dimensional bias tensor of size C.</dd>
</dl>

### Outputs

<dl>
<dt><tt>output</tt>: T</dt>
<dd>The output tensor of the same shape as input.</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)

## MMCVModulatedDeformConv2d

### Description

Perform Modulated Deformable Convolution on input feature, read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for detail.

### Parameters

| Type           | Parameter          | Description                                                                           |
| -------------- | ------------------ | ------------------------------------------------------------------------------------- |
| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                         |
| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                     |
| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                         |
| `int`          | `deformable_group` | Groups of deformable offset.                                                          |
| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups. |

### Inputs

<dl>
<dt><tt>inputs[0]</tt>: T</dt>
<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
<dt><tt>inputs[1]</tt>: T</dt>
<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>
<dt><tt>inputs[2]</tt>: T</dt>
<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>
<dt><tt>inputs[3]</tt>: T</dt>
<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
<dt><tt>inputs[4]</tt>: T, optional</dt>
<dd>Input weight; 1-D tensor of shape (output_channel).</dd>
</dl>

### Outputs

<dl>
<dt><tt>outputs[0]</tt>: T</dt>
<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
</dl>

### Type Constraints

- T:tensor(float32, Linear)
[Doc]: Add document of TensorRT and onnxruntime custom ops (#920) * update tensorrt plugin document * add onnxruntime custom ops document * format document * add release note to onnxruntime_op and tensorrt_plugin * update document * add deployment.rst * add grid_sampler onnxruntime document * fix lint * add allow_different_nesting tag * add allow_different_nesting tag Co-authored-by: maningsheng <maningsheng@sensetime.com> 2021-04-12 14:06:28 +08:00			`# TensorRT Custom Ops`

			`<!-- TOC -->`

			`- [TensorRT Custom Ops](#tensorrt-custom-ops)`
			`- [MMCVRoIAlign](#mmcvroialign)`
			`- [Description](#description)`
			`- [Parameters](#parameters)`
			`- [Inputs](#inputs)`
			`- [Outputs](#outputs)`
			`- [Type Constraints](#type-constraints)`
			`- [ScatterND](#scatternd)`
			`- [Description](#description-1)`
			`- [Parameters](#parameters-1)`
			`- [Inputs](#inputs-1)`
			`- [Outputs](#outputs-1)`
			`- [Type Constraints](#type-constraints-1)`
			`- [NonMaxSuppression](#nonmaxsuppression)`
			`- [Description](#description-2)`
			`- [Parameters](#parameters-2)`
			`- [Inputs](#inputs-2)`
			`- [Outputs](#outputs-2)`
			`- [Type Constraints](#type-constraints-2)`
			`- [MMCVDeformConv2d](#mmcvdeformconv2d)`
			`- [Description](#description-3)`
			`- [Parameters](#parameters-3)`
			`- [Inputs](#inputs-3)`
			`- [Outputs](#outputs-3)`
			`- [Type Constraints](#type-constraints-3)`
			`- [grid_sampler](#grid_sampler)`
			`- [Description](#description-4)`
			`- [Parameters](#parameters-4)`
			`- [Inputs](#inputs-4)`
			`- [Outputs](#outputs-4)`
			`- [Type Constraints](#type-constraints-4)`
[Feature] add cummax/cummin tensorrt plugin (#1031) * add cummax/cummin tensorrt plugin * fix isort * fix with clang-format * fix with clang-format again * add document 2021-05-24 13:55:21 +08:00			`- [cummax](#cummax)`
			`- [Description](#description-5)`
			`- [Parameters](#parameters-5)`
			`- [Inputs](#inputs-5)`
			`- [Outputs](#outputs-5)`
			`- [Type Constraints](#type-constraints-5)`
			`- [cummin](#cummin)`
			`- [Description](#description-6)`
			`- [Parameters](#parameters-6)`
			`- [Inputs](#inputs-6)`
			`- [Outputs](#outputs-6)`
			`- [Type Constraints](#type-constraints-6)`
[Feature]: add TensorRT InstanceNormalization plugin (#1034) * add instancenorm plugin * resolve comments * fix lint * fix typo 2021-05-25 20:03:57 +08:00			`- [MMCVInstanceNormalization](#mmcvinstancenormalization)`
			`- [Description](#description-7)`
			`- [Parameters](#parameters-7)`
			`- [Inputs](#inputs-7)`
			`- [Outputs](#outputs-7)`
			`- [Type Constraints](#type-constraints-7)`
[Feature]: add modulated deformable conv TensorRT support (#1078) * add modulated dcn, better dcn plugin * clangformat * update documentation 2021-06-16 21:38:07 +08:00			`- [MMCVModulatedDeformConv2d](#mmcvmodulateddeformconv2d)`
			`- [Description](#description-8)`
			`- [Parameters](#parameters-8)`
			`- [Inputs](#inputs-8)`
			`- [Outputs](#outputs-8)`
			`- [Type Constraints](#type-constraints-8)`
[Doc]: Add document of TensorRT and onnxruntime custom ops (#920) * update tensorrt plugin document * add onnxruntime custom ops document * format document * add release note to onnxruntime_op and tensorrt_plugin * update document * add deployment.rst * add grid_sampler onnxruntime document * fix lint * add allow_different_nesting tag * add allow_different_nesting tag Co-authored-by: maningsheng <maningsheng@sensetime.com> 2021-04-12 14:06:28 +08:00
			`<!-- TOC -->`

			`## MMCVRoIAlign`

			`### Description`

			`Perform RoIAlign on output feature, used in bbox_head of most two stage`
			`detectors.`

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ------- \| ---------------- \| ------------------------------------------------------------------------------------------------------------- \|`
			\| `int` \| `output_height` \| height of output roi \|
			\| `int` \| `output_width` \| width of output roi \|
			\| `float` \| `spatial_scale` \| used to scale the input boxes \|
			\| `int` \| `sampling_ratio` \| number of input samples to take for each output sample. `0` means to take samples densely for current models. \|
			\| `str` \| `mode` \| pooling mode in each bin. `avg` or `max` \|
			\| `int` \| `aligned` \| If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.</dd>`
			`<dt><tt>inputs[1]</tt>: T</dt>`
			`<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].<dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`

			`## ScatterND`

			`### Description`

			ScatterND takes three inputs `data` tensor of rank r >= 1, `indices` tensor of rank q >= 1, and `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its value to values specified by updates at specific index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index-location is not supported.

			The `output` is calculated via the following equation:

			```python
			`output = np.copy(data)`
			`update_indices = indices.shape[:-1]`
			`for idx in np.ndindex(update_indices):`
			`output[indices[idx]] = updates[idx]`
			```

			`### Parameters`

			`None`

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Tensor of rank r>=1.</dd>`

			`<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>`
			`<dd>Tensor of rank q>=1.</dd>`

			`<dt><tt>inputs[2]</tt>: T</dt>`
			`<dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Tensor of rank r >= 1.</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear), tensor(int32, Linear)`

			`## NonMaxSuppression`

			`### Description`

			`Filter out boxes has high IoU overlap with previously selected boxes or low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.`

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ------- \| ---------------------------- \| ------------------------------------------------------------------------------------------------------------------------------------ \|`
			\| `int` \| `center_point_box` \| 0 - the box data is supplied as [y1, x1, y2, x2], 1-the box data is supplied as [x_center, y_center, width, height]. \|
			\| `int` \| `max_output_boxes_per_class` \| The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. \|
			\| `float` \| `iou_threshold` \| The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0. \|
			\| `float` \| `score_threshold` \| The threshold for deciding when to remove boxes based on score. \|
			\| `int` \| `offset` \| 0 or 1, boxes' width or height is (x2 - x1 + offset). \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>`
			`<dt><tt>inputs[1]</tt>: T</dt>`
			`<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>`
			`<dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>`
			`<dd>num_selected_indices=num_batches* num_classes* min(max_output_boxes_per_class, spatial_dimension).</dd>`
			`<dd>All invalid indices will be filled with -1.</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`

			`## MMCVDeformConv2d`

			`### Description`

			`Perform Deformable Convolution on input feature, read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for detail.`

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| -------------- \| ------------------ \| --------------------------------------------------------------------------------------------------------------------------------- \|`
			\| `list of ints` \| `stride` \| The stride of the convolving kernel. (sH, sW) \|
			\| `list of ints` \| `padding` \| Paddings on both sides of the input. (padH, padW) \|
			\| `list of ints` \| `dilation` \| The spacing between kernel elements. (dH, dW) \|
			\| `int` \| `deformable_group` \| Groups of deformable offset. \|
			\| `int` \| `group` \| Split input into groups. `input_channel` should be divisible by the number of groups. \|
			\| `int` \| `im2col_step` \| DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>`
			`<dt><tt>inputs[1]</tt>: T</dt>`
			`<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>`
			`<dt><tt>inputs[2]</tt>: T</dt>`
			`<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`

			`## grid_sampler`

			`### Description`

			Perform sample from `input` with pixel locations from `grid`.

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ----- \| -------------------- \| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- \|`
			\| `int` \| `interpolation_mode` \| Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) \|
			\| `int` \| `padding_mode` \| Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) \|
			\| `int` \| `align_corners` \| If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>`
			`<dt><tt>inputs[1]</tt>: T</dt>`
			`<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW is the height and width of offset and output. </dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`
[Feature] add cummax/cummin tensorrt plugin (#1031) * add cummax/cummin tensorrt plugin * fix isort * fix with clang-format * fix with clang-format again * add document 2021-05-24 13:55:21 +08:00
			`## cummax`

			`### Description`

			Returns a namedtuple (`values`, `indices`) where `values` is the cumulative maximum of elements of `input` in the dimension `dim`. And `indices` is the index location of each maximum value found in the dimension `dim`.

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ----- \| --------- \| --------------------------------------- \|`
			\| `int` \| `dim` \| The dimension to do the operation over. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>The input tensor.</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Output values.</dd>`
			`<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>`
			`<dd>Output indices.</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`

			`## cummin`

			`### Description`

			Returns a namedtuple (`values`, `indices`) where `values` is the cumulative minimum of elements of `input` in the dimension `dim`. And `indices` is the index location of each minimum value found in the dimension `dim`.

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ----- \| --------- \| --------------------------------------- \|`
			\| `int` \| `dim` \| The dimension to do the operation over. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>The input tensor.</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Output values.</dd>`
			`<dt><tt>outputs[1]</tt>: (int32, Linear)</dt>`
			`<dd>Output indices.</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`
[Feature]: add TensorRT InstanceNormalization plugin (#1034) * add instancenorm plugin * resolve comments * fix lint * fix typo 2021-05-25 20:03:57 +08:00
			`## MMCVInstanceNormalization`

			`### Description`

			`Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022.`

			`y = scale * (x - mean) / sqrt(variance + epsilon) + B, where mean and variance are computed per instance per channel.`

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| ------- \| --------- \| -------------------------------------------------------------------- \|`
			\| `float` \| `epsilon` \| The epsilon value to use to avoid division by zero. Default is 1e-05 \|

			`### Inputs`

			`<dl>`
			`<dt><tt>input</tt>: T</dt>`
			`<dd>Input data tensor from the previous operator; dimensions for image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For non image case, the dimensions are in the form of (N x C x D1 x D2 ... Dn), where N is the batch size.</dd>`
			`<dt><tt>scale</tt>: T</dt>`
			`<dd>The input 1-dimensional scale tensor of size C.</dd>`
			`<dt><tt>B</tt>: T</dt>`
			`<dd>The input 1-dimensional bias tensor of size C.</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>output</tt>: T</dt>`
			`<dd>The output tensor of the same shape as input.</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`
[Feature]: add modulated deformable conv TensorRT support (#1078) * add modulated dcn, better dcn plugin * clangformat * update documentation 2021-06-16 21:38:07 +08:00
			`## MMCVModulatedDeformConv2d`

			`### Description`

			`Perform Modulated Deformable Convolution on input feature, read [Deformable ConvNets v2: More Deformable, Better Results](https://arxiv.org/abs/1811.11168?from=timeline) for detail.`

			`### Parameters`

			`\| Type \| Parameter \| Description \|`
			`\| -------------- \| ------------------ \| ------------------------------------------------------------------------------------- \|`
			\| `list of ints` \| `stride` \| The stride of the convolving kernel. (sH, sW) \|
			\| `list of ints` \| `padding` \| Paddings on both sides of the input. (padH, padW) \|
			\| `list of ints` \| `dilation` \| The spacing between kernel elements. (dH, dW) \|
			\| `int` \| `deformable_group` \| Groups of deformable offset. \|
			\| `int` \| `group` \| Split input into groups. `input_channel` should be divisible by the number of groups. \|

			`### Inputs`

			`<dl>`
			`<dt><tt>inputs[0]</tt>: T</dt>`
			`<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>`
			`<dt><tt>inputs[1]</tt>: T</dt>`
			`<dd>Input offset; 4-D tensor of shape (N, deformable_group* 2* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>`
			`<dt><tt>inputs[2]</tt>: T</dt>`
			`<dd>Input mask; 4-D tensor of shape (N, deformable_group* kH* kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>`
			`<dt><tt>inputs[3]</tt>: T</dt>`
			`<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>`
			`<dt><tt>inputs[4]</tt>: T, optional</dt>`
			`<dd>Input weight; 1-D tensor of shape (output_channel).</dd>`
			`</dl>`

			`### Outputs`

			`<dl>`
			`<dt><tt>outputs[0]</tt>: T</dt>`
			`<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>`
			`</dl>`

			`### Type Constraints`

			`- T:tensor(float32, Linear)`