mmengine/docs/en/migration/transform.md

163 lines
7.5 KiB
Markdown

# Migrate Data Transform to OpenMMLab 2.0
## Introduction
According to the data transform interface convention of TorchVision, all data transform classes need to
implement the `__call__` method. And in the convention of OpenMMLab 1.0, we require the input and output of
the `__call__` method should be a dictionary.
In OpenMMLab 2.0, to make the data transform classes more extensible, we use `transform` method instead of
`__call__` method to implement data transformation, and all data transform classes should inherit the
[mmcv.transforms.BaseTransform](mmcv.transforms.BaseTransform) class. And you can still use these data
transform classes by calling.
A tutorial to implement a data transform class can be found in the [Data Transform](../advanced_tutorials/data_element.md).
In addition, we move some common data transform classes from every repositories to MMCV, and in this document,
we will compare the functionalities, usages and implementations between the original data transform classes (in [MMClassification v0.23.2](https://github.com/open-mmlab/mmclassification/tree/v0.23.2), [MMDetection v2.25.1](https://github.com/open-mmlab/mmdetection/tree/v2.25.1)) and the new data transform classes (in [MMCV v2.0.0rc1](https://github.com/open-mmlab/mmcv/tree/2.x))
## Functionality Differences
<table class="colwidths-auto docutils align-default">
<thead>
<tr>
<th></th>
<th>MMClassification (original)</th>
<th>MMDetection (original)</th>
<th>MMCV (new)</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>LoadImageFromFile</code></td>
<td>Join the 'img_prefix' and 'img_info.filename' field to find the path of images and loading.</td>
<td>Join the 'img_prefix' and 'img_info.filename' field to find the path of images and loading. Support
specifying the order of channels.</td>
<td>Load images from 'img_path'. Support ignoring failed loading and specifying decode backend.</td>
</tr>
<tr>
<td><code>LoadAnnotations</code></td>
<td>Not available.</td>
<td>Load bbox, label, mask (include polygon masks), semantic segmentation. Support converting bbox coordinate system.</td>
<td>Load bbox, label, mask (not include polygon masks), semantic segmentation.</td>
</tr>
<tr>
<td><code>Pad</code></td>
<td>Pad all images in the "img_fields" field.</td>
<td>Pad all images in the "img_fields" field. Support padding to integer multiple size.</td>
<td>Pad the image in the "img" field. Support padding to integer multiple size.</td>
</tr>
<tr>
<td><code>CenterCrop</code></td>
<td>Crop all images in the "img_fields" field. Support cropping as EfficientNet style.</td>
<td>Not available.</td>
<td>Crop the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support padding the margin of the cropped image.</td>
</tr>
<tr>
<td><code>Normalize</code></td>
<td>Normalize the image.</td>
<td>No differences.</td>
<td>No differences, but we recommend to use <a href="../tutorials/model.html#datapreprocessor">data preprocessor</a> to normalize the image.</td>
</tr>
<tr>
<td><code>Resize</code></td>
<td>Resize all images in the "img_fields" field. Support resizing proportionally according to the specified edge.</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, the <code>img_scale</code> have a single scale, and <code>multiscale_mode="value"</code>.</td>
<td>Resize the image in the "img" field, the bbox in the "gt_bboxes" field, the semantic segmentation in the "gt_seg_map" field, the keypoints in the "gt_keypoints" field. Support specifying the ratio of new scale to original scale and support resizing proportionally.</td>
</tr>
<tr>
<td><code>RandomResize</code></td>
<td>Not available</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, <code>img_scale</code> have two scales and <code>multiscale_mode="range"</code>, or <code>ratio_range</code> is not None.
<pre>Resize(
img_sacle=[(640, 480), (960, 720)],
mode="range",
)</pre>
</td>
<td>Have the same resize function as <code>Resize</code>. Support sampling the scale from a scale range or scale ratio range.
<pre>RandomResize(scale=[(640, 480), (960, 720)])</pre>
</td>
</tr>
<tr>
<td><code>RandomChoiceResize</code></td>
<td>Not available</td>
<td>Use <code>Resize</code> with <code>ratio_range=None</code>, <code>img_scale</code> have multiple scales, and <code>multiscale_mode="value"</code>.
<pre>Resize(
img_sacle=[(640, 480), (960, 720)],
mode="value",
)</pre>
</td>
<td>Have the same resize function as <code>Resize</code>. Support randomly choosing the scale from multiple scales or multiple scale ratios.
<pre>RandomChoiceResize(scales=[(640, 480), (960, 720)])</pre>
</td>
</tr>
<tr>
<td><code>RandomGrayscale</code></td>
<td>Randomly grayscale all images in the "img_fields" field. Support keeping channels after grayscale.</td>
<td>Not available</td>
<td>Randomly grayscale the image in the "img" field. Support specifying the weight of each channel, and support keeping channels after grayscale.</td>
</tr>
<tr>
<td><code>RandomFlip</code></td>
<td>Randomly flip all images in the "img_fields" field. Support flipping horizontally and vertically.</td>
<td>Randomly flip all values in the "img_fields", "bbox_fields", "mask_fields" and "seg_fields". Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
<td>Randomly flip the values in the "img", "gt_bboxes", "gt_seg_map", "gt_keypoints" field. Support flipping horizontally, vertically and diagonally, and support specifying the probability of every kind of flipping.</td>
</tr>
<tr>
<td><code>MultiScaleFlipAug</code></td>
<td>Not available</td>
<td>Used for test-time-augmentation.</td>
<td>Use <code><a href="https://mmcv.readthedocs.io/en/2.x/api/generated/mmcv.transforms.TestTimeAug.html">TestTimeAug</a></code></td>
</tr>
<tr>
<td><code>ToTensor</code></td>
<td>Convert the values in the specified fields to <code>torch.Tensor</code>.</td>
<td>No differences</td>
<td>No differences</td>
</tr>
<tr>
<td><code>ImageToTensor</code></td>
<td>Convert the values in the specified fields to <code>torch.Tensor</code> and transpose the channels to CHW.</td>
<td>No differences.</td>
<td>No differences.</td>
</tr>
</tbody>
</table>
## Implementation Differences
Take `RandomFlip` as example, the new version [RandomFlip](mmcv.transforms.RandomFlip) in MMCV inherits `BaseTransfrom`, and move the
functionality implementation from `__call__` to `transform` method. In addition, the randomness related code
is placed in some extra methods and these methods need to be wrapped by `cache_randomness` decorator.
- MMDetection (original version)
```python
class RandomFlip:
def __call__(self, results):
"""Randomly flip images."""
...
# Randomly choose the flip direction
cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
...
return results
```
- MMCV (new version)
```python
class RandomFlip(BaseTransfrom):
def transform(self, results):
"""Randomly flip images"""
...
cur_dir = self._random_direction()
...
return results
@cache_randomness
def _random_direction(self):
"""Randomly choose the flip direction"""
...
return np.random.choice(direction_list, p=flip_ratio_list)
```