# Data Transforms

In this tutorial, we introduce the design of the data transform pipeline in MMSegmentation.

The structure of this guide is as follows:

- [Data Transforms](#data-transforms)
  - [Design of Data pipelines](#design-of-data-pipelines)
  - [Customization data transformation](#customization-data-transformation)

## Design of Data pipelines

Following typical conventions, we use `Dataset` and `DataLoader` for data loading
with multiple workers. `Dataset` returns a dict of data items corresponding to
the arguments of the model's forward method.
Since the data in semantic segmentation may not be the same size,
we introduce a new `DataContainer` type in MMCV to help collect and distribute
data of different sizes.
See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.

In the 1.x version of MMSegmentation, all data transformations inherit from `BaseTransform`.
Both the input and the output of a transformation are dicts. A simple example is as follows:

```python
>>> from mmseg.datasets.transforms import LoadAnnotations
>>> transforms = LoadAnnotations()
>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'
>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png'
>>> results = dict(
...     img_path=img_path,
...     seg_map_path=gt_path,
...     reduce_zero_label=False,
...     seg_fields=[])
>>> data_dict = transforms(results)
>>> print(data_dict.keys())
dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
```

The data preparation pipeline and the dataset are decoupled. Usually a dataset
defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict.
A pipeline consists of a sequence of operations. Each operation takes a dict as input and outputs a dict for the next transform.
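
To make the dict-in, dict-out contract concrete, here is a minimal sketch in plain Python. The functions `load_image`, `flip_horizontal` and `run_pipeline` are hypothetical stand-ins written for this illustration; they are not part of MMSegmentation or MMCV, and a nested list stands in for the decoded image.

```python
from typing import Callable, Dict, List

Transform = Callable[[Dict], Dict]


def load_image(results: Dict) -> Dict:
    # Hypothetical loader: a nested list stands in for the decoded image.
    results['img'] = [[0, 1], [2, 3]]
    results['img_shape'] = (2, 2)
    return results


def flip_horizontal(results: Dict) -> Dict:
    # Reverse every row and record that a flip was applied.
    results['img'] = [row[::-1] for row in results['img']]
    results['flip'] = True
    return results


def run_pipeline(transforms: List[Transform], results: Dict) -> Dict:
    # Feed the output dict of each step into the next step.
    for transform in transforms:
        results = transform(results)
    return results


data = run_pipeline([load_image, flip_horizontal], {'img_path': 'demo.png'})
print(sorted(data.keys()))
```

Because every step shares the same signature, steps can be reordered or swapped in a config without touching the dataset code.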

The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.

Here is an example pipeline for PSPNet.

```python
crop_size = (512, 1024)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(
        type='RandomResize',
        scale=(2048, 1024),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 1024), keep_ratio=True),
    # load annotations after ``Resize`` because the ground truth
    # does not need to be resized
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
```

For each operation, we list the related dict fields that are `added`/`updated`/`removed`.
Before the pipeline runs, the only information we can obtain directly from the dataset is `img_path` and `seg_map_path`.
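
When writing or debugging a transform, it can help to check these field changes directly. The helper below is a hypothetical sketch (`diff_fields` and `toy_flip` are not library functions) that compares the keys of a data dict before and after one step. It uses plain lists as images; with NumPy arrays the `!=` comparison would need to be replaced by an array comparison.

```python
import copy
from typing import Callable, Dict, List


def diff_fields(transform: Callable[[Dict], Dict],
                results: Dict) -> Dict[str, List[str]]:
    # Work on copies so that in-place edits by the transform stay visible.
    before = copy.deepcopy(results)
    after = transform(copy.deepcopy(results))
    return {
        'added': sorted(k for k in after if k not in before),
        'updated': sorted(
            k for k in after if k in before and after[k] != before[k]),
        'removed': sorted(k for k in before if k not in after),
    }


def toy_flip(results: Dict) -> Dict:
    # Hypothetical transform: horizontal flip of a nested-list image.
    results['img'] = [row[::-1] for row in results['img']]
    results['flip'] = True
    results['flip_direction'] = 'horizontal'
    return results


report = diff_fields(toy_flip, {'img_path': 'demo.png', 'img': [[0, 1], [2, 3]]})
print(report)
```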

### Data loading

`LoadImageFromFile`: Load an image from file.

- add: `img`, `img_shape`, `ori_shape`

`LoadAnnotations`: Load semantic segmentation maps provided by dataset.

- add: `seg_fields`, `gt_seg_map`

### Pre-processing

`RandomResize`: Randomly resize the image and segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `img_shape`, `gt_seg_map`

`Resize`: Resize image & segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `gt_seg_map`, `img_shape`
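
The relation between `scale`, `ratio_range` and `keep_ratio` in the two resize transforms above can be sketched with a little arithmetic. This is a simplified illustration under two assumptions: with `keep_ratio=True` the image is fitted inside the target long and short edges, and `RandomResize` multiplies the base `scale` by a ratio sampled from `ratio_range`. The helper names are hypothetical, and the library's exact rounding may differ.

```python
from typing import Tuple


def scaled_target(scale: Tuple[int, int], ratio: float) -> Tuple[int, int]:
    # Multiply the base scale by a ratio sampled from ratio_range.
    return int(scale[0] * ratio), int(scale[1] * ratio)


def rescale_size(size: Tuple[int, int],
                 scale: Tuple[int, int]) -> Tuple[Tuple[int, int], float]:
    """Fit ``size`` (width, height) inside ``scale`` while keeping the ratio."""
    w, h = size
    max_long, max_short = max(scale), min(scale)
    # The tighter of the two edge constraints decides the factor.
    factor = min(max_long / max(w, h), max_short / min(w, h))
    new_size = (int(w * factor + 0.5), int(h * factor + 0.5))
    return new_size, factor


# A 2048x1024 image with the lower bound of ratio_range=(0.5, 2.0).
target = scaled_target((2048, 1024), 0.5)
new_size, factor = rescale_size((2048, 1024), target)
print(target, new_size, factor)  # (1024, 512) (1024, 512) 0.5
```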

`RandomCrop`: Randomly crop the image and segmentation map.

- update: `img`, `gt_seg_map`, `img_shape`

`RandomFlip`: Flip the image & segmentation map.

- add: `flip`, `flip_direction`
- update: `img`, `gt_seg_map`

`PhotoMetricDistortion`: Apply photometric distortions to the image sequentially.
Each transformation is applied with a probability of 0.5.
Random contrast is applied either second or second to last (mode 0 or mode 1 below, respectively).

```
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
```

- update: `img`
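
The ordering logic can be sketched as follows. This is an illustration only: it follows the mode numbering of the list above (the library's internal mode values may differ), it tracks which random steps would fire rather than modifying pixels, and it omits the two deterministic color conversions. `sample_distortion_steps` is a hypothetical helper.

```python
import random
from typing import List


def sample_distortion_steps(rng: random.Random) -> List[str]:
    steps: List[str] = []

    def maybe(name: str) -> None:
        # Every random step fires with probability 0.5.
        if rng.random() < 0.5:
            steps.append(name)

    # Mode 0 puts contrast second, mode 1 puts it last.
    mode = rng.randint(0, 1)
    maybe('brightness')
    if mode == 0:
        maybe('contrast')
    maybe('saturation')
    maybe('hue')
    if mode == 1:
        maybe('contrast')
    return steps


print(sample_distortion_steps(random.Random(0)))
```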

### Formatting

`PackSegInputs`: Pack the input data for semantic segmentation.

- add: `inputs`, `data_samples`
- remove: keys specified by `meta_keys` (merged into the metainfo of the data sample), all other keys
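
The packing step can be pictured with the simplified sketch below. It is an assumption-laden stand-in: `pack_inputs` is a hypothetical function, a plain dict replaces the real data sample object, nested lists replace tensors, and the packed image and the data sample are returned as a tuple instead of a dict.

```python
from typing import Dict, Sequence, Tuple


def pack_inputs(results: Dict, meta_keys: Sequence[str]) -> Tuple[list, Dict]:
    # Only the keys listed in meta_keys survive, as meta information.
    metainfo = {k: results[k] for k in meta_keys if k in results}
    data_sample = {'metainfo': metainfo}
    # The ground truth travels with the data sample when it is available.
    if 'gt_seg_map' in results:
        data_sample['gt_seg_map'] = results['gt_seg_map']
    # Every other key in results is dropped.
    return results['img'], data_sample


results = {
    'img_path': 'demo.png',
    'img': [[0, 1], [2, 3]],
    'img_shape': (2, 2),
    'gt_seg_map': [[0, 0], [1, 1]],
    'flip': False,
    'scale': (2, 2),
}
inputs, data_sample = pack_inputs(results, meta_keys=('img_path', 'img_shape', 'flip'))
print(data_sample['metainfo'])
```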

## Customization data transformation

A customized data transformation must inherit from `BaseTransform` and implement the `transform` method.
Here we use a simple flipping transformation as an example:

```python
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS

# Registering the class makes it available by name in config files.
@TRANSFORMS.register_module()
class MyFlip(BaseTransform):
    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        # Flip the image and write it back into the data dict.
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```

We can then instantiate a `MyFlip` object and use it to process a data dict.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```

Alternatively, we can use the `MyFlip` transformation in the data pipeline of our config file.

```python
pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```

Note that if you want to use `MyFlip` in a config, you must ensure that the file containing `MyFlip` is imported when the program runs.
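
One way to make sure the module is imported is the `custom_imports` field supported by MMEngine configs. The fragment below is a minimal sketch; the module path `my_project.transforms.my_flip` is hypothetical and should be replaced with the module that actually defines `MyFlip`.

```python
# Hypothetical module path: replace it with the module that defines MyFlip.
custom_imports = dict(
    imports=['my_project.transforms.my_flip'],
    allow_failed_imports=False)

pipeline = [
    dict(type='MyFlip', direction='horizontal'),
]
```

Setting `allow_failed_imports=False` makes a wrong module path fail when the config is loaded rather than later, when the pipeline is built.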