# Data Transforms

In this tutorial, we introduce the design of the transforms pipeline in MMSegmentation.

The structure of this guide is as follows:

- [Data Transforms](#data-transforms)
  - [Design of Data pipelines](#design-of-data-pipelines)
  - [Customization data transformation](#customization-data-transformation)
## Design of Data pipelines

Following typical conventions, we use `Dataset` and `DataLoader` for data loading with multiple workers. `Dataset` returns a dict of data items corresponding to the arguments of the model's forward method. Since the data in semantic segmentation may not be the same size, we introduce a new `DataContainer` type in MMCV to help collect and distribute data of different sizes. See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.

In the 1.x version of MMSegmentation, all data transformations inherit from `BaseTransform`. The input and output types of transformations are both dict. A simple example is as follows:
```python
>>> from mmseg.datasets.transforms import LoadAnnotations
>>> transforms = LoadAnnotations()
>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'
>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelTrainIds.png'
>>> results = dict(
>>>     img_path=img_path,
>>>     seg_map_path=gt_path,
>>>     reduce_zero_label=False,
>>>     seg_fields=[])
>>> data_dict = transforms(results)
>>> print(data_dict.keys())
dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
```
The data preparation pipeline and the dataset are decomposed. Usually, a dataset defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict. A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.

The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.

Here is a pipeline example for PSPNet:
```python
crop_size = (512, 1024)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(
        type='RandomResize',
        scale=(2048, 1024),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 1024), keep_ratio=True),
    # Load the annotation after ``Resize`` because the ground truth
    # does not need to be resized at test time.
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
```
For each operation, we list the related dict fields that are `added`/`updated`/`removed`. Before the pipeline runs, the fields we can directly obtain from the dataset are `img_path` and `seg_map_path`.
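
To make this concrete, here is a minimal sketch that chains a shortened version of the training pipeline above and prints the dict keys after each step, so you can verify the add/update lists in the sections below. It assumes the Cityscapes data from the earlier example has been prepared (so the `labelTrainIds` file exists on disk); `RandomResize` and `RandomFlip` are skipped for brevity, and the transforms are instantiated directly instead of being built from the config dicts.

```python
from mmcv.transforms import LoadImageFromFile
from mmseg.datasets.transforms import (LoadAnnotations, PackSegInputs,
                                       PhotoMetricDistortion, RandomCrop)

# The fields provided by the dataset before any transform runs.
results = dict(
    img_path='./data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png',
    seg_map_path='./data/cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelTrainIds.png',
    reduce_zero_label=False,
    seg_fields=[])

# Apply the transforms one by one and watch which keys each step adds.
for t in (LoadImageFromFile(),
          LoadAnnotations(),
          RandomCrop(crop_size=(512, 1024), cat_max_ratio=0.75),
          PhotoMetricDistortion(),
          PackSegInputs()):
    results = t(results)
    print(type(t).__name__, sorted(results.keys()))
```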
### Data loading

`LoadImageFromFile`: Load an image from file.

- add: `img`, `img_shape`, `ori_shape`

`LoadAnnotations`: Load semantic segmentation maps provided by the dataset.

- add: `seg_fields`, `gt_seg_map`
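
As a small self-contained illustration of the loading fields (no Cityscapes data required), the sketch below writes a tiny dummy image to disk and runs `LoadImageFromFile`; in a default MMSegmentation installation this name resolves to the implementation in `mmcv.transforms`. The file name `demo.png` is just a placeholder for this example.

```python
import numpy as np
import mmcv
from mmcv.transforms import LoadImageFromFile

# Write a tiny dummy image so the example does not depend on Cityscapes.
mmcv.imwrite(np.zeros((4, 6, 3), dtype=np.uint8), 'demo.png')

results = LoadImageFromFile()(dict(img_path='demo.png'))
print(results['img'].shape)   # (4, 6, 3)
print(results['ori_shape'])   # (4, 6) -- shape before any resizing
print(results['img_shape'])   # (4, 6) -- equal to ori_shape right after loading
```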
### Pre-processing

`RandomResize`: Randomly resize the image & segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `img_shape`, `gt_seg_map`

`Resize`: Resize the image & segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `gt_seg_map`, `img_shape`

`RandomCrop`: Randomly crop the image & segmentation map.

- update: `img`, `gt_seg_map`, `img_shape`

`RandomFlip`: Flip the image & segmentation map.

- add: `flip`, `flip_direction`
- update: `img`, `gt_seg_map`

`PhotoMetricDistortion`: Apply photometric distortions to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last (mode 0 or 1 below, respectively).

```
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
```

- update: `img`
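
Because these transforms are random, their effect on the dict can be hard to see in a single run. The sketch below forces a flip (`prob=1.`) on a small synthetic image so the added metadata is deterministic; here we instantiate `RandomFlip` from `mmcv.transforms` directly, which is the class the config above resolves to in a default installation.

```python
import numpy as np
from mmcv.transforms import RandomFlip

results = dict(
    img=np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3),
    gt_seg_map=np.arange(4 * 6, dtype=np.uint8).reshape(4, 6))

# prob=1. guarantees the flip so the example is reproducible.
results = RandomFlip(prob=1., direction='horizontal')(results)

print(results['flip'], results['flip_direction'])  # True horizontal
# The image and the segmentation map are flipped together.
assert np.array_equal(results['gt_seg_map'][:, ::-1],
                      np.arange(4 * 6, dtype=np.uint8).reshape(4, 6))
```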
|
2020-07-07 20:52:19 +08:00
|
|
|
|
|
|
|
### Formatting
|
|
|
|
|
2022-09-19 17:57:41 +08:00
|
|
|
`PackSegInputs`: Pack the inputs data for the semantic segmentation.
|
2020-07-07 20:52:19 +08:00
|
|
|
|
2022-09-19 17:57:41 +08:00
|
|
|
- add: `inputs`, `data_sample`
|
2022-08-23 14:19:11 +08:00
|
|
|
- remove: keys specified by `meta_keys` (merged into the metainfo of data_sample), all other keys
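
To see the packing step in isolation, the sketch below feeds a synthetic image and segmentation map through `PackSegInputs`. The metadata values are placeholders, and `meta_keys` is restricted to the keys that are actually present in this toy dict; the exact output key names are whatever the installed version produces.

```python
import numpy as np
from mmseg.datasets.transforms import PackSegInputs

results = dict(
    img=np.random.randint(0, 256, (4, 6, 3), dtype=np.uint8),
    gt_seg_map=np.zeros((4, 6), dtype=np.uint8),
    img_path='demo.png',   # placeholder metadata for illustration
    ori_shape=(4, 6),
    img_shape=(4, 6))

packed = PackSegInputs(meta_keys=('img_path', 'ori_shape', 'img_shape'))(results)
print(packed.keys())       # the packed image tensor and its data sample
```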
|
2022-09-19 17:57:41 +08:00
|
|
|
|
|
|
|
## Customization data transformation
|
|
|
|
|
|
|
|
The customized data transformation must inherinted from `BaseTransform` and implement `transform` function.
|
|
|
|
Here we use a simple flipping transformation as example:
```python
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS


@TRANSFORMS.register_module()
class MyFlip(BaseTransform):

    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```
Thus, we can instantiate a `MyFlip` object and use it to process the data dict.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```
Or, we can use the `MyFlip` transformation in the data pipeline of our config file.

```python
pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```
Note that if you want to use `MyFlip` in a config file, you must ensure that the file containing `MyFlip` is imported when the program runs.
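
One common way to guarantee this, assuming `MyFlip` lives in a module importable as `path.to.my_flip` (a placeholder path), is to let the config import it via `custom_imports`:

```python
# 'path.to.my_flip' is a placeholder for wherever MyFlip is actually defined.
custom_imports = dict(imports=['path.to.my_flip'], allow_failed_imports=False)
```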