mirror of https://github.com/open-mmlab/mmsegmentation.git, synced 2025-06-03 22:03:48 +08:00

[Doc] Update transforms Doc

parent eef38883c8, commit 89f6647d1f

# Data Transforms

In this tutorial, we introduce the design of the transforms pipeline in MMSegmentation.

The structure of this guide is as follows:

- [Data Transforms](#data-transforms)
  - [Design of Data pipelines](#design-of-data-pipelines)
  - [Customization data transformation](#customization-data-transformation)

## Design of Data pipelines

Following typical conventions, we use `Dataset` and `DataLoader` for data loading.
We introduce a new `DataContainer` type in MMCV to help collect and distribute data of different size. See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.

In the 1.x version of MMSegmentation, all data transformations inherit from `BaseTransform`. The input and output types of a transformation are both `dict`. A simple example is as follows:

```python
>>> from mmseg.datasets.transforms import LoadAnnotations
>>> transforms = LoadAnnotations()
>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'
>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png'
>>> results = dict(
>>>     img_path=img_path,
>>>     seg_map_path=gt_path,
>>>     reduce_zero_label=False,
>>>     seg_fields=[])
>>> data_dict = transforms(results)
>>> print(data_dict.keys())
dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
```

The data preparation pipeline and the dataset are decomposed. Usually a dataset defines how to process the annotations, and a data pipeline defines all the steps to prepare a data dict. A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.

```python
test_pipeline = [
    ...
]
```

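A pipeline of this shape is written in the config as a plain list of transform dicts. The sketch below strings together the transforms documented in the following sections; the parameter values shown are illustrative assumptions, not values taken from this document:

```python
# Illustrative training pipeline; parameter values are assumptions.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomResize', scale=(2048, 512), ratio_range=(0.5, 2.0), keep_ratio=True),
    dict(type='RandomCrop', crop_size=(512, 512)),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs'),
]
```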
For each operation, we list the related dict fields that are `added`/`updated`/`removed`. Before the pipeline runs, the information we can directly obtain from the dataset is `img_path` and `seg_map_path`.

### Data loading

`LoadImageFromFile`: Load an image from file.

- add: `img`, `img_shape`, `ori_shape`

`LoadAnnotations`: Load semantic segmentation maps provided by the dataset.

- add: `seg_fields`, `gt_seg_map`

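To make the bookkeeping concrete, here is a plain-Python sketch of what these two loaders add to the dict. This is not the actual implementation; the image shape is arbitrary and real loaders decode the files at `img_path` and `seg_map_path`:

```python
import numpy as np

def load_image(results):
    # Stand-in loader: a real LoadImageFromFile decodes results['img_path'];
    # here we fabricate a blank image just to show the added fields.
    img = np.zeros((512, 1024, 3), dtype=np.uint8)
    results['img'] = img
    results['img_shape'] = img.shape[:2]
    results['ori_shape'] = img.shape[:2]
    return results

def load_annotations(results):
    # Stand-in for LoadAnnotations: registers the segmentation map and the
    # list of segmentation fields present in the dict.
    results['gt_seg_map'] = np.zeros((512, 1024), dtype=np.uint8)
    results['seg_fields'] = ['gt_seg_map']
    return results
```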
### Pre-processing

`RandomResize`: Random resize image & segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `img_shape`, `gt_seg_map`

`Resize`: Resize image & segmentation map.

- add: `scale`, `scale_factor`, `keep_ratio`
- update: `img`, `gt_seg_map`, `img_shape`

`RandomCrop`: Random crop image & segmentation map.

- update: `img`, `gt_seg_map`, `img_shape`

`RandomFlip`: Flip the image & segmentation map.

- add: `flip`, `flip_direction`
- update: `img`, `gt_seg_map`

`PhotoMetricDistortion`: Apply photometric distortion to the image sequentially; each transformation is applied with a probability of 0.5. The position of random contrast is second or second to last (mode 0 or 1 below, respectively).

```
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
```

- update: `img`

### Formatting

`PackSegInputs`: Pack the input data for semantic segmentation.

- add: `inputs`, `data_sample`
- remove: keys specified by `meta_keys` (merged into the metainfo of `data_sample`), all other keys

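The packing step can be pictured with a plain-dict sketch. This is not the real `PackSegInputs`, which produces a proper data-sample object; it only mirrors the add/remove bookkeeping listed above, and the default `meta_keys` shown are an assumption:

```python
def pack_seg_inputs(results, meta_keys=('img_path', 'ori_shape', 'img_shape')):
    # Pull the image out as the model input.
    packed = {'inputs': results.pop('img')}
    # Keys named in meta_keys are merged into the metainfo of the data sample;
    # everything else left in `results` is dropped.
    metainfo = {k: results[k] for k in meta_keys if k in results}
    packed['data_sample'] = {
        'gt_seg_map': results.get('gt_seg_map'),
        'metainfo': metainfo,
    }
    return packed
```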

## Customization data transformation

A customized data transformation must inherit from `BaseTransform` and implement the `transform` function. Here we use a simple flipping transformation as an example:

```python
import mmcv
from mmcv.transforms import BaseTransform, TRANSFORMS


@TRANSFORMS.register_module()
class MyFlip(BaseTransform):
    def __init__(self, direction: str):
        super().__init__()
        self.direction = direction

    def transform(self, results: dict) -> dict:
        img = results['img']
        results['img'] = mmcv.imflip(img, direction=self.direction)
        return results
```

Thus, we can instantiate a `MyFlip` object and use it to process the data dict.

```python
import numpy as np

transform = MyFlip(direction='horizontal')
data_dict = {'img': np.random.rand(224, 224, 3)}
data_dict = transform(data_dict)
processed_img = data_dict['img']
```

Or, we can use the `MyFlip` transformation in a data pipeline in our config file:

```python
pipeline = [
    ...
    dict(type='MyFlip', direction='horizontal'),
    ...
]
```

Note that if you want to use `MyFlip` in a config file, you must ensure that the file containing `MyFlip` is imported at runtime.
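One way to ensure the import happens is MMEngine's `custom_imports` field in the config file; the module path below is a hypothetical example:

```python
# Assumed module path 'my_pipelines.my_flip' is illustrative; point `imports`
# at wherever your MyFlip class actually lives.
custom_imports = dict(
    imports=['my_pipelines.my_flip'],
    allow_failed_imports=False)
```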