diff --git a/docs/en/advanced_guides/transforms.md b/docs/en/advanced_guides/transforms.md
index e0c4155b5..68b1f44bd 100644
--- a/docs/en/advanced_guides/transforms.md
+++ b/docs/en/advanced_guides/transforms.md
@@ -12,15 +12,10 @@ The structure of this guide is as follows:
 
 ## Design of Data pipelines
 
-Following typical conventions, we use `Dataset` and `DataLoader` for data loading
-with multiple workers. `Dataset` returns a dict of data items corresponding
-the arguments of models' forward method.
-Since the data in semantic segmentation may not be the same size,
-we introduce a new `DataContainer` type in MMCV to help collect and distribute
-data of different size.
-See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
+Following typical conventions, we use `Dataset` and `DataLoader` for data loading with multiple workers. `Dataset` returns a dict of data items corresponding to the arguments of the model's forward method. Since the data in semantic segmentation may not be the same size, we introduce a new `DataContainer` type in MMCV to help collect and distribute data of different sizes. See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
 
 In 1.x version of MMSegmentation, all data transformations are inherited from [`BaseTransform`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/transforms/base.py#L6).
+
 The input and output types of transformations are both dict. A simple example is as follows:
 
 ```python
@@ -38,13 +33,11 @@ The input and output types of transformations are both dict. A simple example is
 dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
 ```
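+
+New transforms can be written by subclassing `BaseTransform` and overriding its `transform` method. Below is a minimal sketch rather than actual MMSegmentation code: the class name `MyFlip` and its `direction` argument are hypothetical, and it assumes the `TRANSFORMS` registry from `mmseg.registry` so the class can be referenced by name in a pipeline config:
+
+```python
+import mmcv
+from mmcv.transforms import BaseTransform
+
+from mmseg.registry import TRANSFORMS
+
+
+@TRANSFORMS.register_module()
+class MyFlip(BaseTransform):
+    """A toy transform that flips the image and its segmentation map."""
+
+    def __init__(self, direction: str = 'horizontal'):
+        super().__init__()
+        self.direction = direction
+
+    def transform(self, results: dict) -> dict:
+        # Read the needed fields from the input dict, modify them, and
+        # write them back for the next transform in the pipeline.
+        results['img'] = mmcv.imflip(results['img'], direction=self.direction)
+        if results.get('gt_seg_map', None) is not None:
+            results['gt_seg_map'] = mmcv.imflip(
+                results['gt_seg_map'], direction=self.direction)
+        return results
+```
+
+With the registration in place, such a transform could be used in a pipeline config via `dict(type='MyFlip', direction='horizontal')`.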
 
-The data preparation pipeline and the dataset are decomposed. Usually a dataset
-defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
-A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
+The data preparation pipeline and the dataset are decomposed. Usually a dataset defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict. A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
 
 The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
 
-Here is a pipeline example for PSPNet.
+Here is a pipeline example for PSPNet:
 
 ```python
 crop_size = (512, 1024)
@@ -71,8 +64,7 @@ test_pipeline = [
 ]
 ```
 
-For each operation, we list the related dict fields that are `added`/`updated`/`removed`.
-Before pipelines, the information we can directly obtain from the datasets are `img_path` and `seg_map_path`.
+For each operation, we list the related dict fields that are `added`/`updated`/`removed`. Before pipelines, the information we can directly obtain from the datasets is `img_path` and `seg_map_path`.
 
 ### Data loading
 
@@ -98,16 +90,14 @@ Before pipelines, the information we can directly obtain from the datasets are `
 
 `RandomCrop`: Random crop image & segmentation map.
 
-- update: `img`, `gt_seg_map`, `img_shape`.
+- update: `img`, `gt_seg_map`, `img_shape`
 
 `RandomFlip`: Flip the image & segmentation map.
 
 - add: `flip`, `flip_direction`
 - update: `img`, `gt_seg_map`
 
-`PhotoMetricDistortion`: Apply photometric distortion to image sequentially,
-every transformation is applied with a probability of 0.5.
-The position of random contrast is in second or second to last(mode 0 or 1 below, respectively).
+`PhotoMetricDistortion`: Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The random contrast operation is applied either second or second to last (mode 0 or 1 below, respectively).
 
 ```
 1. random brightness
diff --git a/docs/zh_cn/advanced_guides/transforms.md b/docs/zh_cn/advanced_guides/transforms.md
index 1cbe79ba4..e5f3bebf6 100644
--- a/docs/zh_cn/advanced_guides/transforms.md
+++ b/docs/zh_cn/advanced_guides/transforms.md
@@ -1,3 +1,119 @@
 # 数据增强变化
 
-中文版文档支持中,请先阅读[英文版本](../../en/advanced_guides/transforms.md)
+在本教程中,我们将介绍 MMSegmentation 中数据增强变化流程的设计。
+
+本指南的结构如下:
+
+- [数据增强变化](#数据增强变化)
+  - [数据增强变化流程设计](#数据增强变化流程设计)
+    - [数据加载](#数据加载)
+    - [预处理](#预处理)
+    - [格式修改](#格式修改)
+
+## 数据增强变化流程设计
+
+按照惯例,我们使用 `Dataset` 和 `DataLoader` 多进程地加载数据。`Dataset` 返回与模型 forward 方法的参数相对应的数据项的字典。由于语义分割中的数据可能大小不同,我们在 MMCV 中引入了一种新的 `DataContainer` 类型,以帮助收集和分发不同大小的数据。参见[此处](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py)了解更多详情。
+
+在 MMSegmentation 的 1.x 版本中,所有数据转换都继承自 [`BaseTransform`](https://github.com/open-mmlab/mmcv/blob/2.x/mmcv/transforms/base.py#L6)。
+
+转换的输入和输出类型都是字典。一个简单的示例如下:
+
+```python
+>>> from mmseg.datasets.transforms import LoadAnnotations
+>>> transforms = LoadAnnotations()
+>>> img_path = './data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png'
+>>> gt_path = './data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png'
+>>> results = dict(
+>>>     img_path=img_path,
+>>>     seg_map_path=gt_path,
+>>>     reduce_zero_label=False,
+>>>     seg_fields=[])
+>>> data_dict = transforms(results)
+>>> print(data_dict.keys())
+dict_keys(['img_path', 'seg_map_path', 'reduce_zero_label', 'seg_fields', 'gt_seg_map'])
+```
+
+数据准备流程和数据集是解耦的。通常,数据集定义如何处理标注,数据流程定义准备数据字典的所有步骤。流程由一系列操作组成。每个操作都将字典作为输入,并为接下来的转换输出字典。
+
+操作分为数据加载、预处理、格式修改和测试时数据增强。
+
+这里是 PSPNet 的流程示例:
+
+```python
+crop_size = (512, 1024)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(
+        type='RandomResize',
+        scale=(2048, 1024),
+        ratio_range=(0.5, 2.0),
+        keep_ratio=True),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', prob=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='PackSegInputs')
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='Resize', scale=(2048, 1024), keep_ratio=True),
+    # add loading annotation after ``Resize`` because ground truth
+    # does not need to resize data transform
+    dict(type='LoadAnnotations'),
+    dict(type='PackSegInputs')
+]
+```
+
+对于每个操作,我们列出了 `添加`/`更新`/`删除` 的相关字典字段。在流程前,我们可以从数据集直接获得的信息是 `img_path` 和 `seg_map_path`。
+
+### 数据加载
+
+`LoadImageFromFile`:从文件加载图像。
+
+- 添加:`img`,`img_shape`,`ori_shape`
+
+`LoadAnnotations`:加载数据集提供的语义分割图。
+
+- 添加:`seg_fields`,`gt_seg_map`
+
+### 预处理
+
+`RandomResize`:随机调整图像和分割图大小。
+
+- 添加:`scale`,`scale_factor`,`keep_ratio`
+- 更新:`img`,`img_shape`,`gt_seg_map`
+
+`Resize`:调整图像和分割图的大小。
+
+- 添加:`scale`,`scale_factor`,`keep_ratio`
+- 更新:`img`,`gt_seg_map`,`img_shape`
+
+`RandomCrop`:随机裁剪图像和分割图。
+
+- 更新:`img`,`gt_seg_map`,`img_shape`
+
+`RandomFlip`:翻转图像和分割图。
+
+- 添加:`flip`,`flip_direction`
+- 更新:`img`,`gt_seg_map`
+
+`PhotoMetricDistortion`:按顺序对图像应用光度失真,每个变换的应用概率为 0.5。随机对比度位于第二步或倒数第二步(分别对应下面的模式 0 或 1)。
+
+```
+1. 随机亮度
+2. 随机对比度(模式 0)
+3. 将颜色从 BGR 转换为 HSV
+4. 随机饱和度
+5. 随机色调
+6. 将颜色从 HSV 转换为 BGR
+7. 随机对比度(模式 1)
+```
+
+- 更新:`img`
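+
+下面的片段仅作示意(输入为随机构造的假数据,仅用于演示),展示预处理变换如何按上文所列方式添加和更新字典字段:
+
+```python
+import numpy as np
+from mmcv.transforms import RandomFlip
+
+from mmseg.datasets.transforms import PhotoMetricDistortion
+
+# 构造一个假的输入字典,仅作示意
+results = dict(
+    img=np.random.randint(0, 256, (512, 1024, 3), dtype=np.uint8),
+    gt_seg_map=np.zeros((512, 1024), dtype=np.uint8),
+    img_shape=(512, 1024))
+
+results = RandomFlip(prob=0.5)(results)     # 添加 flip、flip_direction
+results = PhotoMetricDistortion()(results)  # 更新 img
+print(results['flip'], results['flip_direction'])
+```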
+
+### 格式修改
+
+`PackSegInputs`:为语义分割打包输入数据。
+
+- 添加:`inputs`,`data_sample`
+- 删除:由 `meta_keys` 指定的 keys(合并到 data_sample 的 metainfo 中),所有其他 keys
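+
+作为参考,下面给出一个将上述步骤串联起来的简化示例(假设本地存在文中示例的 Cityscapes 数据,路径仅作示意):
+
+```python
+from mmcv.transforms import LoadImageFromFile
+
+from mmseg.datasets.transforms import LoadAnnotations, PackSegInputs
+
+# 依次执行:加载图像、加载标注、打包输入
+pipeline = [
+    LoadImageFromFile(),
+    LoadAnnotations(),
+    PackSegInputs(),
+]
+
+results = dict(
+    img_path='./data/cityscapes/leftImg8bit/train/aachen/aachen_000000_000019_leftImg8bit.png',
+    seg_map_path='./data/cityscapes/gtFine/train/aachen/aachen_000015_000019_gtFine_instanceTrainIds.png',
+    reduce_zero_label=False,
+    seg_fields=[])
+
+# 每个变换接收上一个变换输出的字典
+for transform in pipeline:
+    results = transform(results)
+
+print(results.keys())  # 打包后只保留打包产生的字段
+```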