[Docs] Add custom pipeline docs. (#1124)

* [Docs] Add custom pipeline docs.
* Fix link.
* Fix according to comments
2022-10-27 10:35:20 +08:00 · 280e916979
parent cccbedf22d
commit 280e916979
2 changed files with 116 additions and 239 deletions
--- a/docs/en/advanced_guides/pipeline.md
+++ b/docs/en/advanced_guides/pipeline.md
@@ -1,127 +1,155 @@
-# Customize Data Pipeline (TODO)
+# Customize Data Pipeline
 ## Design of Data pipelines
-Following typical conventions, we use `Dataset` and `DataLoader` for data loading
+As described in the [new dataset tutorial](./datasets.md), the dataset class uses the `load_data_list` method
-with multiple workers. Indexing `Dataset` returns a dict of data items corresponding to
+to initialize the entire dataset, and we save the information of every sample to a dict.
 the arguments of models forward method.
-The data preparation pipeline and the dataset is decomposed. Usually a dataset
+Usually, to save memory, we only load image paths and labels in `load_data_list`, and load the full
-defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
+image content when it is used. Moreover, we may want to apply random data augmentation when picking
-A pipeline consists of a sequence of operations. Each operation takes a dict as input and also output a dict for the next transform.
+samples during training. Almost all data loading, pre-processing, and formatting operations can be configured in
 MMClassification by the **data pipeline**.
-The operations are categorized into data loading, pre-processing and formatting.
+The data pipeline defines how the sample dict is processed when a sample is indexed from the dataset. It
 consists of a sequence of data transforms. Each data transform takes a dict as input, processes it, and outputs a
 dict for the next data transform.
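
The dict-in, dict-out chain described above can be sketched in plain Python. This is only an illustration under stated assumptions, not MMClassification's implementation: `run_pipeline`, `load_fake_image`, and `add_shape` are hypothetical names, and the nested-list "image" stands in for real pixel data.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
Transform = Callable[[Sample], Sample]


def load_fake_image(results: Sample) -> Sample:
    """Hypothetical loader: stands in for reading the file at results['img_path']."""
    results['img'] = [[0, 1, 2], [3, 4, 5]]  # a 2x3 "image" as nested lists
    return results


def add_shape(results: Sample) -> Sample:
    """Hypothetical transform: records the image height and width."""
    img = results['img']
    results['img_shape'] = (len(img), len(img[0]))
    return results


def run_pipeline(transforms: List[Transform], results: Sample) -> Sample:
    """Pass the sample dict through each transform in order."""
    for transform in transforms:
        results = transform(results)
    return results


# The dataset only stores lightweight information such as the path and label.
sample = {'img_path': 'demo.jpg', 'gt_label': 3}
output = run_pipeline([load_fake_image, add_shape], sample)
print(output['img_shape'])
```

Each step only reads and writes keys of the shared dict, which is why transforms can be reordered or swapped from the config without changing the dataset class.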
-Here is an pipeline example for ResNet-50 training on ImageNet.
+Here is a data pipeline example for ResNet-50 training on ImageNet.
 ```python
 img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
 train_pipeline = [
    dict(type='LoadImageFromFile'),
-    dict(type='RandomResizedCrop', size=224),
+    dict(type='RandomResizedCrop', scale=224),
-    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
+    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
-    dict(type='Normalize', **img_norm_cfg),
+    dict(type='PackClsInputs'),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
 ]
 test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=256),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
 ]
 ```
-For each operation, we list the related dict fields that are added/updated/removed.
+All available data transforms in MMClassification can be found in the [data transforms docs](mmcls.datasets.transforms).
 At the end of the pipeline, we use `Collect` to only retain the necessary items for forward computation.
-### Data loading
+## Modify the training/test pipeline
-`LoadImageFromFile`
+The data pipeline in MMClassification is pretty flexible. You can control almost every step of the data
 preprocessing from the config file, but on the other hand, the number of options can be confusing.
- add: img, img_shape, ori_shape
+Here is a common practice and guidance for image classification tasks.
-By default, `LoadImageFromFile` loads images from disk but it may lead to IO bottleneck for efficient small models.
+### Loading
 Various backends are supported by mmcv to accelerate this process. For example, if the training machines have set up
 [memcached](https://memcached.org/), we can revise the config as follows.
-```
+At the beginning of a data pipeline, we usually need to load image data from the file path.
-memcached_root = '/mnt/xxx/memcached_client/'
+[`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) is commonly used to do this task.
 ```python
 train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
 ]
 ```
 If you want to load data from files with special formats or special locations, you can [implement a new loading
 transform](#add-new-data-transforms) and add it at the beginning of the data pipeline.
 ### Augmentation and other processing
 During training, we usually need to do data augmentation to avoid overfitting. During testing, we also need to do
 some data processing like resizing and cropping. These data transforms are placed after the loading step.
 Here is a simple data augmentation recipe example. It randomly resizes and crops the input image to the
 specified scale, and randomly flips the image horizontally with the given probability.
 ```python
 train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    ...
 ]
 ```
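
To make the `prob` argument concrete, here is a minimal sketch of a probabilistic horizontal flip written in plain Python. It is not the library's `RandomFlip`: the function name `random_horizontal_flip`, the nested-list image, and the `flip` flag written back to the dict are assumptions made for illustration.

```python
import random
from typing import Any, Dict, Optional


def random_horizontal_flip(results: Dict[str, Any],
                           prob: float = 0.5,
                           rng: Optional[random.Random] = None) -> Dict[str, Any]:
    """Flip results['img'] left-to-right with probability `prob`."""
    if rng is None:
        rng = random.Random()
    # rng.random() is in [0, 1), so prob=1.0 always flips and prob=0.0 never does.
    flipped = rng.random() < prob
    if flipped:
        results['img'] = [row[::-1] for row in results['img']]
    results['flip'] = flipped  # illustrative bookkeeping key
    return results


img = [[1, 2, 3], [4, 5, 6]]
always = random_horizontal_flip({'img': img}, prob=1.0)
never = random_horizontal_flip({'img': img}, prob=0.0)
print(always['img'], never['img'])
```

Passing a seeded `random.Random` as `rng` makes the augmentation reproducible, which can help when debugging a pipeline.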
 Here is a heavy data augmentation recipe example used in [Swin-Transformer](../papers/swin_transformer.md)
 training. To align with the official implementation, it specifies `pillow` as the resize backend and `bicubic`
 as the resize algorithm. Moreover, it adds [`RandAugment`](mmcls.datasets.transforms.RandAugment) and
 [`RandomErasing`](mmcls.datasets.transforms.RandomErasing) as extra data augmentation methods.
 This configuration specifies every detail of the data augmentation, and you can simply copy it to your own
 config file to apply the data augmentations of the Swin-Transformer.
 ```python
 bgr_mean = [103.53, 116.28, 123.675]
 bgr_std = [57.375, 57.12, 58.395]
 train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224, backend='pillow', interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
-        type='LoadImageFromFile',
+        type='RandAugment',
-        file_client_args=dict(
+        policies='timm_increasing',
-            backend='memcached',
+        num_policies=2,
-            server_list_cfg=osp.join(memcached_root, 'server_list.conf'),
+        total_level=10,
-            client_cfg=osp.join(memcached_root, 'client.conf'))),
+        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(
            pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=1 / 3,
        fill_color=bgr_mean,
        fill_std=bgr_std),
    ...
 ]
 ```
-More supported backends can be found in [mmcv.fileio.FileClient](https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py).
+```{note}
-
+Usually, the data augmentation part in the data pipeline handles only image-wise transforms, but not transforms
-### Pre-processing
+like image normalization or mixup/cutmix. It's because we can do image normalization and mixup/cutmix on batch data
-
+to accelerate. To configure image normalization and mixup/cutmix, please use the
-`Resize`
+[data preprocessor](mmcls.models.utils.data_preprocessor).
-
+```
 - add: scale, scale_idx, pad_shape, scale_factor, keep_ratio
 - update: img, img_shape
 `RandomFlip`
 - add: flip, flip_direction
 - update: img
 `RandomCrop`
 - update: img, pad_shape
 `Normalize`
 - add: img_norm_cfg
 - update: img
 ### Formatting
-`ToTensor`
+Formatting collects the training data from the data information dict and converts it to a
 model-friendly format.
- update: specified by `keys`.
+In most cases, you can simply use [`PackClsInputs`](mmcls.datasets.transforms.PackClsInputs), and it will
 convert the image in NumPy array format to PyTorch tensor, and pack the ground-truth category information and
 other meta information as a [`ClsDataSample`](mmcls.structures.ClsDataSample).
-`ImageToTensor`
+```python
 train_pipeline = [
    ...
    dict(type='PackClsInputs'),
 ]
 ```
- update: specified by `keys`.
+## Add new data transforms
-`Collect`
+1. Write a new data transform in any file, e.g., `my_transform.py`, and place it in
-
+   the folder `mmcls/datasets/transforms/`. The data transform class needs to inherit
- remove: all other keys except for those specified by `keys`
+   the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override
-
+   the `transform` method which takes a dict as input and returns a dict.
 For more information about other data transformation classes, please refer to [Data Transforms](mmcls.datasets.transforms)
 ## Extend and use custom pipelines
 1. Write a new pipeline in any file, e.g., `my_pipeline.py`, and place it in
   the folder `mmcls/datasets/pipelines/`. The pipeline class needs to override
   the `__call__` method which takes a dict as input and returns a dict.
   ```python
-   from mmcls.datasets import PIPELINES
+   from mmcv.transforms import BaseTransform
   from mmcls.datasets import TRANSFORMS
-   @PIPELINES.register_module()
+   @TRANSFORMS.register_module()
-   class MyTransform(object):
+   class MyTransform(BaseTransform):
-       def __call__(self, results):
+       def transform(self, results):
-           # apply transforms on results['img']
+           # Modify the data information dict `results`.
           return results
   ```
-2. Import the new class in `mmcls/datasets/pipelines/__init__.py`.
+2. Import the new class in `mmcls/datasets/transforms/__init__.py`.
   ```python
   ...
-   from .my_pipeline import MyTransform
+   from .my_transform import MyTransform
   __all__ = [
       ..., 'MyTransform'
@@ -131,17 +159,10 @@ For more information about other data transformation classes, please refer to [D
 3. Use it in config files.
   ```python
   img_norm_cfg = dict(
       mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
   train_pipeline = [
-       dict(type='LoadImageFromFile'),
+       ...
       dict(type='RandomResizedCrop', size=224),
       dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
       dict(type='MyTransform'),
-       dict(type='Normalize', **img_norm_cfg),
+       ...
       dict(type='ImageToTensor', keys=['img']),
       dict(type='ToTensor', keys=['gt_label']),
       dict(type='Collect', keys=['img', 'gt_label'])
   ]
   ```
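
The three steps above rely on a registry that maps the `type` string in a config dict to a registered class. The following toy sketch shows that idea in plain Python; `ToyRegistry`, `TOY_TRANSFORMS`, and the `tag` argument are hypothetical and are not the real `TRANSFORMS` registry or its API.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Hypothetical stand-in for a transform registry."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._classes[cls.__name__] = cls  # register under the class name
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        kwargs = dict(cfg)  # copy so the config dict is not mutated
        cls = self._classes[kwargs.pop('type')]
        return cls(**kwargs)  # remaining keys become constructor arguments


TOY_TRANSFORMS = ToyRegistry()


@TOY_TRANSFORMS.register_module()
class MyTransform:
    def __init__(self, tag: str = 'visited') -> None:
        self.tag = tag

    def __call__(self, results: Dict[str, Any]) -> Dict[str, Any]:
        # Modify the data information dict and hand it to the next step.
        results['my_transform'] = self.tag
        return results


pipeline_cfg = [dict(type='MyTransform', tag='demo')]
pipeline = [TOY_TRANSFORMS.build(cfg) for cfg in pipeline_cfg]

results = {'img_path': 'demo.jpg'}
for transform in pipeline:
    results = transform(results)
print(results)
```

In this sketch, forgetting to import the module that defines `MyTransform` would leave it unregistered and `build` would raise a `KeyError`, which mirrors why step 2 asks for the import in `__init__.py`.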
--- a/docs/zh_CN/advanced_guides/pipeline.md
+++ b/docs/zh_CN/advanced_guides/pipeline.md
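@@ -1,148 +1,4 @@
 # Customize Data Pipeline (to be updated)
-## Design of Data Pipelines
+Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/advanced_guides/pipeline.html). If you are interested
-
+in helping translate the Chinese documentation, you are welcome to sign up in the [discussion forum](https://github.com/open-mmlab/mmclassification/discussions/1027).
 Following typical conventions, we use `Dataset` and `DataLoader` for data loading with multiple
 workers. Indexing `Dataset` returns a dict corresponding to the arguments of the model's `forward` method.
 The data pipeline and the dataset are decoupled here. Usually, a dataset defines how to process the annotation files, while a data
 pipeline defines all the steps to prepare a data dict. A pipeline consists of a sequence of operations. Each operation takes a dict as
 input and outputs a dict.
 The operations are categorized into data loading, pre-processing, and formatting.
 Here is a data pipeline example for ResNet-50 on the ImageNet dataset.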
 ```python
 img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
 train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label'])
 ]
 test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=256),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img'])
 ]
 ```
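 For each operation, we list the related dict fields that are added, updated, or removed. At the end of the pipeline, we
 use `Collect` to retain only the items needed by the model's `forward` method.
 ### Data loading
 `LoadImageFromFile` - load an image from a file
 - add: img, img_shape, ori_shape
 By default, `LoadImageFromFile` loads images directly from disk, but for efficient, small
 models this may lead to an IO bottleneck. MMCV supports several data loading backends to accelerate this process. For
 example, if [memcached](https://memcached.org/) is set up on the training machines, we can revise the config file as
 follows.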
 ```
 memcached_root = '/mnt/xxx/memcached_client/'
 train_pipeline = [
    dict(
        type='LoadImageFromFile',
        file_client_args=dict(
            backend='memcached',
            server_list_cfg=osp.join(memcached_root, 'server_list.conf'),
            client_cfg=osp.join(memcached_root, 'client.conf'))),
 ]
 ```
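 More supported data loading backends can be found in [mmcv.fileio.FileClient](https://github.com/open-mmlab/mmcv/blob/master/mmcv/fileio/file_client.py).
 ### Pre-processing
 `Resize` - resize the image
 - add: scale, scale_idx, pad_shape, scale_factor, keep_ratio
 - update: img, img_shape
 `RandomFlip` - randomly flip the image
 - add: flip, flip_direction
 - update: img
 `RandomCrop` - randomly crop the image
 - update: img, pad_shape
 `Normalize` - normalize the image data
 - add: img_norm_cfg
 - update: img
 ### Formatting
 `ToTensor` - convert (label) data to `torch.Tensor`
 - update: specified by `keys`
 `ImageToTensor` - convert image data to `torch.Tensor`
 - update: specified by `keys`
 `Collect` - keep the specified keys
 - remove: all key-value pairs except those specified by `keys`
 ## Extend and use custom pipelines
 1. Write a new data processing operation and place it in any file under the `mmcls/datasets/pipelines/`
   directory, e.g., `my_pipeline.py`. The class needs to override the `__call__` method, which takes a
   dict as input and returns a dict.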
   ```python
   from mmcls.datasets import PIPELINES
   @PIPELINES.register_module()
   class MyTransform(object):
       def __call__(self, results):
           # apply transforms on results['img']
           return results
   ```
 2. Import the new class in `mmcls/datasets/pipelines/__init__.py`.
   ```python
   ...
   from .my_pipeline import MyTransform
   __all__ = [
       ..., 'MyTransform'
   ]
   ```
 3. Add the operation to the data pipeline config.
   ```python
   img_norm_cfg = dict(
       mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
   train_pipeline = [
       dict(type='LoadImageFromFile'),
       dict(type='RandomResizedCrop', size=224),
       dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
       dict(type='MyTransform'),
       dict(type='Normalize', **img_norm_cfg),
       dict(type='ImageToTensor', keys=['img']),
       dict(type='ToTensor', keys=['gt_label']),
       dict(type='Collect', keys=['img', 'gt_label'])
   ]
   ```
 ## Pipeline visualization
 After designing the data pipeline, you can use the [visualization tools](../user_guides/visualization.md) to check the actual effect.