[Docs] Add custom pipeline docs. (#1124)
* [Docs] Add custom pipeline docs. * Fix link. * Fix according to comments
# Customize Data Pipeline

## Design of Data Pipelines

In the [new dataset tutorial](./datasets.md), we learned that the dataset class uses the `load_data_list` method
to initialize the entire dataset, and that the information of every sample is saved in a dict.

Usually, to save memory, we only load image paths and labels in `load_data_list`, and load the full
image content when the sample is actually used. Moreover, we may want to do some random data augmentation when
picking samples during training. Almost all data loading, pre-processing, and formatting operations can be
configured in MMClassification through the **data pipeline**.

The data pipeline defines how to process the sample dict when indexing a sample from the dataset. It
consists of a sequence of data transforms. Each data transform takes a dict as input, processes it, and outputs a
dict for the next data transform.
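
Conceptually, indexing the dataset just threads the sample dict through these transforms in order. Below is a
minimal sketch of this dict-in, dict-out contract; it is not MMClassification's actual implementation, which
builds the transform objects from the registry and composes them.

```python
def run_pipeline(transforms, results):
    """Sketch: apply a sequence of transforms to a sample dict."""
    for transform in transforms:
        # Each transform reads some keys of `results` and adds/updates others.
        results = transform(results)
        # By convention, a transform may return None to drop the sample.
        if results is None:
            return None
    return results
```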

Here is a data pipeline example for ResNet-50 training on ImageNet.

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(type='PackClsInputs'),
]
```
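
In the config file, this list is plugged into the dataset settings. A typical fragment looks like the
following; the dataset fields are abbreviated here and depend on your dataset, so treat this as a sketch.

```python
train_dataloader = dict(
    batch_size=32,
    num_workers=5,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        # The data pipeline defined above.
        pipeline=train_pipeline,
    ),
    sampler=dict(type='DefaultSampler', shuffle=True),
)
```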

All available data transforms in MMClassification can be found in the [data transforms docs](mmcls.datasets.transforms).

## Modify the training/test pipeline

The data pipeline in MMClassification is quite flexible: you can control almost every step of the data
preprocessing from the config file. On the other hand, the number of options may be confusing at first.

Here is a common practice and guidance for image classification tasks.

### Loading

At the beginning of a data pipeline, we usually need to load image data from the file path.
[`LoadImageFromFile`](mmcv.transforms.LoadImageFromFile) is commonly used for this task.

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
]
```

If you want to load data from files with special formats or from special locations, you can [implement a new
loading transform](#add-new-data-transforms) and add it at the beginning of the data pipeline, as sketched below.
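
For instance, here is a hypothetical loading transform for images stored as NumPy `.npy` files. The class name
and file layout are made up for illustration; only the dict keys follow the usual MMCV loading convention.

```python
import numpy as np
from mmcv.transforms import BaseTransform

from mmcls.datasets import TRANSFORMS


@TRANSFORMS.register_module()
class LoadImageFromNpy(BaseTransform):
    """Hypothetical example: load an image stored as a ``.npy`` array."""

    def transform(self, results):
        # The dataset prepares the file path; we read the raw array from it.
        img = np.load(results['img_path'])
        results['img'] = img
        results['img_shape'] = img.shape[:2]
        results['ori_shape'] = img.shape[:2]
        return results
```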

### Augmentation and other processing

During training, we usually need to do data augmentation to avoid overfitting. During testing, we also need to do
some data processing like resizing and cropping. These data transforms are placed after the loading process.

Here is a simple data augmentation recipe. It randomly resizes and crops the input image to the
specified scale, and randomly flips the image horizontally with a given probability.

```python
train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    ...
]
```

Here is a heavy data augmentation recipe used in [Swin-Transformer](../papers/swin_transformer.md)
training. To align with the official implementation, it specifies `pillow` as the resize backend and `bicubic`
as the resize algorithm. Moreover, it adds [`RandAugment`](mmcls.datasets.transforms.RandAugment) and
[`RandomErasing`](mmcls.datasets.transforms.RandomErasing) as extra data augmentation methods.

This configuration specifies every detail of the data augmentation, so you can simply copy it to your own
config file to apply the data augmentations of the Swin-Transformer.

```python
bgr_mean = [103.53, 116.28, 123.675]
bgr_std = [57.375, 57.12, 58.395]

train_pipeline = [
    ...
    dict(type='RandomResizedCrop', scale=224, backend='pillow', interpolation='bicubic'),
    dict(type='RandomFlip', prob=0.5, direction='horizontal'),
    dict(
        type='RandAugment',
        policies='timm_increasing',
        num_policies=2,
        total_level=10,
        magnitude_level=9,
        magnitude_std=0.5,
        hparams=dict(
            pad_val=[round(x) for x in bgr_mean], interpolation='bicubic')),
    dict(
        type='RandomErasing',
        erase_prob=0.25,
        mode='rand',
        min_area_ratio=0.02,
        max_area_ratio=1 / 3,
        fill_color=bgr_mean,
        fill_std=bgr_std),
    ...
]
```

```{note}
Usually, the data augmentation part in the data pipeline handles only image-wise transforms, not transforms
like image normalization or mixup/cutmix. This is because image normalization and mixup/cutmix can run on batched
data, which is faster. To configure image normalization and mixup/cutmix, please use the
[data preprocessor](mmcls.models.utils.data_preprocessor).
```
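
As a sketch, those batch-level settings usually live in config fields like the following. The field names follow
common MMClassification 1.x configs and may differ between versions, so double-check them against your install.

```python
# Batch-level normalization, done on the device instead of in the pipeline.
data_preprocessor = dict(
    num_classes=1000,
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_rgb=True,  # convert loaded BGR images to RGB
)

model = dict(
    # ... backbone, neck and head settings ...
    train_cfg=dict(augments=[
        dict(type='Mixup', alpha=0.8),
        dict(type='CutMix', alpha=1.0),
    ]),
)
```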

### Formatting

Formatting collects the training data from the data information dict and converts it to a
model-friendly format.

In most cases, you can simply use [`PackClsInputs`](mmcls.datasets.transforms.PackClsInputs). It
converts the image from a NumPy array to a PyTorch tensor, and packs the ground-truth category and
other meta information into a [`ClsDataSample`](mmcls.structures.ClsDataSample).

```python
train_pipeline = [
    ...
    dict(type='PackClsInputs'),
]
```
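
The packed sample that finally reaches the model looks roughly like the sketch below; the key names are those
used by MMClassification 1.x, but treat the exact shapes and fields as an assumption to verify.

```python
# Sketch of the output of PackClsInputs for one sample:
#
#   {
#       'inputs': <torch.Tensor of shape (C, H, W)>,    # the image data
#       'data_samples': <ClsDataSample>,                # gt_label + meta info
#   }
```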

## Add new data transforms

1. Write a new data transform in any file, e.g., `my_transform.py`, and place it in
   the folder `mmcls/datasets/transforms/`. The data transform class needs to inherit from
   the [`mmcv.transforms.BaseTransform`](mmcv.transforms.BaseTransform) class and override
   the `transform` method, which takes a dict as input and returns a dict.

   ```python
   from mmcv.transforms import BaseTransform

   from mmcls.datasets import TRANSFORMS


   @TRANSFORMS.register_module()
   class MyTransform(BaseTransform):

       def transform(self, results):
           # Modify the data information dict `results`,
           # e.g., read and update results['img'].
           return results
   ```

2. Import the new class in `mmcls/datasets/transforms/__init__.py`.

   ```python
   ...
   from .my_transform import MyTransform

   __all__ = [
       ..., 'MyTransform'
   ]
   ```

3. Use it in config files.

   ```python
   train_pipeline = [
       dict(type='LoadImageFromFile'),
       ...
       dict(type='MyTransform'),
       ...
   ]
   ```
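
If you would rather keep the transform outside the mmcls package, MMEngine-style configs can import your module
at runtime instead of step 2. This assumes `my_transform.py` is importable from your working directory.

```python
custom_imports = dict(imports=['my_transform'], allow_failed_imports=False)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    ...
    dict(type='MyTransform'),
    ...
]
```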

# Customize Data Pipeline (to be updated)

Please refer to the [English documentation](https://mmclassification.readthedocs.io/en/dev-1.x/advanced_guides/pipeline.html). If you are
interested in contributing to the Chinese translation, you are welcome to sign up in the
[discussion](https://github.com/open-mmlab/mmclassification/discussions/1027).