mmselfsup/docs/tutorials/2_data_pipeline.md

# Tutorial 2: Customize Data Pipelines

- [Tutorial 2: Customize Data Pipelines](#tutorial-2-customize-data-pipelines)
  - [Overview of `Pipeline`](#overview-of-pipeline)
  - [Creating new augmentations in `Pipeline`](#creating-new-augmentations-in-pipeline)

## Overview of `Pipeline`

`DataSource` and `Pipeline` are two important components in `Dataset`. We have introduced `DataSource` in [add_new_dataset](./1_new_dataset.md). And the `Pipeline` is responsible for applying a series of data augmentations to images, such as random flip.

Here is a config example of `Pipeline` for `SimCLR` training:

```py
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]
```

Every augmentation in the `Pipeline` receives an image as input and outputs an augmented image.

## Creating new augmentations in `Pipeline`

1.Write a new transformation function in [transforms.py](../../mmselfsup/datasets/pipelines/transforms.py) and overwrite the `__call__` function, which takes a `Pillow` image as input:

```py
@PIPELINES.register_module()
class MyTransform(object):

    def __call__(self, img):
        # apply transforms on img
        return img
```

2.Use it in config files. We reuse the config file shown above and add `MyTransform` to it.

```py
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
    dict(type='MyTransform'),
    dict(
        type='RandomAppliedTrans',
        transforms=[
            dict(
                type='ColorJitter',
                brightness=0.8,
                contrast=0.8,
                saturation=0.8,
                hue=0.2)
        ],
        p=0.8),
    dict(type='RandomGrayscale', p=0.2),
    dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)
]
```
[Feature]: Add docs and docker 2021-12-15 19:06:36 +08:00			`# Tutorial 2: Customize Data Pipelines`

			`- [Tutorial 2: Customize Data Pipelines](#tutorial-2-customize-data-pipelines)`
			- [Overview of `Pipeline`](#overview-of-pipeline)
			- [Creating new augmentations in `Pipeline`](#creating-new-augmentations-in-pipeline)

			## Overview of `Pipeline`

[Fix]: Fix empty link bug (#14) 2021-12-15 21:53:12 +08:00			`DataSource` and `Pipeline` are two important components in `Dataset`. We have introduced `DataSource` in [add_new_dataset](./1_new_dataset.md). And the `Pipeline` is responsible for applying a series of data augmentations to images, such as random flip.
[Feature]: Add docs and docker 2021-12-15 19:06:36 +08:00
			Here is a config example of `Pipeline` for `SimCLR` training:

			```py
			`train_pipeline = [`
			`dict(type='RandomResizedCrop', size=224),`
			`dict(type='RandomHorizontalFlip'),`
			`dict(`
			`type='RandomAppliedTrans',`
			`transforms=[`
			`dict(`
			`type='ColorJitter',`
			`brightness=0.8,`
			`contrast=0.8,`
			`saturation=0.8,`
			`hue=0.2)`
			`],`
			`p=0.8),`
			`dict(type='RandomGrayscale', p=0.2),`
			`dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)`
			`]`
			```

			Every augmentation in the `Pipeline` receives an image as input and outputs an augmented image.

			## Creating new augmentations in `Pipeline`

			1.Write a new transformation function in [transforms.py](../../mmselfsup/datasets/pipelines/transforms.py) and overwrite the `__call__` function, which takes a `Pillow` image as input:

			```py
			`@PIPELINES.register_module()`
			`class MyTransform(object):`

			`def __call__(self, img):`
			`# apply transforms on img`
			`return img`
			```

			2.Use it in config files. We reuse the config file shown above and add `MyTransform` to it.

			```py
			`train_pipeline = [`
			`dict(type='RandomResizedCrop', size=224),`
			`dict(type='RandomHorizontalFlip'),`
			`dict(type='MyTransform'),`
			`dict(`
			`type='RandomAppliedTrans',`
			`transforms=[`
			`dict(`
			`type='ColorJitter',`
			`brightness=0.8,`
			`contrast=0.8,`
			`saturation=0.8,`
			`hue=0.2)`
			`],`
			`p=0.8),`
			`dict(type='RandomGrayscale', p=0.2),`
			`dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, p=0.5)`
			`]`
			```