Transforms
Overview of transforms
We have introduced how to build a Pipeline in add_transforms. A Pipeline contains a series of transforms. There are three main categories of transforms in MMSelfSup:
- Transforms for processing the data. The transforms unique to MMSelfSup are defined in processing.py, e.g. RandomCrop, RandomResizedCrop and RandomGaussianBlur. We may also use transforms from other repositories, e.g. LoadImageFromFile from MMCV.
- The transform wrapper for multiple views of an image. It is defined in wrappers.py.
- The transform to pack data into a format compatible with the inputs of the algorithm. It is defined in formatting.py.
In summary, MMSelfSup implements the transforms listed below. The last two, MultiView and PackSelfSupInputs, are introduced in detail in the following sections.
| class | function |
|---|---|
| BEiTMaskGenerator | Generate a mask for an image, following BEiT |
| SimMIMMaskGenerator | Generate a random block mask for each image, following SimMIM |
| ColorJitter | Randomly change the brightness, contrast, saturation and hue of an image |
| RandomCrop | Crop the given image at a random location |
| RandomGaussianBlur | Gaussian blur augmentation, following SimCLR |
| RandomResizedCrop | Crop the given image to a random size and aspect ratio |
| RandomResizedCropAndInterpolationWithTwoPic | Crop the given PIL image to a random size and aspect ratio with random interpolation |
| RandomSolarize | Solarization augmentation, following BYOL |
| RotationWithLabels | Rotation prediction |
| RandomPatchWithLabels | Apply random patch augmentation to the given image |
| RandomRotation | Rotate the image by an angle |
| MultiView | A wrapper for algorithms with multi-view image inputs |
| PackSelfSupInputs | Pack data into a format compatible with the inputs of an algorithm |
Introduction of MultiView
We build a wrapper named MultiView for algorithms with multi-view image inputs, e.g. MoCo, SimCLR and SwAV. In the config file, we can define it as:
pipeline = [
    dict(type='MultiView',
         num_views=2,
         transforms=[
             [dict(type='Resize', scale=224)]
         ])
]
This means that the pipeline produces two views of each image.
We can also define a pipeline that applies different transforms to different groups of views, like:
pipeline = [
    dict(type='MultiView',
         num_views=[2, 6],
         transforms=[
             [
                 dict(type='Resize', scale=224)],
             [
                 dict(type='Resize', scale=224),
                 dict(type='RandomSolarize')],
         ])
]
This means that there are two pipelines, which produce 2 views and 6 views, respectively. More examples can be found in imagenet_mocov1.py, imagenet_mocov2.py, imagenet_swav_mcrop-2-6.py, etc.
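To make the pairing of num_views and transforms more concrete, here is a minimal pure-Python sketch of how such a wrapper could expand them into a list of views. It does not use MMSelfSup and is not its implementation: the names multi_view, apply_pipeline, add_one and double are hypothetical, and the toy transforms act on an integer standing in for an image.

```python
from typing import Callable, List, Sequence, Union

# A transform takes a results dict and returns a (possibly modified) dict.
Transform = Callable[[dict], dict]


def apply_pipeline(pipeline: Sequence[Transform], results: dict) -> dict:
    """Run each transform in order on a results dict."""
    for transform in pipeline:
        results = transform(results)
    return results


def multi_view(results: dict,
               pipelines: Sequence[Sequence[Transform]],
               num_views: Union[int, Sequence[int]]) -> dict:
    """Hypothetical sketch: run pipelines[i] num_views[i] times."""
    if isinstance(num_views, int):
        num_views = [num_views]
    if len(num_views) != len(pipelines):
        raise ValueError('need one num_views entry per pipeline')

    views: List = []
    for pipeline, n in zip(pipelines, num_views):
        for _ in range(n):
            # Each view starts from its own shallow copy of the input;
            # real image data would need a deep copy instead.
            view = apply_pipeline(pipeline, dict(results))
            views.append(view['img'])

    out = dict(results)
    out['img'] = views  # all views collected into one list
    return out


# Toy transforms (assumptions for illustration only).
def add_one(results: dict) -> dict:
    results['img'] = results['img'] + 1
    return results


def double(results: dict) -> dict:
    results['img'] = results['img'] * 2
    return results


out = multi_view({'img': 1}, [[add_one], [add_one, double]], num_views=[2, 6])
print(len(out['img']))  # 8 views: 2 from the first pipeline, 6 from the second
```

In this sketch, num_views=2 with a single pipeline is simply the one-element case of the list form, which mirrors the two config examples above.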
Introduction of PackSelfSupInputs
We build a class named PackSelfSupInputs to pack data into a format compatible with the inputs of an algorithm. This transform is usually put at the end of the pipeline, like:
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    dict(type='PackSelfSupInputs', meta_keys=['img_path'])
]
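The snippet above refers to view_pipeline without defining it. As a sketch only, the following shows one way the pieces could fit together; the contents of view_pipeline here are an assumption, reusing the Resize and RandomSolarize entries from the MultiView examples rather than the augmentations of any particular algorithm.

```python
# Hypothetical per-view pipeline (an assumption for illustration):
# it reuses transform types that appear in the MultiView examples.
view_pipeline = [
    dict(type='Resize', scale=224),
    dict(type='RandomSolarize'),
]

train_pipeline = [
    # Load the image from disk.
    dict(type='LoadImageFromFile'),
    # Apply view_pipeline twice to produce two views.
    dict(type='MultiView', num_views=2, transforms=[view_pipeline]),
    # Pack the result last, keeping 'img_path' as meta information.
    dict(type='PackSelfSupInputs', meta_keys=['img_path']),
]
```

Note that transforms takes a list of pipelines, so a single view_pipeline is wrapped in an outer list.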