# Config System

We incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments.

If you wish to inspect the config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config.

You may also pass `--options xxx.yyy=zzz` to see the updated config.

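Under the hood, a config is a Python file parsed by `mmcv.Config`, and `tools/print_config.py` essentially loads it, applies the overrides, and prints the merged result. Below is a minimal sketch of that flow; the config path and the overridden key are only illustrative assumptions:

```python
from mmcv import Config

# Load the config file (path is illustrative).
cfg = Config.fromfile('configs/pspnet/psp_r50_512x1024_40ki_cityscapes.py')

# Equivalent of passing `--options xxx.yyy=zzz` on the command line:
# merge a dict with dotted keys into the loaded config.
cfg.merge_from_dict({'model.backbone.depth': 101})

# Dump the complete, merged config as readable text.
print(cfg.pretty_text)
```
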
## Config File Structure

There are 4 basic component types under `configs/_base_`: dataset, model, schedule, and default_runtime.
Many methods, such as DeepLabV3 and PSPNet, can be easily constructed with one component of each type.
The configs that are composed of components from `_base_` are called _primitive_.

For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.

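As a sketch of how such a _primitive_ config is assembled, it may simply list one component of each type in its `_base_`; the exact file names under `_base_` below are illustrative assumptions, not fixed names:

```python
_base_ = [
    '../_base_/models/pspnet_r50-d8.py',    # model
    '../_base_/datasets/cityscapes.py',     # dataset
    '../_base_/default_runtime.py',         # default runtime
    '../_base_/schedules/schedule_40k.py'   # schedule
]
```
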
For easy understanding, we recommend contributors inherit from existing methods.
For example, if some modification is made based on DeepLabV3, users may first inherit the basic DeepLabV3 structure by specifying `_base_ = ../deeplabv3/deeplabv3_r50_512x1024_40ki_cityscapes.py`, then modify the necessary fields in the config files, as in the sketch below.

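A minimal sketch of such a child config; the base path comes from the example above, while the overridden field (`num_classes`) is only an illustrative choice:

```python
_base_ = '../deeplabv3/deeplabv3_r50_512x1024_40ki_cityscapes.py'

# Only fields that differ from the inherited base need to appear here.
# Overriding num_classes is purely an illustrative modification.
model = dict(
    decode_head=dict(num_classes=21),
    auxiliary_head=dict(num_classes=21))
```
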
If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxxnet` under `configs`.

Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.

## Config Name Style

We follow the style below to name config files. Contributors are advised to follow the same style.

```
{model}_{backbone}_[misc]_[gpu x batch_per_gpu]_{resolution}_{schedule}_{dataset}
```

`{xxx}` is a required field and `[yyy]` is optional.

- `{model}`: model type like `psp`, `deeplabv3`, etc.
- `{backbone}`: backbone type like `r50` (ResNet-50), `x101` (ResNeXt-101).
- `[misc]`: miscellaneous setting/plugins of model, e.g. `dconv`, `gcb`, `attention`, `mstrain`.
- `[gpu x batch_per_gpu]`: GPUs and samples per GPU, `8x2` is used by default.
- `{resolution}`: the crop size used during training, e.g. `512x1024`, `512x512`.
- `{schedule}`: training schedule, `20ki` means 20k iterations.
- `{dataset}`: dataset like `cityscapes`, `voc12aug`, `ade`.

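For example, the primitive config referenced above, `deeplabv3_r50_512x1024_40ki_cityscapes.py`, denotes a DeepLabV3 model with a ResNet-50 backbone trained at a 512x1024 crop for 40k iterations on Cityscapes, with the default `8x2` GPU and batch setting left implicit.
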
## An Example of PSPNet

To help users have a basic idea of a complete config and the modules in a modern semantic segmentation system,
we make brief comments on the config of PSPNet using ResNet50V1c as follows.
For more detailed usage and the corresponding alternatives for each module, please refer to the API documentation.

```python
norm_cfg = dict(type='SyncBN', requires_grad=True)  # Segmentation usually uses SyncBN
model = dict(
    type='EncoderDecoder',  # Name of segmentor
    pretrained='open-mmlab://resnet50_v1c',  # The ImageNet pretrained backbone to be loaded
    backbone=dict(
        type='ResNetV1c',  # The type of backbone. Please refer to mmseg/backbone/resnet.py for details.
        depth=50,  # Depth of backbone. Normally 50, 101 are used.
        num_stages=4,  # Number of stages of backbone.
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stage.
        dilations=(1, 1, 2, 4),  # The dilation rate of each stage.
        strides=(1, 2, 1, 1),  # The stride of each stage.
        norm_cfg=dict(  # The configuration of norm layer.
            type='SyncBN',  # Type of norm layer. Usually it is SyncBN.
            requires_grad=True),  # Whether to train the gamma and beta in norm
        norm_eval=False,  # Whether to freeze the statistics in BN
        style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 convs, 'caffe' means stride 2 layers are in 1x1 convs.
        contract_dilation=True),  # When dilation > 1, whether to contract the first layer of dilation.
    decode_head=dict(
        type='PSPHead',  # Type of decode head. Please refer to mmseg/models/decode_heads for available options.
        in_channels=2048,  # Input channel of decode head.
        in_index=3,  # The index of feature map to select.
        channels=512,  # The intermediate channels of decode head.
        pool_scales=(1, 2, 3, 6),  # The avg pooling scales of PSPHead. Please refer to paper for details.
        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the decode_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
            loss_weight=1.0)),  # Loss weight of decode head.
    auxiliary_head=dict(
        type='FCNHead',  # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options.
        in_channels=1024,  # Input channel of auxiliary head.
        in_index=2,  # The index of feature map to select.
        channels=256,  # The intermediate channels of auxiliary head.
        num_convs=1,  # Number of convs in FCNHead. It is usually 1 in auxiliary head.
        concat_input=False,  # Whether to concat output of convs with input before classification layer.
        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the auxiliary_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
            loss_weight=0.4)))  # Loss weight of auxiliary head, which is usually 0.4 of decode head.
train_cfg = dict()  # train_cfg is just a placeholder for now.
test_cfg = dict(mode='whole')  # The test mode, options are 'whole' and 'sliding'. 'whole': whole image fully-convolutional test. 'sliding': sliding crop window on the image.
dataset_type = 'CityscapesDataset'  # Dataset type, this will be used to define the dataset.
data_root = 'data/cityscapes/'  # Root path of data.
img_norm_cfg = dict(  # Image normalization config to normalize the input images.
    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-train the pre-trained backbone models.
    std=[58.395, 57.12, 57.375],  # Standard variance used to pre-train the pre-trained backbone models.
    to_rgb=True)  # The channel order of images used to pre-train the pre-trained backbone models.
crop_size = (512, 1024)  # The crop size during training.
train_pipeline = [  # Training pipeline.
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
    dict(type='LoadAnnotations'),  # Second pipeline to load annotations for current image.
    dict(type='Resize',  # Augmentation pipeline that resizes the images and their annotations.
        img_scale=(2048, 1024),  # The largest scale of image.
        ratio_range=(0.5, 2.0)),  # The augmented scale range as ratio.
    dict(type='RandomCrop',  # Augmentation pipeline that randomly crops a patch from the current image.
        crop_size=(512, 1024),  # The crop size of patch.
        cat_max_ratio=0.75),  # The max area ratio that could be occupied by a single category.
    dict(
        type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations.
        flip_ratio=0.5),  # The ratio or probability to flip.
    dict(type='PhotoMetricDistortion'),  # Augmentation pipeline that distorts the current image with several photometric methods.
    dict(
        type='Normalize',  # Augmentation pipeline that normalizes the input images.
        mean=[123.675, 116.28, 103.53],  # These keys are the same as img_norm_cfg since the
        std=[58.395, 57.12, 57.375],  # keys of img_norm_cfg are used here as arguments.
        to_rgb=True),
    dict(type='Pad',  # Augmentation pipeline that pads the image to a specified size.
        size=(512, 1024),  # The output size of padding.
        pad_val=0,  # The padding value for image.
        seg_pad_val=255),  # The padding value of 'gt_semantic_seg'.
    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline.
    dict(type='Collect',  # Pipeline that decides which keys in the data should be passed to the segmentor.
        keys=['img', 'gt_semantic_seg'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
    dict(
        type='MultiScaleFlipAug',  # An encapsulation that wraps the test time augmentations.
        img_scale=(2048, 1024),  # Decides the largest scale for testing, used for the Resize pipeline.
        flip=False,  # Whether to flip images during testing.
        transforms=[
            dict(type='Resize',  # Use resize augmentation.
                keep_ratio=True),  # Whether to keep the ratio between height and width; the img_scale set here will be suppressed by the img_scale set above.
            dict(type='RandomFlip'),  # Although RandomFlip is added in the pipeline, it is not used when flip=False.
            dict(
                type='Normalize',  # Normalization config, the values are from img_norm_cfg.
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor',  # Convert image to tensor.
                keys=['img']),
            dict(type='Collect',  # Collect pipeline that collects necessary keys for testing.
                keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,  # Batch size of a single GPU
    workers_per_gpu=2,  # Workers to pre-fetch data for each single GPU
    train=dict(  # Train dataset config
        type='CityscapesDataset',  # Type of dataset, refer to mmseg/datasets/ for details.
        data_root='data/cityscapes/',  # The root of dataset.
        img_dir='leftImg8bit/train',  # The image directory of dataset.
        ann_dir='gtFine/train',  # The annotation directory of dataset.
        pipeline=[  # Pipeline, this is passed by the train_pipeline created before.
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(
                type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
            dict(type='RandomCrop', crop_size=(512, 1024), cat_max_ratio=0.75),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='PhotoMetricDistortion'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size=(512, 1024), pad_val=0, seg_pad_val=255),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_semantic_seg'])
        ]),
    val=dict(  # Validation dataset config
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        img_dir='leftImg8bit/val',
        ann_dir='gtFine/val',
        pipeline=[  # Pipeline is passed by the test_pipeline created before.
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(  # Test dataset config
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        img_dir='leftImg8bit/val',
        ann_dir='gtFine/val',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(2048, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
log_config = dict(  # Config to register logger hook.
    interval=50,  # Interval to print the log.
    hooks=[
        # dict(type='TensorboardLoggerHook')  # The Tensorboard logger is also supported.
        dict(type='TextLoggerHook', by_epoch=False)
    ])
dist_params = dict(backend='nccl')  # Parameters to set up distributed training, the port can also be set.
log_level = 'INFO'  # The level of logging.
load_from = None  # Load models as a pre-trained model from a given path. This will not resume training.
resume_from = None  # Resume checkpoints from a given path, the training will be resumed from the iteration when the checkpoint is saved.
workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 40000 iterations according to `runner.max_iters`.
cudnn_benchmark = True  # Whether to use cudnn_benchmark to speed up, which is fast for fixed input size.
optimizer = dict(  # Config used to build optimizer, supports all the optimizers in PyTorch whose arguments are also the same as those in PyTorch.
    type='SGD',  # Type of optimizer, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details.
    lr=0.01,  # Learning rate of optimizer, see detailed usage of the parameters in the documentation of PyTorch.
    momentum=0.9,  # Momentum.
    weight_decay=0.0005)  # Weight decay of SGD.
optimizer_config = dict()  # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details.
lr_config = dict(
    policy='poly',  # The policy of scheduler, also supports Step, CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
    power=0.9,  # The power of polynomial decay.
    min_lr=0.0001,  # The minimum learning rate to stabilize the training.
    by_epoch=False)  # Whether to count by epoch or not.
runner = dict(
    type='IterBasedRunner',  # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner).
    max_iters=40000)  # Total number of iterations. For EpochBasedRunner use `max_epochs`.
checkpoint_config = dict(  # Config to set the checkpoint hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
    by_epoch=False,  # Whether to count by epoch or not.
    interval=4000)  # The save interval.
evaluation = dict(  # The config to build the evaluation hook. Please refer to mmseg/core/evaluation/eval_hooks.py for details.
    interval=4000,  # The interval of evaluation.
    metric='mIoU')  # The evaluation metric.
```

## FAQ

### Ignore some fields in the base configs

Sometimes, you may set `_delete_=True` to ignore some of the fields in base configs.
You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration.

In MMSegmentation, for example, suppose you would like to change the backbone of PSPNet, whose config is shown below.

```python
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(...),
    auxiliary_head=dict(...))
```

`ResNet` and `HRNet` use different keywords to construct.

```python
_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    pretrained='open-mmlab://msra/hrnetv2_w32',
    backbone=dict(
        _delete_=True,
        type='HRNet',
        norm_cfg=norm_cfg,
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256)))),
    decode_head=dict(...),
    auxiliary_head=dict(...))
```

`_delete_=True` replaces all the old keys in the `backbone` field with the new keys.

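To see why this is needed, here is a minimal sketch of the merge semantics using plain Python dicts; this is an illustration only, not the actual mmcv implementation:

```python
base = dict(type='ResNetV1c', depth=50, strides=(1, 2, 1, 1))

# Without _delete_, the child config is merged into the base key by key,
# so ResNet-specific keys such as `depth` and `strides` would survive and
# be passed to the HRNet constructor, which does not accept them.
merged = {**base, 'type': 'HRNet', 'extra': dict()}
# -> {'type': 'HRNet', 'depth': 50, 'strides': (1, 2, 1, 1), 'extra': {}}

# With _delete_=True, the old backbone dict is discarded first, and only
# the keys written in the child config remain.
replaced = dict(type='HRNet', extra=dict())
```
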
### Use intermediate variables in configs

Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets.
It's worth noting that when modifying intermediate variables in the children configs, users need to pass the intermediate variables into the corresponding fields again.
For example, we would like to change the multi-scale strategy to train/test a PSPNet. `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify.

```python
_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
crop_size = (512, 1024)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(1.0, 2.0)),  # change to [1., 2.]
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 1024),
        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # change to multi-scale testing
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
```

We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`.

Similarly, if we would like to switch from `SyncBN` to `BN` or `MMSyncBN`, we need to substitute every `norm_cfg` in the config.

```python
_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    backbone=dict(norm_cfg=norm_cfg),
    decode_head=dict(norm_cfg=norm_cfg),
    auxiliary_head=dict(norm_cfg=norm_cfg))
```