# Tutorial 2: Learn about Configs

We use python files as our config system. You can find all the provided configs under `$MMRazor/configs`.

## Config Name Style

We follow the convention below to name config files. Contributors are advised to follow the same style.

The config file names are divided into four parts: algorithm information, model information, training information and data information. Logically, different parts are concatenated by underscores `'_'`, and words in the same part are concatenated by dashes `'-'`.

```text
{algorithm info}_{model info}_[experiment setting]_{training info}_{data info}.py
```

`{xxx}` is a required field and `[yyy]` is optional.

- `algorithm info`: algorithm information, i.e. the algorithm name, such as spos, autoslim, cwd, etc.;
- `model info`: model information, i.e. the name of the model to be slimmed, such as shufflenet, faster rcnn, etc.;
- `experiment setting`: optional, used to describe important information about the algorithm or model. For example, SPOS has three stages (pre-training the supernet, searching, and retraining the subnet), and you can use this field to specify the stage; you can also use it to specify the teacher and student networks in KD;
- `training info`: training information, i.e. the training schedule, including batch size, lr schedule, data augmentation and the like;
- `data info`: data information, such as the dataset name and input size, e.g. imagenet, cifar, etc.

For example, a hypothetical name like `spos_shufflenet_supernet_8xb128_in1k.py` would denote the SPOS algorithm, a ShuffleNet model, the supernet pre-training stage, a schedule of 8 GPUs with a batch size of 128 each, and the ImageNet-1k dataset.

## Config System

Same as MMDetection, we incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments. To help users get a basic idea of a complete config and the modules it consists of, we make brief comments on some example configs below. For more detailed usage and the alternatives for each module, please refer to the API documentation.
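Configs inherit from the files listed in `_base_` and may override any inherited field. As a minimal sketch, assuming MMCV's `Config` API and a hypothetical config path, you can inspect the merged result like this:

```Python
# A minimal sketch, assuming MMCV's Config API; the config path is hypothetical.
from mmcv import Config

cfg = Config.fromfile('configs/nas/spos/spos_example.py')  # hypothetical path

# Fields from the `_base_` files are merged into this config; keys redefined
# in the child config override the inherited values.
print(cfg.algorithm.type)  # access merged fields with attribute syntax
print(cfg.pretty_text)     # dump the fully merged config for inspection
```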
### An example of NAS - spos

```Python
_base_ = [
    '../_base_/datasets/mmcls/imagenet_bs128_spos.py',  # data
    '../_base_/schedules/mmcls/imagenet_bs1024_spos.py',  # training schedule
    '../_base_/mmcls_runtime.py'  # runtime setting
]

# need to override some parameters based on _base_
evaluation = dict(interval=1000, metric='accuracy')
checkpoint_config = dict(interval=10000)
find_unused_parameters = True

# model settings
norm_cfg = dict(type='BN')
model = dict(
    type='mmcls.ImageClassifier',  # classifier name
    backbone=dict(
        type='SearchableShuffleNetV2',  # backbone name
        widen_factor=1.0,
        norm_cfg=norm_cfg),
    neck=dict(type='GlobalAveragePooling'),  # neck network name
    head=dict(
        type='LinearClsHead',  # linear classification head
        num_classes=1000,  # number of output categories, consistent with the number of categories in the dataset
        in_channels=1024,  # number of input channels, consistent with the output channels of the neck
        loss=dict(  # loss function configuration
            type='LabelSmoothLoss',
            num_classes=1000,
            label_smooth_val=0.1,
            mode='original',
            loss_weight=1.0),
        topk=(1, 5),  # evaluation metric, top-k accuracy, here top-1 and top-5 accuracy
    ))

# mutator settings
mutator = dict(
    type='OneShotMutator',  # registered mutator name
    placeholder_mapping=dict(  # mapping dict for placeholders in the architecture
        all_blocks=dict(  # key: placeholder block name in the architecture; value: the mutable that replaces the placeholder
            type='OneShotOP',  # registered mutable name
            choices=dict(
                shuffle_3x3=dict(
                    type='ShuffleBlock', kernel_size=3, norm_cfg=norm_cfg),
                shuffle_5x5=dict(
                    type='ShuffleBlock', kernel_size=5, norm_cfg=norm_cfg),
                shuffle_7x7=dict(
                    type='ShuffleBlock', kernel_size=7, norm_cfg=norm_cfg),
                shuffle_xception=dict(
                    type='ShuffleXception', norm_cfg=norm_cfg),
            ))))

# algorithm settings
algorithm = dict(
    type='SPOS',  # registered algorithm name
    architecture=dict(  # architecture setting
        type='MMClsArchitecture',  # registered architecture name
        model=model,  # use the model defined above in the architecture
    ),
    mutator=mutator,  # use the mutator defined above in the algorithm
    distiller=None,  # distiller used in the algorithm, default None
    retraining=False,  # bool, specifies the stage of the algorithm. True: subnet retraining; False: supernet pre-training
)
```
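Because the SPOS stages share most settings, a later-stage config usually just inherits the supernet config and overrides the few fields that change. A minimal sketch, with hypothetical file names:

```Python
# A minimal sketch with hypothetical file names: a retraining-stage config
# inherits the supernet config above and overrides only what changes.
_base_ = './spos_supernet_example.py'  # hypothetical path to the config above

# dicts are merged key-by-key during inheritance, so only `retraining` changes
algorithm = dict(retraining=True)  # switch from supernet pre-training to subnet retraining
```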
### An example of KD - cwd

```Python
_base_ = [
    '../_base_/datasets/mmseg/cityscapes.py',  # data
    '../_base_/schedules/mmseg/schedule_80k.py',  # training schedule
    '../_base_/mmseg_runtime.py'  # runtime setting
]

# specify norm_cfg for teacher and student as follows
norm_cfg = dict(type='SyncBN', requires_grad=True)

# pspnet r18 as the student network; for more detailed usage, please refer to MMSegmentation's docs
student = dict(
    type='mmseg.EncoderDecoder',
    backbone=dict(
        type='ResNetV1c',
        init_cfg=dict(
            type='Pretrained', checkpoint='open-mmlab://resnet18_v1c'),
        depth=18,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='PSPHead',
        in_channels=512,
        in_index=3,
        channels=128,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=256,
        in_index=2,
        channels=64,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

# pspnet r101 as the teacher network; for more detailed usage, please refer to MMSegmentation's docs
teacher = dict(
    type='mmseg.EncoderDecoder',
    backbone=dict(
        type='ResNetV1c',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='PSPHead',
        in_channels=2048,
        in_index=3,
        channels=512,
        pool_scales=(1, 2, 3, 6),
        dropout_ratio=0.1,
        num_classes=19,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
)

# distiller settings
distiller = dict(
    type='SingleTeacherDistiller',  # registered distiller name
    teacher=teacher,  # use the teacher defined above in the distiller
    teacher_trainable=False,  # whether to train the teacher
    components=[  # specify which modules of teacher and student to compute the kd-loss on
        dict(
            student_module='decode_head.conv_seg',  # student module name
            teacher_module='decode_head.conv_seg',  # teacher module name
            losses=[  # specify kd-loss
                dict(
                    type='ChannelWiseDivergence',  # kd-loss type
                    name='loss_cwd_logits',  # name this loss so its output can be fetched easily
                    tau=5,  # temperature coefficient
                    weight=3,  # weight of this loss
                )
            ])
    ])

# algorithm settings
algorithm = dict(
    type='Distillation',  # registered algorithm name
    architecture=dict(  # architecture setting
        type='MMSegArchitecture',  # registered architecture name
        model=student,  # use the student defined above as the model of the architecture
    ),
    use_gt=True,  # whether to compute gt_loss with ground truth
    distiller=distiller,  # use the distiller defined above in the algorithm
)
```
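The `student_module` and `teacher_module` fields are dotted module paths inside the respective networks, in the form produced by PyTorch's `named_modules()`. A toy sketch to list valid names (the classes below are stand-ins, not MMSegmentation models):

```Python
# Toy stand-ins to illustrate how module paths like 'decode_head.conv_seg'
# are formed; in practice the paths point into the built student/teacher.
import torch.nn as nn

class ToyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_seg = nn.Conv2d(128, 19, kernel_size=1)

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.decode_head = ToyHead()

for name, _ in ToyModel().named_modules():
    print(name)  # prints: '', 'decode_head', 'decode_head.conv_seg'
```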
### An example of pruning - autoslim

```Python
_base_ = [
    '../_base_/datasets/mmcls/imagenet_bs256_autoslim.py',  # data
    '../_base_/schedules/mmcls/imagenet_bs2048_autoslim.py',  # training schedule
    '../_base_/mmcls_runtime.py'  # runtime setting
]

# need to override some parameters based on _base_
runner = dict(type='EpochBasedRunner', max_epochs=50)

# model settings
model = dict(
    type='mmcls.ImageClassifier',  # classifier name
    backbone=dict(type='MobileNetV2', widen_factor=1.5),  # backbone name
    neck=dict(type='GlobalAveragePooling'),  # neck network name
    head=dict(
        type='LinearClsHead',  # linear classification head
        num_classes=1000,  # number of output categories, consistent with the number of categories in the dataset
        in_channels=1920,  # number of input channels, consistent with the output channels of the neck
        loss=dict(  # loss function configuration
            type='CrossEntropyLoss', loss_weight=1.0),
        topk=(1, 5),  # evaluation metric, top-k accuracy, here top-1 and top-5 accuracy
    ))

# distiller settings; for more details, please refer to the previous section: an example of KD - cwd
distiller = dict(
    type='SelfDistiller',
    components=[
        dict(
            student_module='head.fc',
            teacher_module='head.fc',
            losses=[
                dict(
                    type='KLDivergence',
                    name='loss_kd',
                    tau=1,
                    weight=1,
                )
            ]),
    ])

# pruner settings
pruner = dict(
    type='RatioPruner',  # registered pruner name
    ratios=(2 / 12, 3 / 12, 4 / 12, 5 / 12,  # ratio range for random sampling
            6 / 12, 7 / 12, 8 / 12, 9 / 12,
            10 / 12, 11 / 12, 1.0))

# algorithm settings
algorithm = dict(
    type='AutoSlim',  # registered algorithm name
    architecture=dict(  # architecture setting
        type='MMClsArchitecture',  # registered architecture name
        model=model),  # use the model defined above in the architecture
    distiller=distiller,  # use the distiller defined above in the algorithm
    pruner=pruner,  # use the pruner defined above in the algorithm
    retraining=False,  # bool, specifies the stage of the algorithm. True: subnet retraining; False: supernet pre-training
    bn_training_mode=True,  # keep BN in training mode when the model is set to eval mode
    input_shape=None)  # input_shape for computing subnet FLOPs

use_ddp_wrapper = True  # bool, update the optimizer in train_step to avoid errors
```
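To make the `ratios` field concrete: each sampled ratio scales a layer's output channels. The sketch below only illustrates the idea, it is not MMRazor's actual `RatioPruner` implementation:

```Python
# Illustration only; MMRazor's RatioPruner may round channel counts differently.
ratios = [i / 12 for i in range(2, 12)] + [1.0]

def pruned_channels(out_channels: int, ratio: float) -> int:
    """Scale a layer's output channels by a sampled ratio (keep at least 1)."""
    return max(1, int(round(out_channels * ratio)))

# e.g. the head's 1920 input channels under a sampled ratio of 6/12
print(pruned_channels(1920, 6 / 12))  # -> 960
```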