mmcv/docs/en/understand_mmcv/cnn.md

## CNN

We provide some building bricks for CNNs, including layer building, module bundles and weight initialization.

### Layer building

We may need to try different layers of the same type when running experiments,
but do not want to modify the code from time to time.
Here we provide some layer building methods to construct layers from a dict,
which can be written in configs or specified via command line arguments.

#### Usage

A simplest example is

```python
cfg = dict(type='Conv3d')
layer = build_conv_layer(cfg, in_channels=3, out_channels=8, kernel_size=3)
```

- `build_conv_layer`: Supported types are Conv1d, Conv2d, Conv3d, Conv (alias for Conv2d).
- `build_norm_layer`: Supported types are BN1d, BN2d, BN3d, BN (alias for BN2d), SyncBN, GN, LN, IN1d, IN2d, IN3d, IN (alias for IN2d).
- `build_activation_layer`: Supported types are ReLU, LeakyReLU, PReLU, RReLU, ReLU6, ELU, Sigmoid, Tanh, GELU.
- `build_upsample_layer`: Supported types are nearest, bilinear, deconv, pixel_shuffle.
- `build_padding_layer`: Supported types are zero, reflect, replicate.

#### Extension

We also allow extending the building methods with custom layers and operators.

1. Write and register your own module.

   ```python
   from mmcv.cnn import UPSAMPLE_LAYERS

   @UPSAMPLE_LAYERS.register_module()
   class MyUpsample:

       def __init__(self, scale_factor):
           pass

       def forward(self, x):
           pass
   ```

2. Import `MyUpsample` somewhere (e.g., in `__init__.py`) and then use it.

   ```python
   cfg = dict(type='MyUpsample', scale_factor=2)
   layer = build_upsample_layer(cfg)
   ```

### Module bundles

We also provide common module bundles to facilitate the network construction.
`ConvModule` is a bundle of convolution, normalization and activation layers,
please refer to the [api](api.html#mmcv.cnn.ConvModule) for details.

```python
# conv + bn + relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='BN'))
# conv + gn + relu
conv = ConvModule(3, 8, 2, norm_cfg=dict(type='GN', num_groups=2))
# conv + relu
conv = ConvModule(3, 8, 2)
# conv
conv = ConvModule(3, 8, 2, act_cfg=None)
# conv + leaky relu
conv = ConvModule(3, 8, 3, padding=1, act_cfg=dict(type='LeakyReLU'))
# bn + conv + relu
conv = ConvModule(
    3, 8, 2, norm_cfg=dict(type='BN'), order=('norm', 'conv', 'act'))
```

### Weight initialization

> Implementation details are available at [mmcv/cnn/utils/weight_init.py](../../mmcv/cnn/utils/weight_init.py)

During training, a proper initialization strategy is beneficial to speed up the
training or obtain a higher performance. In MMCV, we provide some commonly used
methods for initializing modules like `nn.Conv2d`. Of course, we also provide
high-level APIs for initializing models containing one or more
modules.

#### Initialization functions

Initialize a `nn.Module` such as `nn.Conv2d`, `nn.Linear` in a functional way.

We provide the following initialization methods.

- constant_init

  Initialize module parameters with constant values.

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import constant_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # constant_init(module, val, bias=0)
  >>> constant_init(conv1, 1, 0)
  >>> conv1.weight
  ```

- xavier_init

  Initialize module parameters with values according to the method
  described in [Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010)](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import xavier_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # xavier_init(module, gain=1, bias=0, distribution='normal')
  >>> xavier_init(conv1, distribution='normal')
  ```

- normal_init

  Initialize module parameters with the values drawn from a normal distribution.

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import normal_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # normal_init(module, mean=0, std=1, bias=0)
  >>> normal_init(conv1, std=0.01, bias=0)
  ```

- uniform_init

  Initialize module parameters with values drawn from a uniform distribution.

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import uniform_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # uniform_init(module, a=0, b=1, bias=0)
  >>> uniform_init(conv1, a=0, b=1)
  ```

- kaiming_init

  Initialize module parameters with the values according to the method
  described in [Delving deep into rectifiers: Surpassing human-level
  performance on ImageNet classification - He, K. et al. (2015)](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf)

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import kaiming_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # kaiming_init(module, a=0, mode='fan_out', nonlinearity='relu', bias=0, distribution='normal')
  >>> kaiming_init(conv1)
  ```

- caffe2_xavier_init

  The xavier initialization is implemented in caffe2, which corresponds to `kaiming_uniform_` in PyTorch.

  ```python
  >>> import torch.nn as nn
  >>> from mmcv.cnn import caffe2_xavier_init
  >>> conv1 = nn.Conv2d(3, 3, 1)
  >>> # caffe2_xavier_init(module, bias=0)
  >>> caffe2_xavier_init(conv1)
  ```

- bias_init_with_prob

  Initialize conv/fc bias value according to a given probability, as proposed in [Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002.pdf).

  ```python
  >>> from mmcv.cnn import bias_init_with_prob
  >>> # bias_init_with_prob is proposed in Focal Loss
  >>> bias = bias_init_with_prob(0.01)
  >>> bias
  -4.59511985013459
  ```

#### Initializers and configs

On the basis of the initialization methods, we define the corresponding initialization classes and register them to `INITIALIZERS`, so we can
use the configuration to initialize the model.

We provide the following initialization classes.

- ConstantInit
- XavierInit
- NormalInit
- UniformInit
- KaimingInit
- Caffe2XavierInit
- PretrainedInit

Let us introduce the usage of `initialize` in detail.

1. Initialize model by `layer` key

   If we only define `layer`, it just initialize the layer in `layer` key.

   NOTE: Value of `layer` key is the class name with attributes weights and bias of Pytorch, so `MultiheadAttention layer` is not supported.

- Define `layer` key for initializing module with same configuration.

  ```python
  import torch.nn as nn
  from mmcv.cnn import initialize

  class FooNet(nn.Module):
      def __init__(self):
          super().__init__()
          self.feat = nn.Conv1d(3, 1, 3)
          self.reg = nn.Conv2d(3, 3, 3)
          self.cls = nn.Linear(1, 2)

  model = FooNet()
  init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d', 'Linear'], val=1)
  # initialize whole module with same configuration
  initialize(model, init_cfg)
  # model.feat.weight
  # Parameter containing:
  # tensor([[[1., 1., 1.],
  #          [1., 1., 1.],
  #          [1., 1., 1.]]], requires_grad=True)
  ```

- Define `layer` key for initializing layer with different configurations.

  ```python
  import torch.nn as nn
  from mmcv.cnn.utils import initialize

  class FooNet(nn.Module):
      def __init__(self):
          super().__init__()
          self.feat = nn.Conv1d(3, 1, 3)
          self.reg = nn.Conv2d(3, 3, 3)
          self.cls = nn.Linear(1,2)

  model = FooNet()
  init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
              dict(type='Constant', layer='Conv2d', val=2),
              dict(type='Constant', layer='Linear', val=3)]
  # nn.Conv1d will be initialized with dict(type='Constant', val=1)
  # nn.Conv2d will be initialized with dict(type='Constant', val=2)
  # nn.Linear will be initialized with dict(type='Constant', val=3)
  initialize(model, init_cfg)
  # model.reg.weight
  # Parameter containing:
  # tensor([[[[2., 2., 2.],
  #           [2., 2., 2.],
  #           [2., 2., 2.]],
  #          ...,
  #          [[2., 2., 2.],
  #           [2., 2., 2.],
  #           [2., 2., 2.]]]], requires_grad=True)
  ```

2. Initialize model by `override` key

- When initializing some specific part with its attribute name, we can use `override` key, and the value in `override` will ignore the value in init_cfg.

  ```python
  import torch.nn as nn
  from mmcv.cnn import initialize

  class FooNet(nn.Module):
      def __init__(self):
          super().__init__()
          self.feat = nn.Conv1d(3, 1, 3)
          self.reg = nn.Conv2d(3, 3, 3)
          self.cls = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2))

  # if we would like to initialize model's weights as 1 and bias as 2
  # but weight in `cls` as 3 and bias 4, we can use override key
  model = FooNet()
  init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'], val=1, bias=2,
                  override=dict(type='Constant', name='reg', val=3, bias=4))
  # self.feat and self.cls will be initialized with dict(type='Constant', val=1, bias=2)
  # The module called 'reg' will be initialized with dict(type='Constant', val=3, bias=4)
  initialize(model, init_cfg)
  # model.reg.weight
  # Parameter containing:
  # tensor([[[[3., 3., 3.],
  #           [3., 3., 3.],
  #           [3., 3., 3.]],
  #           ...,
  #           [[3., 3., 3.],
  #            [3., 3., 3.],
  #            [3., 3., 3.]]]], requires_grad=True)
  ```

- If `layer` is None in init_cfg, only sub-module with the name in override will be initialized, and type and other args in override can be omitted.

  ```python
  model = FooNet()
  init_cfg = dict(type='Constant', val=1, bias=2, override=dict(name='reg'))
  # self.feat and self.cls will be initialized by Pytorch
  # The module called 'reg' will be initialized with dict(type='Constant', val=1, bias=2)
  initialize(model, init_cfg)
  # model.reg.weight
  # Parameter containing:
  # tensor([[[[1., 1., 1.],
  #           [1., 1., 1.],
  #           [1., 1., 1.]],
  #           ...,
  #           [[1., 1., 1.],
  #            [1., 1., 1.],
  #            [1., 1., 1.]]]], requires_grad=True)
  ```

- If we don't define `layer` key or `override` key, it will not initialize anything.

- Invalid usage

  ```python
  # It is invalid that override don't have name key
  init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
                  val=1, bias=2,
                  override=dict(type='Constant', val=3, bias=4))

  # It is also invalid that override has name and other args except type
  init_cfg = dict(type='Constant', layer=['Conv1d','Conv2d'],
                  val=1, bias=2,
                  override=dict(name='reg', val=3, bias=4))
  ```

3. Initialize model with the pretrained model

   ```python
   import torch.nn as nn
   import torchvision.models as models
   from mmcv.cnn import initialize

   # initialize model with pretrained model
   model = models.resnet50()
   # model.conv1.weight
   # Parameter containing:
   # tensor([[[[-6.7435e-03, -2.3531e-02, -9.0143e-03,  ..., -2.1245e-03,
   #            -1.8077e-03,  3.0338e-03],
   #           [-1.2603e-02, -2.7831e-02,  2.3187e-02,  ..., -1.5793e-02,
   #             1.1655e-02,  4.5889e-03],
   #           [-3.7916e-02,  1.2014e-02,  1.3815e-02,  ..., -4.2651e-03,
   #             1.7314e-02, -9.9998e-03],
   #           ...,

   init_cfg = dict(type='Pretrained',
                   checkpoint='torchvision://resnet50')
   initialize(model, init_cfg)
   # model.conv1.weight
   # Parameter containing:
   # tensor([[[[ 1.3335e-02,  1.4664e-02, -1.5351e-02,  ..., -4.0896e-02,
   #            -4.3034e-02, -7.0755e-02],
   #           [ 4.1205e-03,  5.8477e-03,  1.4948e-02,  ...,  2.2060e-03,
   #            -2.0912e-02, -3.8517e-02],
   #           [ 2.2331e-02,  2.3595e-02,  1.6120e-02,  ...,  1.0281e-01,
   #             6.2641e-02,  5.1977e-02],
   #           ...,

   # initialize weights of a sub-module with the specific part of a pretrained model by using 'prefix'
   model = models.resnet50()
   url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\
         'retinanet_r50_fpn_1x_coco/'\
         'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'
   init_cfg = dict(type='Pretrained',
                   checkpoint=url, prefix='backbone.')
   initialize(model, init_cfg)
   ```

4. Initialize model inherited from BaseModule, Sequential, ModuleList, ModuleDict

   `BaseModule` is inherited from `torch.nn.Module`, and the only different between them is that `BaseModule` implements `init_weights()`.

   `Sequential` is inherited from `BaseModule` and `torch.nn.Sequential`.

   `ModuleList` is inherited from `BaseModule` and `torch.nn.ModuleList`.

   `ModuleDict` is inherited from `BaseModule` and `torch.nn.ModuleDict`.

   ```python
   import torch.nn as nn
   from mmcv.runner import BaseModule, Sequential, ModuleList, ModuleDict

   class FooConv1d(BaseModule):

       def __init__(self, init_cfg=None):
           super().__init__(init_cfg)
           self.conv1d = nn.Conv1d(4, 1, 4)

       def forward(self, x):
           return self.conv1d(x)

   class FooConv2d(BaseModule):

       def __init__(self, init_cfg=None):
           super().__init__(init_cfg)
           self.conv2d = nn.Conv2d(3, 1, 3)

       def forward(self, x):
           return self.conv2d(x)

   # BaseModule
   init_cfg = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
   model = FooConv1d(init_cfg)
   model.init_weights()
   # model.conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #        [0., 0., 0., 0.],
   #        [0., 0., 0., 0.],
   #        [0., 0., 0., 0.]]], requires_grad=True)

   # Sequential
   init_cfg1 = dict(type='Constant', layer='Conv1d', val=0., bias=1.)
   init_cfg2 = dict(type='Constant', layer='Conv2d', val=2., bias=3.)
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   seq_model = Sequential(model1, model2)
   seq_model.init_weights()
   # seq_model[0].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # seq_model[1].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)

   # inner init_cfg has higher priority
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
   seq_model = Sequential(model1, model2, init_cfg=init_cfg)
   seq_model.init_weights()
   # seq_model[0].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # seq_model[1].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)

   # ModuleList
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   modellist = ModuleList([model1, model2])
   modellist.init_weights()
   # modellist[0].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # modellist[1].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)

   # inner init_cfg has higher priority
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
   modellist = ModuleList([model1, model2], init_cfg=init_cfg)
   modellist.init_weights()
   # modellist[0].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # modellist[1].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)

   # ModuleDict
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   modeldict = ModuleDict(dict(model1=model1, model2=model2))
   modeldict.init_weights()
   # modeldict['model1'].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # modeldict['model2'].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)

   # inner init_cfg has higher priority
   model1 = FooConv1d(init_cfg1)
   model2 = FooConv2d(init_cfg2)
   init_cfg = dict(type='Constant', layer=['Conv1d', 'Conv2d'], val=4., bias=5.)
   modeldict = ModuleDict(dict(model1=model1, model2=model2), init_cfg=init_cfg)
   modeldict.init_weights()
   # modeldict['model1'].conv1d.weight
   # Parameter containing:
   # tensor([[[0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.],
   #         [0., 0., 0., 0.]]], requires_grad=True)
   # modeldict['model2'].conv2d.weight
   # Parameter containing:
   # tensor([[[[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]],
   #         ...,
   #          [[2., 2., 2.],
   #           [2., 2., 2.],
   #           [2., 2., 2.]]]], requires_grad=True)
   ```

### Model Zoo

Besides torchvision pre-trained models, we also provide pre-trained models of following CNN:

- VGG Caffe
- ResNet Caffe
- ResNeXt
- ResNet with Group Normalization
- ResNet with Group Normalization and Weight Standardization
- HRNetV2
- Res2Net
- RegNet

#### Model URLs in JSON

The model zoo links in MMCV are managed by JSON files.
The json file consists of key-value pair of model name and its url or path.
An example json file could be like:

```json
{
    "model_a": "https://example.com/models/model_a_9e5bac.pth",
    "model_b": "pretrain/model_b_ab3ef2c.pth"
}
```

The default links of the pre-trained models hosted on OpenMMLab AWS could be found [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/model_zoo/open_mmlab.json).

You may override default links by putting `open-mmlab.json` under `MMCV_HOME`. If `MMCV_HOME` is not find in the environment, `~/.cache/mmcv` will be used by default. You may `export MMCV_HOME=/your/path` to use your own path.

The external json files will be merged into default one. If the same key presents in both external json and default json, the external one will be used.

#### Load Checkpoint

The following types are supported for `filename` argument of `mmcv.load_checkpoint()`.

- filepath: The filepath of the checkpoint.
- `http://xxx` and `https://xxx`: The link to download the checkpoint. The `SHA256` postfix should be contained in the filename.
- `torchvision://xxx`: The model links in `torchvision.models`.Please refer to [torchvision](https://pytorch.org/docs/stable/torchvision/models.html) for details.
- `open-mmlab://xxx`: The model links or filepath provided in default and additional json files.