# Models
- [Models](#models)
  - [Overview of modules in MMSelfSup](#overview-of-modules-in-mmselfsup)
  - [Construct algorithms from sub-modules](#construct-algorithms-from-sub-modules)
  - [Overview of the abstract functions in the base model](#overview-of-the-abstract-functions-in-the-base-model)
A model can be seen as a feature extractor or loss generator for each algorithm. In MMSelfSup, it mainly
contains the following fixed parts:

- algorithms, containing the full module of a model; all sub-modules are
  constructed in the algorithm.
- backbones, containing the backbone for each algorithm, e.g. ViT for MAE and Swin Transformer
  for SimMIM.
- necks, some special modules, such as decoders, appended directly to the output of the backbone.
- heads, some special modules, such as MLP layers, appended to the output of the backbone or neck.
- memories, some memory banks or queues used by certain algorithms, e.g. MoCo v1/v2.
- losses, used to compute the loss between the predicted output and the target.
## Overview of modules in MMSelfSup

First, we give an overview of the existing modules in MMSelfSup. They are listed below according to the categories
described above.
| algorithm | backbone | neck | head | loss | memory |
| :--------------------: | :-----------------------------: | :--------------------------: | :----------------------------------: | :--------------------------------: | :--------------------: |
| [`BarlowTwins`](TODO) | [`ResNet`](TODO) | [`NonLinearNeck`](TODO) | [`LatentCrossCorrelationHead`](TODO) | [`CrossCorrelationLoss`](TODO) | N/A |
| [`DenseCL`](TODO) | [`ResNet`](TODO) | [`DenseCLNeck`](TODO) | [`ContrastiveHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`BYOL`](TODO) | [`ResNet`](TODO) | [`NonLinearNeck`](TODO) | [`LatentPredictHead`](TODO) | [`CosineSimilarityLoss`](TODO) | N/A |
| [`CAE`](TODO) | [`CAEViT`](TODO) | [`CAENeck`](TODO) | [`CAEHead`](TODO) | [`CAELoss`](TODO) | N/A |
| [`DeepCluster`](TODO) | [`ResNet`](TODO) | [`AvgPool2dNeck`](TODO) | [`ClsHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`MAE`](TODO) | [`MAEViT`](TODO) | [`MAEPretrainDecoder`](TODO) | [`MAEPretrainHead`](TODO) | [`MAEReconstructionLoss`](TODO) | N/A |
| [`MoCo`](TODO) | [`ResNet`](TODO) | [`LinearNeck`](TODO) | [`ContrastiveHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`MoCov3`](TODO) | [`MoCoV3ViT`](TODO) | [`NonLinearNeck`](TODO) | [`MoCoV3Head`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`NPID`](TODO) | [`ResNet`](TODO) | [`LinearNeck`](TODO) | [`ContrastiveHead`](TODO) | [`CrossEntropyLoss`](TODO) | [`SimpleMemory`](TODO) |
| [`ODC`](TODO) | [`ResNet`](TODO) | [`ODCNeck`](TODO) | [`ClsHead`](TODO) | [`CrossEntropyLoss`](TODO) | [`ODCMemory`](TODO) |
| [`RelativeLoc`](TODO) | [`ResNet`](TODO) | [`RelativeLocNeck`](TODO) | [`ClsHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`RotationPred`](TODO) | [`ResNet`](TODO) | N/A | [`ClsHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`SimCLR`](TODO) | [`ResNet`](TODO) | [`NonLinearNeck`](TODO) | [`ContrastiveHead`](TODO) | [`CrossEntropyLoss`](TODO) | N/A |
| [`SimMIM`](TODO) | [`SimMIMSwinTransformer`](TODO) | [`SimMIMNeck`](TODO) | [`SimMIMHead`](TODO) | [`SimMIMReconstructionLoss`](TODO) | N/A |
| [`SimSiam`](TODO) | [`ResNet`](TODO) | [`NonLinearNeck`](TODO) | [`LatentPredictHead`](TODO) | [`CosineSimilarityLoss`](TODO) | N/A |
| [`SwAV`](TODO) | [`ResNet`](TODO) | [`SwAVNeck`](TODO) | [`SwAVHead`](TODO) | [`SwAVLoss`](TODO) | N/A |
## Construct algorithms from sub-modules

As shown in the table above, each algorithm is a combination of a backbone, neck, head, loss and, optionally, memories. You are free to use these existing modules to build your own algorithms. If customized modules are required, you should follow [add_modules](./add_modules.md) to meet your own needs.
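For illustration, below is a sketch of how an algorithm could be assembled from sub-modules in a config file. It mimics the SimCLR row of the table above, but every field value here is an illustrative assumption; refer to the configs shipped with MMSelfSup for the exact settings of each algorithm.

```python
# A hypothetical SimCLR-style config: the algorithm is assembled from a
# backbone, a neck and a head, each built from the MODELS registry.
# All values below are illustrative, not the official SimCLR settings.
model = dict(
    type='SimCLR',
    backbone=dict(
        type='ResNet',
        depth=50),
    neck=dict(
        type='NonLinearNeck',  # projection MLP appended to the backbone
        in_channels=2048,
        hid_channels=2048,
        out_channels=128,
        num_layers=2,
        with_avg_pool=True),
    head=dict(
        type='ContrastiveHead',
        # The loss is built when building the head.
        loss=dict(type='CrossEntropyLoss'),
        temperature=0.1))
```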
MMSelfSup provides a base model, called `BaseModel`, and all algorithms
should inherit from it. All sub-modules, except for memories, are built in the base model during the initialization of each algorithm. Memories are built in the `__init__` of each specific algorithm, and the loss is built when building the head.
```python
from typing import Optional, Union

import torch.nn as nn
from mmengine.model import BaseModel as _BaseModel

from mmselfsup.registry import MODELS


class BaseModel(_BaseModel):

    def __init__(self,
                 backbone: dict,
                 neck: Optional[dict] = None,
                 head: Optional[dict] = None,
                 pretrained: Optional[str] = None,
                 data_preprocessor: Optional[Union[dict, nn.Module]] = None,
                 init_cfg: Optional[dict] = None):

        # Load pretrained weights through the `Pretrained` init hook.
        if pretrained is not None:
            init_cfg = dict(type='Pretrained', checkpoint=pretrained)

        if data_preprocessor is None:
            data_preprocessor = {}
        # The build process is in MMEngine, so we need to add scope here.
        data_preprocessor.setdefault('type',
                                     'mmselfsup.SelfSupDataPreprocessor')

        super().__init__(
            init_cfg=init_cfg, data_preprocessor=data_preprocessor)

        # The backbone is required; the neck and head are optional.
        self.backbone = MODELS.build(backbone)
        if neck is not None:
            self.neck = MODELS.build(neck)
        if head is not None:
            self.head = MODELS.build(head)
```
As shown above, you must provide the config to build the backbone, while the neck and head are optional. In addition to building
your algorithm, you should override some abstract functions in the base model to get the correct results, which we will discuss in the
following section.
## Overview of the abstract functions in the base model
The `forward` function is the entrance to the results. It differs from the default `forward` function in most PyTorch code, which
has only one mode: there, all of your logic ends up mixed together in a single `forward` function, which limits scalability. As shown in the code below, the `forward` function in MMSelfSup has three modes: i) tensor, ii) loss and iii) predict.
```python
def forward(self,
            batch_inputs: torch.Tensor,
            data_samples: Optional[List[SelfSupDataSample]] = None,
            mode: str = 'tensor'):
    if mode == 'tensor':
        # Return the raw features extracted from the images.
        feats = self.extract_feat(batch_inputs)
        return feats
    elif mode == 'loss':
        # Compute and return the losses for training.
        return self.loss(batch_inputs, data_samples)
    elif mode == 'predict':
        # Return the predictions, e.g. predicted labels, for inference.
        return self.predict(batch_inputs, data_samples)
    else:
        raise RuntimeError(f'Invalid mode "{mode}".')
```
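As a quick usage sketch (the `model`, `inputs` and `samples` variables below are assumptions for illustration), the same model instance serves all three purposes by switching the mode:

```python
# `model` is an algorithm built from the MODELS registry; `inputs` is a
# batch tensor and `samples` a list of SelfSupDataSample (both assumed).
feats = model(inputs, mode='tensor')            # extracted features
losses = model(inputs, samples, mode='loss')    # losses for training
preds = model(inputs, samples, mode='predict')  # predictions for evaluation
```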
- tensor: if the mode is `tensor`, the `forward` function returns the extracted features for images.
  You should override `extract_feat` to implement your customized extraction process.
- loss: if the mode is `loss`, the `forward` function returns the loss between the prediction and the target.
  You should override `loss` to implement your customized loss function.
- predict: if the mode is `predict`, the `forward` function returns the prediction, e.g. the predicted label, from
  your algorithm. You should also override the `predict` function.
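Putting it all together, here is a minimal sketch of a custom algorithm that overrides these three functions. `ToyAlgorithm`, its loss key and the way it chains `self.backbone`, `self.neck` and `self.head` are illustrative assumptions, not an existing MMSelfSup algorithm; the import paths are assumed to follow MMSelfSup 1.x.

```python
from typing import Dict, List, Optional, Tuple

import torch

from mmselfsup.models import BaseModel
from mmselfsup.registry import MODELS
from mmselfsup.structures import SelfSupDataSample


@MODELS.register_module()
class ToyAlgorithm(BaseModel):
    """A hypothetical algorithm used only to illustrate the three modes."""

    def extract_feat(self, batch_inputs: torch.Tensor) -> Tuple[torch.Tensor]:
        # mode == 'tensor': return the backbone features.
        return self.backbone(batch_inputs)

    def loss(self, batch_inputs: torch.Tensor,
             data_samples: List[SelfSupDataSample]
             ) -> Dict[str, torch.Tensor]:
        # mode == 'loss': compute the training loss via the head
        # (the head builds and applies the loss module internally).
        feats = self.neck(self.backbone(batch_inputs))
        loss = self.head(feats, data_samples)
        return dict(loss=loss)

    def predict(self, batch_inputs: torch.Tensor,
                data_samples: Optional[List[SelfSupDataSample]] = None
                ) -> List[SelfSupDataSample]:
        # mode == 'predict': attach predictions to the data samples.
        feats = self.neck(self.backbone(batch_inputs))
        # ... post-process `feats` and write results into `data_samples` ...
        return data_samples
```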
Now that we have introduced the basic components related to models in MMSelfSup, you can refer to the API documentation of each algorithm if you want to dive deeper.