[Refactor] Refactor docs directory (#419)
* refactor directory * modify titles * fix lint * update index.rst * update * fix typo * update * fix typo * update model zoo * update index.rst * fix typo * fix typo

pull/423/head
parent
83b1a33bed
commit
d3a487e0b9
|
@ -1,8 +1,8 @@
|
|||
# Tutorial 1: Adding New Dataset
|
||||
# Add Datasets
|
||||
|
||||
In this tutorial, we introduce the basic steps to create your customized dataset:
|
||||
|
||||
- [Tutorial 1: Adding New Dataset](#tutorial-1-adding-new-dataset)
|
||||
- [Add Datasets](#add-datasets)
|
||||
- [An example of customized dataset](#an-example-of-customized-dataset)
|
||||
- [Creating the `DataSource`](#creating-the-datasource)
|
||||
- [Creating the `Dataset`](#creating-the-dataset)
|
||||
|
@ -10,7 +10,7 @@ In this tutorial, we introduce the basic steps to create your customized dataset
|
|||
|
||||
If your algorithm does not need any customized dataset, you can use the off-the-shelf datasets under [datasets](../../mmselfsup/datasets). But to use these existing datasets, you have to convert your data to the existing dataset format.
|
||||
|
||||
### An example of customized dataset
|
||||
## An example of customized dataset
|
||||
|
||||
Assuming the format of your dataset's annotation file is:
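For example, a minimal annotation file could list one image filename and an integer label per line (a hypothetical format for illustration; use whatever your own annotation tool produces):

```
000001.jpg 0
000002.jpg 1
```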
|
||||
|
||||
|
@ -24,7 +24,7 @@ To write a new dataset, you need to implement:
|
|||
- `DataSource`: inherited from `BaseDataSource` and responsible for loading the annotation files and reading images.
|
||||
- `Dataset`: inherited from `BaseDataset` and responsible for applying transformation to images and packing these images.
|
||||
|
||||
### Creating the `DataSource`
|
||||
## Creating the `DataSource`
|
||||
|
||||
Assuming the name of your `DataSource` is `NewDataSource`, you can create a file named `new_data_source.py` under `mmselfsup/datasets/data_sources` and implement `NewDataSource` in it.
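A minimal sketch of `NewDataSource`, assuming the `BaseDataSource` base class exposes `self.ann_file` and `self.data_prefix` and that annotations follow the filename-plus-label format above (the parsing logic is illustrative only):

```python
import mmcv
import numpy as np

from ..builder import DATASOURCES
from .base import BaseDataSource


@DATASOURCES.register_module()
class NewDataSource(BaseDataSource):

    def load_annotations(self):
        # Parse the annotation file and build one record per sample.
        assert isinstance(self.ann_file, str)
        data_infos = []
        for line in mmcv.list_from_file(self.ann_file):
            filename, gt_label = line.strip().rsplit(' ', 1)
            data_infos.append(
                dict(img_prefix=self.data_prefix,
                     img_info=dict(filename=filename),
                     gt_label=np.array(gt_label, dtype=np.int64)))
        return data_infos
```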
|
||||
|
||||
|
@ -59,7 +59,7 @@ __all__ = [
|
|||
]
|
||||
```
|
||||
|
||||
### Creating the `Dataset`
|
||||
## Creating the `Dataset`
|
||||
|
||||
Assuming the name of your `Dataset` is `NewDataset`, you can create a file named `new_dataset.py` under `mmselfsup/datasets` and implement `NewDataset` in it.
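A minimal sketch of `NewDataset`, assuming `BaseDataset` builds `self.data_source` and `self.pipeline` from the config and that a single transformed view per image is enough (adapt `__getitem__` to your algorithm):

```python
from .base import BaseDataset
from .builder import DATASETS


@DATASETS.register_module()
class NewDataset(BaseDataset):
    """Illustrative dataset returning one transformed view per image."""

    def __getitem__(self, idx):
        img = self.data_source.get_img(idx)  # load the raw image via the data source
        img = self.pipeline(img)             # apply the transformation pipeline
        return dict(img=img)

    def evaluate(self, results, logger=None):
        # Optional hook for downstream evaluation; not required for pre-training.
        return NotImplemented
```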
|
||||
|
||||
|
@ -99,7 +99,7 @@ __all__ = [
|
|||
]
|
||||
```
|
||||
|
||||
### Modify config file
|
||||
## Modify config file
|
||||
|
||||
To use `NewDataset`, you can modify the config as follows:
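A hedged sketch of the corresponding config change (the data prefixes, annotation file and pipeline entries below are placeholders):

```python
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomHorizontalFlip'),
]

data = dict(
    samples_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type='NewDataset',
        data_source=dict(
            type='NewDataSource',
            data_prefix='data/custom/train',
            ann_file='data/custom/meta/train.txt',
        ),
        pipeline=train_pipeline,
    ))
```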
|
||||
|
|
@ -1,6 +1,6 @@
|
|||
# Tutorial 3: Adding New Modules
|
||||
# Add Modules
|
||||
|
||||
- [Tutorial 3: Adding New Modules](#tutorial-3-adding-new-modules)
|
||||
- [Add Modules](#add-modules)
|
||||
- [Add new backbone](#add-new-backbone)
|
||||
- [Add new necks](#add-new-necks)
|
||||
- [Add new loss](#add-new-loss)
|
|
@ -1,6 +1,6 @@
|
|||
# Tutorial 2: Customize Data Pipelines
|
||||
# Add Transforms
|
||||
|
||||
- [Tutorial 2: Customize Data Pipelines](#tutorial-2-customize-data-pipelines)
|
||||
- [Add Transforms](#add-transforms)
|
||||
- [Overview of `Pipeline`](#overview-of-pipeline)
|
||||
- [Creating new augmentations in `Pipeline`](#creating-new-augmentations-in-pipeline)
|
||||
|
|
@ -0,0 +1 @@
|
|||
# Conventions
|
|
@ -1,6 +1,6 @@
|
|||
# Tutorial 5: Customize Runtime Settings
|
||||
# Customize Runtime
|
||||
|
||||
- [Tutorial 5: Customize Runtime Settings](#tutorial-5-customize-runtime-settings)
|
||||
- [Customize Runtime](#customize-runtime)
|
||||
- [Customize Workflow](#customize-workflow)
|
||||
- [Hooks](#hooks)
|
||||
- [default training hooks](#default-training-hooks)
|
|
@ -0,0 +1 @@
|
|||
# Data Flow
|
|
@ -0,0 +1 @@
|
|||
# Datasets
|
|
@ -0,0 +1 @@
|
|||
# Engine
|
|
@ -0,0 +1 @@
|
|||
# Evaluation
|
|
@ -0,0 +1,25 @@
|
|||
Basic Concepts
|
||||
***************
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
data_flow.md
|
||||
structures.md
|
||||
models.md
|
||||
datasets.md
|
||||
transforms.md
|
||||
evaluation.md
|
||||
engine.md
|
||||
conventions.md
|
||||
|
||||
Component Customization
|
||||
************************
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
add_modules.md
|
||||
add_datasets.md
|
||||
add_transforms.md
|
||||
customize_runtime.md
|
|
@ -0,0 +1 @@
|
|||
# Models
|
|
@ -0,0 +1 @@
|
|||
# Structures
|
|
@ -0,0 +1 @@
|
|||
# Transforms
|
|
@ -1,52 +0,0 @@
|
|||
# BarlowTwins
|
||||
|
||||
> [Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We propose an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible. This causes the embedding vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors. The method is called Barlow Twins, owing to neuroscientist H. Barlow's redundancy-reduction principle applied to a pair of identical networks. Barlow Twins does not require large batches nor asymmetry between the network twins such as a predictor network, gradient stopping, or a moving average on the weight updates. Intriguingly it benefits from very high-dimensional output vectors. Barlow Twins outperforms previous methods on ImageNet for semi-supervised classification in the low-data regime, and is on par with current state of the art for ImageNet classification with a linear classifier head, and for transfer tasks of classification and object detection.
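The objective sketched in the abstract can be written compactly; below is an illustrative PyTorch version (not the repository's implementation; the trade-off weight `lambd` and the per-dimension standardization follow the paper's description):

```python
import torch


def barlow_twins_loss(z_a, z_b, lambd=0.0051):
    """z_a, z_b: (N, D) embeddings of two distorted views of the same batch."""
    N, D = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    # Cross-correlation matrix between the outputs of the two twins.
    c = z_a.T @ z_b / N                                           # (D, D)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # redundancy reduction
    return on_diag + lambd * off_diag
```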
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/163914714-082de804-0b5f-4024-94f9-880e6ef334fa.png" width="800" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmark includes 1 downstream task dataset, **ImageNet**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-90e.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_8xb32-steplr-100e_in1k.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [barlowtwins_resnet50_8xb256-coslr-300e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/barlowtwins/barlowtwins_resnet50_8xb256-coslr-300e_in1k.py) | 15.51 | 33.98 | 45.96 | 61.90 | 71.01 | 71.66 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates different numbers of nearest neighbors.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [barlowtwins_resnet50_8xb256-coslr-300e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/barlowtwins/barlowtwins_resnet50_8xb256-coslr-300e_in1k.py) | 63.6 | 63.8 | 62.7 | 61.9 |
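As an illustration of the nearest-neighbor protocol above, a simplified similarity-weighted k-NN classifier over pooled features might look as follows (a sketch only; the benchmark script may differ, for example by applying a temperature to the similarities):

```python
import torch


def knn_predict(train_feats, train_labels, test_feats, k=20, num_classes=1000):
    """train_feats/test_feats: L2-normalized (N, D)/(M, D); train_labels: (N,) long."""
    sim = test_feats @ train_feats.T               # cosine similarities (M, N)
    topk_sim, topk_idx = sim.topk(k, dim=1)        # k nearest training samples
    topk_labels = train_labels[topk_idx]           # (M, k)
    votes = torch.zeros(test_feats.size(0), num_classes)
    votes.scatter_add_(1, topk_labels, topk_sim)   # similarity-weighted voting
    return votes.argmax(dim=1)
```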
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@inproceedings{zbontar2021barlow,
|
||||
title={Barlow twins: Self-supervised learning via redundancy reduction},
|
||||
author={Zbontar, Jure and Jing, Li and Misra, Ishan and LeCun, Yann and Deny, St{\'e}phane},
|
||||
booktitle={International Conference on Machine Learning},
|
||||
year={2021},
|
||||
}
|
||||
```
|
|
@ -1,105 +0,0 @@
|
|||
# BYOL
|
||||
|
||||
> [Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
**B**ootstrap **Y**our **O**wn **L**atent (BYOL) is a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network.
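The slow-moving average update of the target network amounts to one exponential-moving-average step per parameter; a minimal sketch (the momentum value is an assumption):

```python
import torch


@torch.no_grad()
def update_target_network(online_net, target_net, momentum=0.996):
    """EMA update: target <- m * target + (1 - m) * online."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.data.mul_(momentum).add_(p_online.data, alpha=1 - momentum)
```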
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149720208-5ffbee78-1437-44c7-9ddb-b8caab60d2c3.png" width="800" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates from which layer's feature map the best result is obtained. For example, if the **Best Layer** is **feature3**, the best result comes from the feature map of the second stage of ResNet (1 for the stem layer, 2-5 for the 4 stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | feature5 | 86.31 | 45.37 | 56.83 | 68.47 | 74.12 | 78.30 | 81.53 | 83.56 | 84.73 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb512-coslr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb512-coslr-90e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 15.16 | 35.26 | 47.77 | 63.10 | 71.21 | 71.72 |
|
||||
| [resnet50_16xb256-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_16xb256-coslr-200e_in1k.py) | 15.41 | 35.15 | 47.77 | 62.59 | 71.85 | 71.88 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 21.25 | 36.55 | 43.66 | 50.74 | 53.82 |
|
||||
| [resnet50_8xb32-accum16-coslr-300e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-300e_in1k.py) | 21.18 | 36.68 | 43.42 | 51.04 | 54.06 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates different numbers of nearest neighbors.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 63.9 | 64.2 | 62.9 | 61.9 |
|
||||
| [resnet50_8xb32-accum16-coslr-300e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-300e_in1k.py) | 66.1 | 66.3 | 65.2 | 64.4 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. This benchmark follows the evaluation protocols set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 80.35 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 40.9 | 61.0 | 44.6 | 36.8 | 58.1 | 39.5 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocols set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-accum16-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 67.16 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@inproceedings{grill2020bootstrap,
|
||||
title={Bootstrap your own latent: A new approach to self-supervised learning},
|
||||
author={Grill, Jean-Bastien and Strub, Florian and Altch{\'e}, Florent and Tallec, Corentin and Richemond, Pierre H and Buchatskaya, Elena and Doersch, Carl and Pires, Bernardo Avila and Guo, Zhaohan Daniel and Azar, Mohammad Gheshlaghi and others},
|
||||
booktitle={NeurIPS},
|
||||
year={2020}
|
||||
}
|
||||
```
|
|
@ -1,39 +0,0 @@
|
|||
# CAE
|
||||
|
||||
> [Context Autoencoder for Self-Supervised Representation Learning](https://arxiv.org/abs/2202.03026)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised learning. We randomly partition the image into two sets: visible patches and masked patches. The CAE architecture consists of: (i) an encoder that takes visible patches as input and outputs their latent representations, (ii) a latent context regressor that predicts the masked patch representations from the visible patch representations that are not updated in this regressor, (iii) a decoder that takes the estimated masked patch representations as input and makes predictions for the masked patches, and (iv) an alignment module that aligns the masked patch representation estimation with the masked patch representations computed from the encoder. In comparison to previous MIM methods that couple the encoding and decoding roles, e.g., using a single module in BEiT, our approach attempts to separate the encoding role (content understanding) from the decoding role (making predictions for masked patches) using different modules, improving the content understanding capability. In addition, our approach makes predictions from the visible patches to the masked patches in the latent representation space that is expected to take on semantics. In addition, we present the explanations about why contrastive pretraining and supervised pretraining perform similarly and why MIM potentially performs better. We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks: semantic segmentation, and object detection and instance segmentation.
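Schematically, the four components can be wired together as below; this is a structural sketch with placeholder callables, not the repository's model:

```python
import torch


def cae_forward(tokens, mask, encoder, regressor, decoder):
    """tokens: (B, L, D) patch embeddings; mask: (L,) bool, True = masked patch.

    `encoder`, `regressor` and `decoder` are placeholder callables standing in
    for the trainable components described in the abstract.
    """
    z_visible = encoder(tokens[:, ~mask])          # (i) encode visible patches only
    z_masked_pred = regressor(z_visible)           # (ii) regress latents of masked patches
    pixel_pred = decoder(z_masked_pred)            # (iii) predict the masked patch contents
    with torch.no_grad():                          # (iv) alignment target from the encoder
        z_masked_target = encoder(tokens[:, mask])
    align_loss = (z_masked_pred - z_masked_target).pow(2).mean()
    return pixel_pred, align_loss
```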
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/30762564/165459947-6c6ef13c-0593-4765-b44e-6da0a079802a.png" width="40%"/>
|
||||
</div>
|
||||
|
||||
## Prerequisite
|
||||
|
||||
Create a new folder `cae_ckpt` under the root directory and download the [weights](https://download.openmmlab.com/mmselfsup/cae/dalle_encoder.pth) for the `dalle` encoder to that folder.
|
||||
|
||||
## Models and Benchmarks
|
||||
|
||||
Here, we report the results of the model, which is pre-trained on ImageNet-1k for 300 epochs; the details are below:
|
||||
|
||||
| Backbone | Pre-train epoch | Fine-tuning Top-1 | Pre-train Config | Fine-tuning Config | Download |
|
||||
| :------: | :-------------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
||||
| ViT-B/16 | 300 | 83.2 | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/cae/cae_vit-base-p16_8xb256-fp16-coslr-300e_in1k.py) | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/vit-base-p16_ft-8xb128-coslr-100e-rpe_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/cae/cae_vit-base-p16_16xb256-coslr-300e_in1k-224_20220427-4c786349.pth) \| [log](https://download.openmmlab.com/mmselfsup/cae/cae_vit-base-p16_16xb256-coslr-300e_in1k-224_20220427-4c786349.log.json) |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{CAE,
|
||||
title={Context Autoencoder for Self-Supervised Representation Learning},
|
||||
author={Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo,
|
||||
Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang},
|
||||
journal={ArXiv},
|
||||
year={2022}
|
||||
}
|
||||
```
|
|
@ -1,62 +0,0 @@
|
|||
# DeepCluster
|
||||
|
||||
> [Deep Clustering for Unsupervised Learning of Visual Features](https://arxiv.org/abs/1807.05520)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network.
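The alternating procedure described above can be outlined in a few lines; the following is a schematic sketch (not the repository's training loop), assuming a non-shuffled loader and a plain k-means from scikit-learn:

```python
import torch
from sklearn.cluster import KMeans


def deepcluster_epoch(model, classifier, loader, optimizer, num_clusters=10000):
    """One epoch: cluster features, then train on the resulting pseudo-labels.

    Assumes `loader` is not shuffled, so batch order matches between the
    feature-extraction pass and the training pass.
    """
    # 1) Extract features for the whole dataset and cluster them with k-means.
    model.eval()
    with torch.no_grad():
        feats = torch.cat([model(imgs) for imgs, _ in loader])
    pseudo_labels = torch.from_numpy(
        KMeans(n_clusters=num_clusters).fit_predict(feats.numpy())).long()

    # 2) Train the network to predict its own cluster assignments.
    model.train()
    for i, (imgs, _) in enumerate(loader):
        targets = pseudo_labels[i * loader.batch_size:(i + 1) * loader.batch_size]
        loss = torch.nn.functional.cross_entropy(classifier(model(imgs)), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```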
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149720586-5bfd213e-0638-47fc-b48a-a16689190e17.png" width="700" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates from which layer's feature map the best result is obtained. For example, if the **Best Layer** is **feature3**, the best result comes from the feature map of the second stage of ResNet (1 for the stem layer, 2-5 for the 4 stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [sobel_resnet50_8xb64-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/deepcluster/deepcluster-sobel_resnet50_8xb64-steplr-200e_in1k.py) | feature5 | 74.26 | 29.37 | 37.99 | 45.85 | 55.57 | 62.48 | 66.15 | 70.00 | 71.37 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [sobel_resnet50_8xb64-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/deepcluster/deepcluster-sobel_resnet50_8xb64-steplr-200e_in1k.py) | 12.78 | 30.81 | 43.88 | 57.71 | 51.68 | 46.92 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- |
|
||||
| [sobel_resnet50_8xb64-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/deepcluster/deepcluster-sobel_resnet50_8xb64-steplr-200e_in1k.py) | 18.80 | 33.93 | 41.44 | 47.22 | 42.61 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@inproceedings{caron2018deep,
|
||||
title={Deep clustering for unsupervised learning of visual features},
|
||||
author={Caron, Mathilde and Bojanowski, Piotr and Joulin, Armand and Douze, Matthijs},
|
||||
booktitle={ECCV},
|
||||
year={2018}
|
||||
}
|
||||
```
|
|
@ -1,102 +0,0 @@
|
|||
# DenseCL
|
||||
|
||||
> [Dense Contrastive Learning for Self-Supervised Visual Pre-Training](https://arxiv.org/abs/2011.09157)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning (DenseCL), which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
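A heavily simplified sketch of the pixel-level contrastive term; the actual method additionally uses a momentum encoder and a negative queue, and derives the cross-view correspondence from backbone features rather than from the projections themselves:

```python
import torch
import torch.nn.functional as F


def dense_contrastive_loss(q, k, queue, temperature=0.2):
    """q, k: (B, C, H, W) dense projections of two augmented views;
    queue: (C, K) L2-normalized negative features."""
    B, C, H, W = q.shape
    q = F.normalize(q.flatten(2), dim=1)                       # (B, C, HW)
    k = F.normalize(k.flatten(2), dim=1)
    # Correspondence across views: most similar location in the other view.
    match = torch.einsum('bci,bcj->bij', q, k).argmax(dim=2)   # (B, HW)
    k_pos = torch.gather(k, 2, match.unsqueeze(1).expand(-1, C, -1))
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)               # (B, 1, HW)
    l_neg = torch.einsum('bci,ck->bik', q, queue)              # (B, HW, K)
    logits = torch.cat([l_pos.transpose(1, 2), l_neg], dim=2) / temperature
    labels = torch.zeros(B, H * W, dtype=torch.long)           # positive is index 0
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```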
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149721111-bab03a6d-a30d-418e-b338-43c3689cfc65.png" width="900" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates from which layer's feature map the best result is obtained. For example, if the **Best Layer** is **feature3**, the best result comes from the feature map of the second stage of ResNet (1 for the stem layer, 2-5 for the 4 stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ---- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | feature5 | 82.5 | 42.68 | 50.64 | 61.74 | 68.17 | 72.99 | 76.07 | 79.19 | 80.55 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 15.86 | 35.47 | 49.46 | 64.06 | 62.95 | 63.34 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 21.32 | 36.20 | 43.97 | 51.04 | 50.45 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates different numbers of nearest neighbors.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 48.2 | 48.5 | 46.8 | 45.6 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. This benchmark follows the evaluation protocols set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 82.14 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | | | | | | |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocols set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 69.47 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@inproceedings{wang2021dense,
|
||||
title={Dense contrastive learning for self-supervised visual pre-training},
|
||||
author={Wang, Xinlong and Zhang, Rufeng and Shen, Chunhua and Kong, Tao and Li, Lei},
|
||||
booktitle={CVPR},
|
||||
year={2021}
|
||||
}
|
||||
```
|
|
@ -1,49 +0,0 @@
|
|||
# MAE
|
||||
|
||||
> [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3× or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pretraining and shows promising scaling behavior.
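The high-ratio random masking described above keeps only a small random subset of patch tokens for the encoder; a hedged sketch using the common shuffle-and-slice idiom (not necessarily this repository's implementation):

```python
import torch


def random_masking(tokens, mask_ratio=0.75):
    """tokens: (B, L, D) patch embeddings. Returns visible tokens and the mask."""
    B, L, D = tokens.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(B, L)                   # random score per patch
    ids_shuffle = noise.argsort(dim=1)         # random permutation of patch indices
    ids_keep = ids_shuffle[:, :len_keep]       # indices of the visible patches
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, L)                    # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return visible, mask
```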
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/30762564/150733959-2959852a-c7bd-4d3f-911f-3e8d8839fe67.png" width="40%"/>
|
||||
</div>
|
||||
|
||||
## Models and Benchmarks
|
||||
|
||||
Here, we report the results of the model, which is pre-trained on ImageNet-1K for 400 epochs; the details are below:
|
||||
|
||||
| Backbone | Pre-train epoch | Fine-tuning Top-1 | Pre-train Config | Fine-tuning Config | Download |
|
||||
| :------: | :-------------: | :---------------: | :-----------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
||||
| ViT-B/16 | 400 | 83.1 | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mae/mae_vit-b-p16_8xb512-coslr-400e_in1k.py) | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/vit-b-p16_ft-8xb128-coslr-100e_in1k.py) | [model](https://download.openmmlab.com/mmselfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k-224_20220223-85be947b.pth) \| [log](https://download.openmmlab.com/mmselfsup/mae/mae_vit-base-p16_8xb512-coslr-300e_in1k-224_20220210_140925.log.json) |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{He2021MaskedAA,
|
||||
title={Masked Autoencoders Are Scalable Vision Learners},
|
||||
author={Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and
|
||||
Piotr Doll'ar and Ross B. Girshick},
|
||||
journal={ArXiv},
|
||||
year={2021}
|
||||
}
|
||||
```
|
|
@ -1,102 +0,0 @@
|
|||
# MoCo v2
|
||||
|
||||
> [Improved Baselines with Momentum Contrastive Learning](https://arxiv.org/abs/2003.04297)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR’s design improvements by implementing them in the MoCo framework. With simple modifications to MoCo—namely, using an MLP projection head and more data augmentation—we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible.
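Of the two borrowed design improvements, the MLP projection head is the architectural one; a minimal sketch of the swap (the dimensions are assumptions, not the repository's exact head):

```python
import torch.nn as nn

# MoCo v1-style head: a single linear projection.
fc_head = nn.Linear(2048, 128)

# MoCo v2-style head: a 2-layer MLP with a ReLU in between.
mlp_head = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),
)
```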
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149720067-b65e5736-d425-45b3-93ed-6f2427fc6217.png" width="500" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates from which layer's feature map the best result is obtained. For example, if the **Best Layer** is **feature3**, the best result comes from the feature map of the second stage of ResNet (1 for the stem layer, 2-5 for the 4 stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | feature5 | 84.04 | 43.14 | 53.29 | 65.34 | 71.03 | 75.42 | 78.48 | 80.88 | 82.23 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 15.96 | 34.22 | 45.78 | 61.11 | 66.24 | 67.58 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 20.92 | 35.72 | 42.62 | 49.79 | 52.25 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates different numbers of nearest neighbors.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 55.6 | 55.7 | 53.7 | 52.5 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. This benchmark follows the evaluation protocols set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 81.06 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 40.2 | 59.7 | 44.2 | 36.1 | 56.7 | 38.8 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocols set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 67.55 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@article{chen2020improved,
|
||||
title={Improved baselines with momentum contrastive learning},
|
||||
author={Chen, Xinlei and Fan, Haoqi and Girshick, Ross and He, Kaiming},
|
||||
journal={arXiv preprint arXiv:2003.04297},
|
||||
year={2020}
|
||||
}
|
||||
```
|
|
@ -1,42 +0,0 @@
|
|||
# MoCo v3
|
||||
|
||||
> [An Empirical Study of Training Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.02057)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT). While the training recipes for standard convolutional networks have been highly mature and robust, the recipes for ViT are yet to be built, especially in the self-supervised scenarios where training becomes more challenging. In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT. We observe that instability is a major issue that degrades accuracy, and it can be hidden by apparently good results. We reveal that these results are indeed partial failure, and they can be improved when training is made more stable. We benchmark ViT results in MoCo v3 and several other self-supervised frameworks, with ablations in various aspects. We discuss the currently positive evidence as well as challenges and open questions. We hope that this work will provide useful data points and experience for future research.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/151305362-e6e8ea35-b3b8-45f6-8819-634e67083218.png" width="500" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Linear Evaluation** result is obtained by training a linear head upon the pre-trained backbone. Please refer to [vit-small-p16_8xb128-coslr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/vit-small-p16_8xb128-coslr-90e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Linear Evaluation |
|
||||
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- |
|
||||
| [vit-small-p16_linear-32xb128-fp16-coslr-300e_in1k-224](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov3/mocov3_vit-small-p16_linear-32xb128-fp16-coslr-300e_in1k-224.py) | 73.19 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
|
||||
@InProceedings{Chen_2021_ICCV,
|
||||
title = {An Empirical Study of Training Self-Supervised Vision Transformers},
|
||||
author = {Chen, Xinlei and Xie, Saining and He, Kaiming},
|
||||
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
|
||||
year = {2021}
|
||||
}
|
||||
```
|
|
@ -1,106 +0,0 @@
|
|||
# NPID
|
||||
|
||||
> [Unsupervised Feature Learning via Non-Parametric Instance Discrimination](https://arxiv.org/abs/1805.01978)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so. We study whether this observation can be extended beyond the conventional domain of supervised learning: Can we learn a good feature representation that captures apparent similarity among instances, instead of classes, by merely asking the feature to be discriminative of individual instances?
|
||||
|
||||
We formulate this intuition as a non-parametric classification problem at the instance-level, and use noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes. Our experimental results demonstrate that, under unsupervised learning settings, our method surpasses the state-of-the-art on ImageNet classification by a large margin.
|
||||
|
||||
Our method is also remarkable for consistently improving test performance with more training data and better network architectures. By fine-tuning the learned feature, we further obtain competitive results for semi-supervised learning and object detection tasks. Our non-parametric model is highly compact: With 128 features per image, our method requires only 600MB storage for a million images, enabling fast nearest neighbour retrieval at the run time.
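A simplified sketch of the instance-level objective with a 128-d memory bank; the actual method uses noise-contrastive estimation rather than a full softmax, and the temperature and update momentum below are assumptions:

```python
import torch
import torch.nn.functional as F


def instance_discrimination_loss(feats, indices, memory_bank, temperature=0.07):
    """feats: (B, 128) L2-normalized features; indices: (B,) dataset indices;
    memory_bank: (N, 128) L2-normalized features of all N training images."""
    logits = feats @ memory_bank.T / temperature   # similarity to every stored instance
    loss = F.cross_entropy(logits, indices)        # each image is its own class
    with torch.no_grad():                          # momentum update of the bank entries
        memory_bank[indices] = F.normalize(
            0.5 * memory_bank[indices] + 0.5 * feats, dim=1)
    return loss
```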
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149722257-1651c283-ac68-4cdc-90e6-970d820529af.png" width="800" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. Unless otherwise mentioned, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets: **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. Unless otherwise specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates from which layer's feature map the best result is obtained. For example, if the **Best Layer** is **feature3**, the best result comes from the feature map of the second stage of ResNet (1 for the stem layer, 2-5 for the 4 stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | feature5 | 76.75 | 26.96 | 35.37 | 44.48 | 53.89 | 60.39 | 66.41 | 71.48 | 73.39 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 14.68 | 31.98 | 42.85 | 56.95 | 58.41 | 57.97 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: the feature map is pooled to specific dimensions and then fed into a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 19.98 | 34.86 | 41.59 | 48.43 | 48.71 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates different numbers of nearest neighbors.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 42.9 | 44.0 | 43.2 | 42.2 |
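A minimal sketch of this nearest-neighbor evaluation, assuming L2-normalized, globally average-pooled features and a plain majority vote (the original NPID protocol additionally weights votes by similarity); all arrays and sizes below are placeholders:

```python
import torch

# Placeholder L2-normalized, globally average-pooled features.
train_feats = torch.nn.functional.normalize(torch.randn(5000, 2048), dim=1)
train_labels = torch.randint(0, 1000, (5000,))
test_feats = torch.nn.functional.normalize(torch.randn(64, 2048), dim=1)

k = 20
sim = test_feats @ train_feats.t()         # cosine similarity to every training sample
topk_sim, topk_idx = sim.topk(k, dim=1)    # k nearest training samples
topk_labels = train_labels[topk_idx]       # (64, k)

# Majority vote among the k neighbours.
pred = torch.mode(topk_labels, dim=1).values
```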
### Detection
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
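Whichever detector is used, its backbone is first initialized from the self-supervised checkpoint. A minimal sketch of that step is shown below, assuming the usual `backbone.` key prefix inside the checkpoint and a hypothetical local file name; the actual benchmarks drive this through the MMDetection configs referenced in the following subsections:

```python
import torch
from torchvision.models import resnet50

# Hypothetical local checkpoint; MMSelfSup checkpoints typically keep the
# backbone weights under a "backbone." prefix inside "state_dict".
ckpt = torch.load('npid_resnet50_8xb32-steplr-200e_in1k.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)
backbone_state = {k[len('backbone.'):]: v
                  for k, v in state.items() if k.startswith('backbone.')}

model = resnet50()
missing, unexpected = model.load_state_dict(backbone_state, strict=False)
print(f'missing: {len(missing)}, unexpected: {len(unexpected)}')  # the fc head is expected to be missing
```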
#### Pascal VOC 2007 + 2012
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of the config.

| Self-Supervised Config | AP50 |
| ---------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 79.52 |
#### COCO2017
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of the config.

| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
| ---------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 38.5 | 57.7 | 42.0 | 34.6 | 54.8 | 37.1 |
### Segmentation
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocol set up by MMSegmentation.
#### Pascal VOC 2012 + Aug
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of the config.

| Self-Supervised Config | mIOU |
| ---------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
| [resnet50_8xb32-steplr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 65.45 |
## Citation
```bibtex
@inproceedings{wu2018unsupervised,
  title={Unsupervised feature learning via non-parametric instance discrimination},
  author={Wu, Zhirong and Xiong, Yuanjun and Yu, Stella X and Lin, Dahua},
  booktitle={CVPR},
  year={2018}
}
```
@ -1,70 +0,0 @@
# ODC
|
||||
|
||||
> [Online Deep Clustering for Unsupervised Representation Learning](https://arxiv.org/abs/2006.10645)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Joint clustering and feature learning methods have shown remarkable performance in unsupervised representation learning. However, the training schedule alternating between feature clustering and network parameters update leads to unstable learning of visual representations. To overcome this challenge, we propose Online Deep Clustering (ODC) that performs clustering and network update simultaneously rather than alternatingly. Our key insight is that the cluster centroids should evolve steadily in keeping the classifier stably updated. Specifically, we design and maintain two dynamic memory modules, i.e., samples memory to store samples’ labels and features, and centroids memory for centroids evolution. We break down the abrupt global clustering into steady memory update and batch-wise label re-assignment. The process is integrated into network update iterations. In this way, labels and the network evolve shoulder-to-shoulder rather than alternatingly. Extensive experiments demonstrate that ODC stabilizes the training process and boosts the performance effectively.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149722645-8da8e5b2-8846-4554-aa3e-727d286b85cd.png" width="700" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb64-steplr-440e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py) | feature5 | 78.42 | 32.42 | 40.27 | 49.95 | 59.96 | 65.71 | 69.99 | 73.64 | 75.13 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb64-steplr-440e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py) | 14.76 | 31.82 | 42.44 | 55.76 | 57.70 | 53.42 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb64-steplr-440e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py) | 19.28 | 34.09 | 40.90 | 47.04 | 48.35 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb64-steplr-440e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py) | 38.5 | 39.1 | 37.8 | 36.9 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{zhan2020online,
  title={Online deep clustering for unsupervised representation learning},
  author={Zhan, Xiaohang and Xie, Jiahao and Liu, Ziwei and Ong, Yew-Soon and Loy, Chen Change},
  booktitle={CVPR},
  year={2020}
}
```
|
|
@ -1,102 +0,0 @@
|
|||
# Relative Location
|
||||
|
||||
> [Unsupervised Visual Representation Learning by Context Prediction](https://arxiv.org/abs/1505.05192)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the RCNN framework and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149723222-76bc89e8-98bf-4ed7-b179-dfe5bc6336ba.png" width="400" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | feature4 | 65.52 | 20.36 | 23.12 | 30.66 | 37.02 | 42.55 | 50.00 | 55.58 | 59.28 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 15.11 | 30.47 | 42.83 | 51.20 | 40.96 | 39.65 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 20.69 | 34.72 | 43.01 | 45.97 | 41.96 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 14.5 | 15.0 | 15.0 | 14.2 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 79.70 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 37.5 | 56.2 | 41.3 | 33.7 | 53.3 | 36.1 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocol set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb64-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 63.49 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{doersch2015unsupervised,
  title={Unsupervised visual representation learning by context prediction},
  author={Doersch, Carl and Gupta, Abhinav and Efros, Alexei A},
  booktitle={ICCV},
  year={2015}
}
```
|
|
@ -1,102 +0,0 @@
|
|||
# Rotation Prediction
|
||||
|
||||
> [Unsupervised Representation Learning by Predicting Image Rotation](https://arxiv.org/abs/1803.07728)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Over the last years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised semantic feature learning, i.e., learning without requiring manual annotation effort, is of crucial importance in order to successfully harvest the vast amount of visual data that are available today. In our work we propose to learn image features by training ConvNets to recognize the 2d rotation that is applied to the image that it gets as input. We demonstrate both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning. We exhaustively evaluate our method in various unsupervised feature learning benchmarks and we exhibit in all of them state-of-the-art performance. Specifically, our results on those benchmarks demonstrate dramatic improvements w.r.t. prior state-of-the-art approaches in unsupervised representation learning and thus significantly close the gap with supervised feature learning.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149723477-8f63e237-362e-4962-b405-9bab0f579808.png" width="700" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | feature4 | 67.70 | 20.60 | 24.35 | 31.41 | 39.17 | 46.56 | 53.37 | 59.14 | 62.42 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-steplr-100e_in1k.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-steplr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 12.15 | 31.99 | 44.57 | 54.20 | 45.94 | 48.12 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 18.94 | 34.72 | 44.53 | 46.30 | 44.12 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 11.0 | 11.9 | 12.6 | 12.4 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 79.67 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 37.9 | 56.5 | 41.5 | 34.2 | 53.9 | 36.7 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocol set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb16-steplr-70e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 64.31 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{komodakis2018unsupervised,
  title={Unsupervised representation learning by predicting image rotations},
  author={Komodakis, Nikos and Gidaris, Spyros},
  booktitle={ICLR},
  year={2018}
}
```
|
|
@ -1,103 +0,0 @@
|
|||
# SimCLR
|
||||
|
||||
> [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/abs/2002.05709)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149723851-cf5f309e-d891-454d-90c0-e5337e5a11ed.png" width="400" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ---- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | feature5 | 79.98 | 35.02 | 42.79 | 54.87 | 61.91 | 67.38 | 71.88 | 75.56 | 77.4 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb512-coslr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb512-coslr-90e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ---------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 16.29 | 31.11 | 39.99 | 55.06 | 62.91 | 62.56 |
|
||||
| [resnet50_16xb256-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_16xb256-coslr-200e_in1k.py) | 15.44 | 31.47 | 41.83 | 59.44 | 66.41 | 66.66 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 20.60 | 33.62 | 38.86 | 45.25 | 50.91 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 47.8 | 48.4 | 46.7 | 45.2 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 79.38 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 38.7 | 58.1 | 42.4 | 34.9 | 55.3 | 37.5 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocol set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| ------------------------------------------------------------------------------------------------------------------------------------------------ | ----- |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 64.03 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{chen2020simple,
  title={A simple framework for contrastive learning of visual representations},
  author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey},
  booktitle={ICML},
  year={2020}
}
```
|
|
@ -1,32 +0,0 @@
|
|||
# SimMIM
|
||||
|
||||
> [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
This paper presents SimMIM, a simple framework for masked image modeling. We simplify recently proposed related approaches without special designs such as blockwise masking and tokenization via discrete VAE or clustering. To study what let the masked image modeling task learn good representations, we systematically study the major components in our framework, and find that simple designs of each component have revealed very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pre-text task; 2) predicting raw pixels of RGB values by direct regression performs no worse than the patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones. Using ViT-B, our approach achieves 83.8% top-1 fine-tuning accuracy on ImageNet-1K by pre-training also on this dataset, surpassing previous best approach by +0.6%. When applied on a larger model of about 650 million parameters, SwinV2H, it achieves 87.1% top-1 accuracy on ImageNet-1K using only ImageNet-1K data. We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), that by 40× less data than that in previous practice, we achieve the state-of-the-art on four representative vision benchmarks. The code and models will be publicly available at https: //github.com/microsoft/SimMIM .
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/30762564/159404597-ac6d3a44-ee59-4cdc-8f6f-506a7d1b18b6.png" width="40%"/>
|
||||
</div>
|
||||
|
||||
## Models and Benchmarks
|
||||
|
||||
Here, we report the results of the model, and more results will be coming soon.
|
||||
|
||||
| Backbone | Pre-train epoch | Fine-tuning Top-1 | Pre-train Config | Fine-tuning Config | Download |
|
||||
| :-------: | :-------------: | :---------------: | :-------------------------------------------------------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
|
||||
| Swin-Base | 100 | 82.9 | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simmim/simmim_swin-base_16xb128-coslr-100e_in1k-192) | [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/swin-base_ft-8xb256-coslr-100e_in1k) | [model](https://download.openmmlab.com/mmselfsup/simmim/simmim_swin-base_16xb128-coslr-100e_in1k-192_20220316-1d090125.pth) \| [log](https://download.openmmlab.com/mmselfsup/simmim/simmim_swin-base_16xb128-coslr-100e_in1k-192_20220316-1d090125.log.json) |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{xie2021simmim,
  title={SimMIM: A Simple Framework for Masked Image Modeling},
  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
```
|
|
@ -1,109 +0,0 @@
|
|||
# SimSiam
|
||||
|
||||
> [Exploring Simple Siamese Representation Learning](https://arxiv.org/abs/2011.10566)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our “SimSiam” method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149724180-bc7bac6a-fcb8-421e-b8f1-9550c624d154.png" width="500" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | feature5 | 84.64 | 39.65 | 49.86 | 62.48 | 69.50 | 74.48 | 78.31 | 81.06 | 82.56 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | feature5 | 85.20 | 39.85 | 50.44 | 63.73 | 70.93 | 75.74 | 79.42 | 82.02 | 83.44 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb512-coslr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb512-coslr-90e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 16.27 | 33.77 | 45.80 | 60.83 | 68.21 | 68.28 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 15.57 | 37.21 | 47.28 | 62.21 | 69.85 | 69.84 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 21.32 | 35.66 | 43.05 | 50.79 | 53.27 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 21.17 | 35.85 | 43.49 | 50.99 | 54.10 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 57.4 | 57.6 | 55.8 | 54.2 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 60.2 | 60.4 | 58.8 | 57.4 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 79.80 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 79.85 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 38.6 | 57.6 | 42.3 | 34.6 | 54.8 | 36.9 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 38.8 | 58.0 | 42.3 | 34.9 | 55.3 | 37.6 |
|
||||
|
||||
### Segmentation
|
||||
|
||||
The segmentation benchmarks include 2 downstream task datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocol set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| -------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-coslr-100e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 48.35 |
|
||||
| [resnet50_8xb32-coslr-200e](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 46.27 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{chen2021exploring,
  title={Exploring simple siamese representation learning},
  author={Chen, Xinlei and He, Kaiming},
  booktitle={CVPR},
  year={2021}
}
```
|
|
@ -1,102 +0,0 @@
|
|||
# SwAV
|
||||
|
||||
> [Unsupervised Learning of Visual Features by Contrasting Cluster Assignments](https://arxiv.org/abs/2006.09882)
|
||||
|
||||
<!-- [ALGORITHM] -->
|
||||
|
||||
## Abstract
|
||||
|
||||
Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a “swapped” prediction mechanism where we predict the code of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements.
|
||||
|
||||
<div align="center">
|
||||
<img src="https://user-images.githubusercontent.com/36138628/149724517-9f1e7bdf-04c7-43e3-92f4-2b8fc1399123.png" width="500" />
|
||||
</div>
|
||||
|
||||
## Results and Models
|
||||
|
||||
**Back to [model_zoo.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/model_zoo.md) to download models.**
|
||||
|
||||
On this page, we provide as many benchmarks as possible to evaluate our pre-trained models. If not mentioned otherwise, all models are pre-trained on the ImageNet-1k dataset.
|
||||
|
||||
### Classification
|
||||
|
||||
The classification benchmarks include 4 downstream task datasets, **VOC**, **ImageNet**, **iNaturalist2018** and **Places205**. If not specified, the results are Top-1 accuracy (%).
|
||||
|
||||
#### VOC SVM / Low-shot SVM
|
||||
|
||||
The **Best Layer** indicates the layer whose feature map yields the best result. For example, if the **Best Layer** is **feature3**, its best result is obtained from the second stage of ResNet (1 for the stem layer, 2-5 for the four stage layers).
|
||||
|
||||
Besides, k=1 to 96 indicates the hyper-parameter of Low-shot SVM.
|
||||
|
||||
| Self-Supervised Config | Best Layer | SVM | k=1 | k=2 | k=4 | k=8 | k=16 | k=32 | k=64 | k=96 |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | feature5 | 87.00 | 44.68 | 55.41 | 67.64 | 73.67 | 78.14 | 81.58 | 83.98 | 85.15 |
|
||||
|
||||
#### ImageNet Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_linear-8xb32-steplr-90e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_mhead_linear-8xb32-steplr-90e_in1k.py) for details of the config.
|
||||
|
||||
The **AvgPool** result is obtained from Linear Evaluation with GlobalAveragePooling. Please refer to [resnet50_linear-8xb32-coslr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/imagenet/resnet50_linear-8xb32-coslr-100e_in1k.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 | AvgPool |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- | ------- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 16.98 | 34.96 | 49.26 | 65.98 | 70.74 | 70.47 |
|
||||
|
||||
#### Places205 Linear Evaluation
|
||||
|
||||
The **Feature1 - Feature5** results do not use GlobalAveragePooling: each feature map is pooled to a fixed spatial size and then fed to a linear layer for classification. Please refer to [resnet50_mhead_8xb32-steplr-28e_places205.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/classification/places205/resnet50_mhead_8xb32-steplr-28e_places205.py) for details of the config.
|
||||
|
||||
| Self-Supervised Config | Feature1 | Feature2 | Feature3 | Feature4 | Feature5 |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------- | -------- | -------- | -------- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 23.33 | 35.45 | 43.13 | 51.98 | 55.09 |
|
||||
|
||||
#### ImageNet Nearest-Neighbor Classification
|
||||
|
||||
The results are obtained from the features after GlobalAveragePooling. Here, k=10 to 200 indicates the number of nearest neighbors used for classification.
|
||||
|
||||
| Self-Supervised Config | k=10 | k=20 | k=100 | k=200 |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---- | ---- | ----- | ----- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 60.5 | 60.6 | 59.0 | 57.6 |
|
||||
|
||||
### Detection
|
||||
|
||||
The detection benchmarks include 2 downstream task datasets, **Pascal VOC 2007 + 2012** and **COCO2017**. They follow the evaluation protocol set up by MoCo.
|
||||
|
||||
#### Pascal VOC 2007 + 2012
|
||||
|
||||
Please refer to [faster_rcnn_r50_c4_mstrain_24k_voc0712.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | AP50 |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 77.64 |
|
||||
|
||||
#### COCO2017
|
||||
|
||||
Please refer to [mask_rcnn_r50_fpn_mstrain_1x_coco.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mAP(Box) | AP50(Box) | AP75(Box) | mAP(Mask) | AP50(Mask) | AP75(Mask) |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | --------- | --------- | --------- | ---------- | ---------- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 40.2 | 60.5 | 43.9 | 36.3 | 57.5 | 38.8 |
|
||||
|
||||
### Segmentation

The segmentation benchmarks include two downstream datasets, **Cityscapes** and **Pascal VOC 2012 + Aug**. They follow the evaluation protocols set up by MMSegmentation.
|
||||
|
||||
#### Pascal VOC 2012 + Aug
|
||||
|
||||
Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
|
||||
|
||||
| Self-Supervised Config | mIOU |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| [resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 63.73 |
|
||||
|
||||
## Citation
|
||||
|
||||
```bibtex
@inproceedings{caron2020unsupervised,
  title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
  author={Caron, Mathilde and Misra, Ishan and Mairal, Julien and Goyal, Priya and Bojanowski, Piotr and Joulin, Armand},
  booktitle={NeurIPS},
  year={2020}
}
```
|
|
@ -1,33 +1,14 @@
|
|||
mmselfsup.apis
|
||||
---------------
|
||||
.. automodule:: mmselfsup.apis
|
||||
:members:
|
||||
|
||||
mmselfsup.core
|
||||
---------------
|
||||
|
||||
hooks
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.core.hooks
|
||||
:members:
|
||||
|
||||
optimizer
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.core.optimizer
|
||||
:members:
|
||||
|
||||
|
||||
mmselfsup.datasets
|
||||
---------------------
|
||||
------------------
|
||||
|
||||
data_sources
|
||||
^^^^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.datasets.data_sources
|
||||
datasets
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.datasets
|
||||
:members:
|
||||
|
||||
pipelines
|
||||
transforms
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.datasets.pipelines
|
||||
.. automodule:: mmselfsup.datasets.transforms
|
||||
:members:
|
||||
|
||||
samplers
|
||||
|
@ -35,11 +16,26 @@ samplers
|
|||
.. automodule:: mmselfsup.datasets.samplers
|
||||
:members:
|
||||
|
||||
datasets
|
||||
mmselfsup.engine
|
||||
----------------
|
||||
|
||||
hooks
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.datasets
|
||||
.. automodule:: mmselfsup.engine.hooks
|
||||
:members:
|
||||
|
||||
optimizers
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.engine.optimizers
|
||||
:members:
|
||||
|
||||
mmselfsup.evaluation
|
||||
--------------------
|
||||
|
||||
functional
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.evaluation.functional
|
||||
:members:
|
||||
|
||||
mmselfsup.models
|
||||
-----------------
|
||||
|
@ -54,26 +50,40 @@ backbones
|
|||
.. automodule:: mmselfsup.models.backbones
|
||||
:members:
|
||||
|
||||
necks
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.necks
|
||||
:members:
|
||||
|
||||
heads
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.heads
|
||||
:members:
|
||||
|
||||
losses
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.losses
|
||||
:members:
|
||||
|
||||
memories
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.memories
|
||||
:members:
|
||||
|
||||
necks
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.necks
|
||||
:members:
|
||||
|
||||
utils
|
||||
^^^^^^^^^^
|
||||
.. automodule:: mmselfsup.models.utils
|
||||
:members:
|
||||
|
||||
mmselfsup.structures
|
||||
--------------------
|
||||
.. automodule:: mmselfsup.structures
|
||||
:members:
|
||||
|
||||
mmselfsup.visualization
|
||||
-----------------------
|
||||
.. automodule:: mmselfsup.visualization
|
||||
:members:
|
||||
|
||||
mmselfsup.utils
|
||||
---------------
|
||||
|
|
|
@ -1,171 +1,228 @@
|
|||
# Getting Started
|
||||
# Get Started
|
||||
|
||||
- [Getting Started](#getting-started)
|
||||
- [Train existing methods](#train-existing-methods)
|
||||
- [Training with CPU](#training-with-cpu)
|
||||
- [Train with single/multiple GPUs](#train-with-singlemultiple-gpus)
|
||||
- [Train with multiple machines](#train-with-multiple-machines)
|
||||
- [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
|
||||
- [Benchmarks](#benchmarks)
|
||||
- [Tools and Tips](#tools-and-tips)
|
||||
- [Count number of parameters](#count-number-of-parameters)
|
||||
- [Publish a model](#publish-a-model)
|
||||
- [Use t-SNE](#use-t-sne)
|
||||
- [Reproducibility](#reproducibility)
|
||||
- [Get Started](#get-started)
|
||||
- [Prerequisites](#prerequisites)
|
||||
- [Installation](#installation)
|
||||
- [Best Practices](#best-practices)
|
||||
- [Verify the installation](#verify-the-installation)
|
||||
- [Customized installation](#customized-installation)
|
||||
- [Benchmark](#benchmark)
|
||||
- [CUDA versions](#cuda-versions)
|
||||
- [Install MMCV without MIM](#install-mmcv-without-mim)
|
||||
- [Another option: Docker Image](#another-option-docker-image)
|
||||
- [Install on Google Colab](#install-on-google-colab)
|
||||
- [Trouble shooting](#trouble-shooting)
|
||||
- [Using multiple MMSelfSup versions](#using-multiple-mmselfsup-versions)
|
||||
|
||||
This page provides basic tutorials about the usage of MMSelfSup. For installation instructions, please see [install.md](install.md).
|
||||
## Prerequisites
|
||||
|
||||
## Train existing methods
|
||||
In this section we demonstrate how to prepare an environment with PyTorch.
|
||||
|
||||
**Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes proportionally, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
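
For example, a quick sanity check of this linear scaling rule; the base learning rate below is an assumed value, not taken from a particular config.

```python
old_lr, old_ngpus = 0.03, 8   # assumed 8-GPU base setting
new_ngpus = 4                 # GPUs actually used
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                 # 0.015
```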
|
||||
MMSelfSup works on Linux (Windows and macOS are not officially supported). It requires Python 3.6+, CUDA 9.2+ and PyTorch 1.5+.
|
||||
|
||||
### Training with CPU
|
||||
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
|
||||
|
||||
**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
|
||||
|
||||
**Step 1.** Create a conda environment and activate it.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=-1
|
||||
python tools/train.py ${CONFIG_FILE}
|
||||
conda create --name openmmlab python=3.8 -y
|
||||
conda activate openmmlab
|
||||
```
|
||||
|
||||
**Note**: We do not recommend using CPU for training because it is too slow, and some algorithms use `SyncBN`, which is based on distributed training. We support this feature to allow users to debug on machines without GPU for convenience.
|
||||
**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
|
||||
|
||||
### Train with single/multiple GPUs
|
||||
On GPU platforms:
|
||||
|
||||
```shell
|
||||
sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --work-dir ${YOUR_WORK_DIR} [optional arguments]
|
||||
conda install pytorch torchvision -c pytorch
|
||||
```
|
||||
|
||||
Optional arguments are:
|
||||
|
||||
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
|
||||
- `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
|
||||
|
||||
An example:
|
||||
On CPU platforms:
|
||||
|
||||
```shell
|
||||
# checkpoints and logs saved in WORK_DIR=work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
|
||||
sh tools/dist_train.sh configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py 8 --work-dir work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
|
||||
conda install pytorch torchvision cpuonly -c pytorch
|
||||
```
|
||||
|
||||
**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
|
||||
## Installation
|
||||
|
||||
We recommend that users follow our best practices to install MMSelfSup. However, the whole process is highly customizable. See [Customize Installation](#customized-installation) section for more information.
|
||||
|
||||
### Best Practices
|
||||
|
||||
**Step 0.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
|
||||
|
||||
```shell
|
||||
ln -s ${YOUR_WORK_DIRS} ${MMSELFSUP}/work_dirs
|
||||
pip install -U openmim
|
||||
mim install mmcv-full
|
||||
```
|
||||
|
||||
Alternatively, if you run MMSelfSup on a cluster managed with [slurm](https://slurm.schedmd.com/):
|
||||
**Step 1.** Install MMSelfSup.
|
||||
|
||||
Case a: If you develop and run mmselfsup directly, install it from source:
|
||||
|
||||
```shell
|
||||
GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${YOUR_WORK_DIR} [optional arguments]
|
||||
git clone https://github.com/open-mmlab/mmselfsup.git
|
||||
cd mmselfsup
|
||||
pip install -v -e .
|
||||
# "-v" means verbose, or more output
|
||||
# "-e" means installing a project in editable mode,
|
||||
# thus any local modifications made to the code will take effect without reinstallation.
|
||||
```
|
||||
|
||||
An example:
|
||||
Case b: If you use mmselfsup as a dependency or third-party package, install it with pip:
|
||||
|
||||
```shell
|
||||
GPUS_PER_NODE=8 GPUS=8 sh tools/slurm_train.sh Dummy Test_job configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
|
||||
pip install mmselfsup
|
||||
```
|
||||
|
||||
### Train with multiple machines
|
||||
### Verify the installation
|
||||
|
||||
If you launch with multiple machines simply connected with Ethernet, you can run the following commands:
|
||||
|
||||
On the first machine:
|
||||
|
||||
```shell
|
||||
NNODES=2 NODE_RANK=0 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
|
||||
```
|
||||
|
||||
On the second machine:
|
||||
|
||||
```shell
|
||||
NNODES=2 NODE_RANK=1 PORT=$MASTER_PORT MASTER_ADDR=$MASTER_ADDR sh tools/dist_train.sh $CONFIG $GPUS
|
||||
```
|
||||
|
||||
Usually it is slow if you do not have high speed networking like InfiniBand.
|
||||
|
||||
If you launch with Slurm, the command is the same as on a single machine described above, but you need to refer to [slurm_train.sh](https://github.com/open-mmlab/mmselfsup/blob/master/tools/slurm_train.sh) to set appropriate parameters and environment variables.
|
||||
|
||||
### Launch multiple jobs on a single machine
|
||||
|
||||
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.
|
||||
|
||||
If you use `dist_train.sh` to launch training jobs:
|
||||
|
||||
```shell
|
||||
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_1
|
||||
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_2
|
||||
```
|
||||
|
||||
If you launch training jobs with Slurm, you have two options to set different communication ports:
|
||||
|
||||
Option 1:
|
||||
|
||||
In `config1.py`:
|
||||
To verify whether MMSelfSup is installed correctly, we can run the following sample code to initialize a model and inference a demo image.
|
||||
|
||||
```python
|
||||
dist_params = dict(backend='nccl', port=29500)
|
||||
import torch
|
||||
|
||||
from mmselfsup.models import build_algorithm
|
||||
|
||||
model_config = dict(
|
||||
type='Classification',
|
||||
backbone=dict(
|
||||
type='ResNet',
|
||||
depth=50,
|
||||
in_channels=3,
|
||||
num_stages=4,
|
||||
strides=(1, 2, 2, 2),
|
||||
dilations=(1, 1, 1, 1),
|
||||
out_indices=[4], # 0: conv-1, x: stage-x
|
||||
norm_cfg=dict(type='BN'),
|
||||
frozen_stages=-1),
|
||||
head=dict(
|
||||
type='ClsHead', with_avg_pool=True, in_channels=2048,
|
||||
num_classes=1000))
|
||||
|
||||
model = build_algorithm(model_config).cuda()
|
||||
|
||||
image = torch.randn((1, 3, 224, 224)).cuda()
|
||||
label = torch.tensor([1]).cuda()
|
||||
|
||||
loss = model.forward_train(image, label)
|
||||
```
|
||||
|
||||
In `config2.py`:
|
||||
The above code is supposed to run successfully once you finish the installation.
|
||||
|
||||
### Customized installation
|
||||
|
||||
#### Benchmark
|
||||
|
||||
The [Best Practices](#best-practices) section is for basic usage. If you need to evaluate your pre-training model with some downstream tasks such as detection or segmentation, please also install [MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation).
|
||||
|
||||
If you do not run the MMDetection and MMSegmentation benchmarks, it is unnecessary to install them.
|
||||
|
||||
You can simply install MMDetection and MMSegmentation with the following command:
|
||||
|
||||
```shell
|
||||
pip install mmdet mmsegmentation
|
||||
```
|
||||
|
||||
For more details, you can check the installation page of [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md).
|
||||
|
||||
#### CUDA versions
|
||||
|
||||
When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
|
||||
|
||||
- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
|
||||
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
|
||||
|
||||
Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.
|
||||
|
||||
```{note}
|
||||
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch. i.e., the specified version of cudatoolkit in `conda install` command.
|
||||
```
|
||||
|
||||
#### Install MMCV without MIM
|
||||
|
||||
MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
|
||||
|
||||
To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/latest/get_started/installation.html). This requires manually specifying a find-url based on PyTorch version and its CUDA version.
|
||||
|
||||
For example, the following command installs mmcv-full built for PyTorch 1.10.x and CUDA 11.3.
|
||||
|
||||
```shell
|
||||
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
|
||||
```
|
||||
|
||||
#### Another option: Docker Image
|
||||
|
||||
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
|
||||
|
||||
```shell
|
||||
# build an image with PyTorch 1.10.0, CUDA 11.3, CUDNN 8.
|
||||
docker build -f ./docker/Dockerfile --rm -t mmselfsup:torch1.10.0-cuda11.3-cudnn8 .
|
||||
```
|
||||
|
||||
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
|
||||
|
||||
Run the following command:
|
||||
|
||||
```shell
|
||||
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/workspace/mmselfsup/data mmselfsup:torch1.10.0-cuda11.3-cudnn8 /bin/bash
|
||||
```
|
||||
|
||||
`{DATA_DIR}` is your local folder containing all these datasets.
|
||||
|
||||
#### Install on Google Colab
|
||||
|
||||
[Google Colab](https://research.google.com/) usually has PyTorch installed, thus we only need to install MMCV and MMSelfSup with the following commands.
|
||||
|
||||
**Step 0.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
|
||||
|
||||
```shell
|
||||
!pip3 install openmim
|
||||
!mim install mmcv-full
|
||||
```
|
||||
|
||||
**Step 1.** Install MMSelfSup from the source.
|
||||
|
||||
```shell
|
||||
!git clone https://github.com/open-mmlab/mmselfsup.git
|
||||
%cd mmselfsup
|
||||
!pip install -e .
|
||||
```
|
||||
|
||||
**Step 2.** Verification.
|
||||
|
||||
```python
|
||||
dist_params = dict(backend='nccl', port=29501)
|
||||
import mmselfsup
|
||||
print(mmselfsup.__version__)
|
||||
# Example output: 0.9.0
|
||||
```
|
||||
|
||||
Then you can launch two jobs with config1.py and config2.py.
|
||||
```{note}
|
||||
Within Jupyter, the exclamation mark `!` is used to call external executables and `%cd` is a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd) to change the current working directory of Python.
|
||||
```
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
If you have some issues during the installation, please first view the [FAQ](faq.md) page.
|
||||
You may [open an issue](https://github.com/open-mmlab/mmselfsup/issues/new/choose) on GitHub if no solution is found.
|
||||
|
||||
## Using multiple MMSelfSup versions
|
||||
|
||||
If there are more than one mmselfsup on your machine, and you want to use them alternatively, the recommended way is to create multiple conda environments and use different environments for different versions.
|
||||
|
||||
Another way is to insert the following code to the main scripts (`train.py`, `test.py` or any other scripts you run)
|
||||
|
||||
```python
|
||||
import os.path as osp
|
||||
import sys
|
||||
sys.path.insert(0, osp.join(osp.dirname(osp.abspath(__file__)), '../'))
|
||||
```
|
||||
|
||||
Or run the following command in the terminal of the corresponding root folder to temporarily use the current one.
|
||||
|
||||
```shell
|
||||
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1
|
||||
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2
|
||||
export PYTHONPATH="$(pwd)":$PYTHONPATH
|
||||
```
|
||||
|
||||
Option 2:
|
||||
|
||||
You can set different communication ports without modifying the configuration file, but you have to set `--cfg-options` to overwrite the default port in the configuration file.
|
||||
|
||||
```shell
|
||||
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1 --cfg-options dist_params.port=29500
|
||||
CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2 --cfg-options dist_params.port=29501
|
||||
```
|
||||
|
||||
## Benchmarks
|
||||
|
||||
We also provide commands to evaluate your pre-trained model on several downstream tasks, and you can refer to [Benchmarks](./tutorials/6_benchmarks.md) for the details.
|
||||
|
||||
## Tools and Tips
|
||||
|
||||
### Count number of parameters
|
||||
|
||||
```shell
|
||||
python tools/analysis_tools/count_parameters.py ${CONFIG_FILE}
|
||||
```
|
||||
|
||||
### Publish a model
|
||||
|
||||
Before you publish a model, you may want to
|
||||
|
||||
- Convert model weights to CPU tensors.
|
||||
- Delete the optimizer states.
|
||||
- Compute the hash of the checkpoint file and append the hash id to the filename.
|
||||
|
||||
```shell
|
||||
python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
|
||||
```
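
A minimal sketch of the idea behind these steps (not the actual implementation of `publish_model.py`) looks like this:

```python
import hashlib
import os

import torch


def publish(in_file: str, out_file: str) -> str:
    # Load on CPU so all weights become CPU tensors.
    ckpt = torch.load(in_file, map_location='cpu')
    # Drop the optimizer states to shrink the checkpoint.
    ckpt.pop('optimizer', None)
    torch.save(ckpt, out_file)

    # Append the first 8 characters of the file hash to the filename.
    with open(out_file, 'rb') as f:
        sha = hashlib.sha256(f.read()).hexdigest()[:8]
    final_file = out_file.replace('.pth', f'-{sha}.pth')
    os.rename(out_file, final_file)
    return final_file
```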
|
||||
|
||||
### Use t-SNE
|
||||
|
||||
We provide an off-the-shelf tool to visualize the quality of image representations by t-SNE.
|
||||
|
||||
```shell
|
||||
python tools/analysis_tools/visualize_tsne.py ${CONFIG_FILE} --checkpoint ${CKPT_PATH} --work-dir ${WORK_DIR} [optional arguments]
|
||||
```
|
||||
|
||||
Arguments:
|
||||
|
||||
- `CONFIG_FILE`: config file for the pre-trained model.
|
||||
- `CKPT_PATH`: the path of model's checkpoint.
|
||||
- `WORK_DIR`: the directory to save the results of visualization.
|
||||
- `[optional arguments]`: for optional arguments, you can refer to [visualize_tsne.py](https://github.com/open-mmlab/mmselfsup/blob/master/tools/analysis_tools/visualize_tsne.py)
|
||||
|
||||
### Reproducibility
|
||||
|
||||
If you want to make your performance exactly reproducible, please switch on `--deterministic` to train the final model to be published. Note that this flag will switch off `torch.backends.cudnn.benchmark` and slow down the training speed.
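
Roughly speaking, deterministic mode corresponds to settings like the ones below; this is a simplified sketch, not the exact code path used by MMSelfSup.

```python
import random

import numpy as np
import torch


def set_deterministic(seed: int = 0) -> None:
    # Seed all random number generators.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```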
|
||||
|
|
|
@ -10,67 +10,54 @@ Welcome to MMSelfSup's documentation!
|
|||
:maxdepth: 1
|
||||
:caption: Get Started
|
||||
|
||||
install.md
|
||||
prepare_data.md
|
||||
overview.md
|
||||
get_started.md
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: User Guides
|
||||
|
||||
user_guides/index.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Advanced Guides
|
||||
|
||||
advanced_guides/index.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Model Zoo
|
||||
|
||||
model_zoo.md
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Tutorials
|
||||
:caption: Migration
|
||||
|
||||
tutorials/0_config.md
|
||||
tutorials/1_new_dataset.md
|
||||
tutorials/2_data_pipeline.md
|
||||
tutorials/3_new_module.md
|
||||
tutorials/4_schedule.md
|
||||
tutorials/5_runtime.md
|
||||
tutorials/6_benchmarks.md
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Algorithms
|
||||
|
||||
algorithms/byol.md
|
||||
algorithms/deep.md
|
||||
algorithms/dense.md
|
||||
algorithms/moco.md
|
||||
algorithms/npid.md
|
||||
algorithms/odc.md
|
||||
algorithms/rl.md
|
||||
algorithms/rp.md
|
||||
algorithms/simclr.md
|
||||
algorithms/ss.md
|
||||
algorithms/swav.md
|
||||
algorithms/mocov3.md
|
||||
algorithms/mae.md
|
||||
algorithms/simmim.md
|
||||
algorithms/barlowtwins.md
|
||||
algorithms/cae.md
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Notes
|
||||
|
||||
changelog.md
|
||||
compatibility.md
|
||||
|
||||
.. toctree::
|
||||
:caption: Switch Language
|
||||
|
||||
switch_language.md
|
||||
migration.md
|
||||
|
||||
.. toctree::
|
||||
:caption: API Reference
|
||||
|
||||
api.rst
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
:caption: Notes
|
||||
|
||||
contribution_guide.md
|
||||
changelog.md
|
||||
faq.md
|
||||
|
||||
.. toctree::
|
||||
:caption: Switch Language
|
||||
|
||||
switch_language.md
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
||||
|
|
|
@ -1,212 +0,0 @@
|
|||
# Prerequisites
|
||||
|
||||
In this section we demonstrate how to prepare an environment with PyTorch.
|
||||
|
||||
MMselfSup works on Linux (Windows and macOS are not officially supported). It requires Python 3.6+, CUDA 9.2+ and PyTorch 1.5+.
|
||||
|
||||
If you are experienced with PyTorch and have already installed it, just skip this part and jump to the [next section](#installation). Otherwise, you can follow these steps for the preparation.
|
||||
|
||||
**Step 0.** Download and install Miniconda from the [official website](https://docs.conda.io/en/latest/miniconda.html).
|
||||
|
||||
**Step 1.** Create a conda environment and activate it.
|
||||
|
||||
```shell
|
||||
conda create --name openmmlab python=3.8 -y
|
||||
conda activate openmmlab
|
||||
```
|
||||
|
||||
**Step 2.** Install PyTorch following [official instructions](https://pytorch.org/get-started/locally/), e.g.
|
||||
|
||||
On GPU platforms:
|
||||
|
||||
```shell
|
||||
conda install pytorch torchvision -c pytorch
|
||||
```
|
||||
|
||||
On CPU platforms:
|
||||
|
||||
```shell
|
||||
conda install pytorch torchvision cpuonly -c pytorch
|
||||
```
|
||||
|
||||
# Installation
|
||||
|
||||
We recommend that users follow our best practices to install MMSelfSup. However, the whole process is highly customizable. See [Customize Installation](#customized-installation) section for more information.
|
||||
|
||||
## Best Practices
|
||||
|
||||
**Step 0.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
|
||||
|
||||
```shell
|
||||
pip install -U openmim
|
||||
mim install mmcv-full
|
||||
```
|
||||
|
||||
**Step 1.** Install MMSelfSup.
|
||||
|
||||
Case a: If you develop and run mmselfsup directly, install it from source:
|
||||
|
||||
```shell
|
||||
git clone https://github.com/open-mmlab/mmselfsup.git
|
||||
cd mmselfsup
|
||||
pip install -v -e .
|
||||
# "-v" means verbose, or more output
|
||||
# "-e" means installing a project in editable mode,
|
||||
# thus any local modifications made to the code will take effect without reinstallation.
|
||||
```
|
||||
|
||||
Case b: If you use mmselfsup as a dependency or third-party package, install it with pip:
|
||||
|
||||
```shell
|
||||
pip install mmselfsup
|
||||
```
|
||||
|
||||
## Verify the installation
|
||||
|
||||
To verify whether MMSelfSup is installed correctly, we can run the following sample code to initialize a model and inference a demo image.
|
||||
|
||||
```python
|
||||
import torch
|
||||
|
||||
from mmselfsup.models import build_algorithm
|
||||
|
||||
model_config = dict(
|
||||
type='Classification',
|
||||
backbone=dict(
|
||||
type='ResNet',
|
||||
depth=50,
|
||||
in_channels=3,
|
||||
num_stages=4,
|
||||
strides=(1, 2, 2, 2),
|
||||
dilations=(1, 1, 1, 1),
|
||||
out_indices=[4], # 0: conv-1, x: stage-x
|
||||
norm_cfg=dict(type='BN'),
|
||||
frozen_stages=-1),
|
||||
head=dict(
|
||||
type='ClsHead', with_avg_pool=True, in_channels=2048,
|
||||
num_classes=1000))
|
||||
|
||||
model = build_algorithm(model_config).cuda()
|
||||
|
||||
image = torch.randn((1, 3, 224, 224)).cuda()
|
||||
label = torch.tensor([1]).cuda()
|
||||
|
||||
loss = model.forward_train(image, label)
|
||||
```
|
||||
|
||||
The above code is supposed to run successfully upon you finish the installation.
|
||||
|
||||
## Customized installation
|
||||
|
||||
### Benchmark
|
||||
|
||||
The [Best Practices](#best-practices) is for basic usage, if you need to evaluate your pre-training model with some downstream tasks such as detection or segmentation, please also install [MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation).
|
||||
|
||||
If you don't run MMDetection and MMSegmentation benchmark, it is unnecessary to install them.
|
||||
|
||||
You can simply install MMDetection and MMSegmentation with the following command:
|
||||
|
||||
```shell
|
||||
pip install mmdet mmsegmentation
|
||||
```
|
||||
|
||||
For more details, you can check the installation page of [MMDetection](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/en/get_started.md).
|
||||
|
||||
### CUDA versions
|
||||
|
||||
When installing PyTorch, you need to specify the version of CUDA. If you are not clear on which to choose, follow our recommendations:
|
||||
|
||||
- For Ampere-based NVIDIA GPUs, such as GeForce 30 series and NVIDIA A100, CUDA 11 is a must.
|
||||
- For older NVIDIA GPUs, CUDA 11 is backward compatible, but CUDA 10.2 offers better compatibility and is more lightweight.
|
||||
|
||||
Please make sure the GPU driver satisfies the minimum version requirements. See [this table](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions) for more information.
|
||||
|
||||
```{note}
|
||||
Installing CUDA runtime libraries is enough if you follow our best practices, because no CUDA code will be compiled locally. However if you hope to compile MMCV from source or develop other CUDA operators, you need to install the complete CUDA toolkit from NVIDIA's [website](https://developer.nvidia.com/cuda-downloads), and its version should match the CUDA version of PyTorch. i.e., the specified version of cudatoolkit in `conda install` command.
|
||||
```
|
||||
|
||||
### Install MMCV without MIM
|
||||
|
||||
MMCV contains C++ and CUDA extensions, thus depending on PyTorch in a complex way. MIM solves such dependencies automatically and makes the installation easier. However, it is not a must.
|
||||
|
||||
To install MMCV with pip instead of MIM, please follow [MMCV installation guides](https://mmcv.readthedocs.io/en/latest/get_started/installation.html). This requires manually specifying a find-url based on PyTorch version and its CUDA version.
|
||||
|
||||
For example, the following command install mmcv-full built for PyTorch 1.10.x and CUDA 11.3.
|
||||
|
||||
```shell
|
||||
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
|
||||
```
|
||||
|
||||
### Another option: Docker Image
|
||||
|
||||
We provide a [Dockerfile](/docker/Dockerfile) to build an image.
|
||||
|
||||
```shell
|
||||
# build an image with PyTorch 1.6.0, CUDA 10.1, CUDNN 7.
|
||||
docker build -f ./docker/Dockerfile --rm -t mmselfsup:torch1.10.0-cuda11.3-cudnn8 .
|
||||
```
|
||||
|
||||
**Important:** Make sure you've installed the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker).
|
||||
|
||||
Run the following cmd:
|
||||
|
||||
```shell
|
||||
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/workspace/mmselfsup/data mmselfsup:torch1.10.0-cuda11.3-cudnn8 /bin/bash
|
||||
```
|
||||
|
||||
`{DATA_DIR}` is your local folder containing all these datasets.
|
||||
|
||||
### Install on Google Colab
|
||||
|
||||
[Google Colab](https://research.google.com/) usually has PyTorch installed,
|
||||
thus we only need to install MMCV and MMSeflSup with the following commands.
|
||||
|
||||
**Step 0.** Install [MMCV](https://github.com/open-mmlab/mmcv) using [MIM](https://github.com/open-mmlab/mim).
|
||||
|
||||
```shell
|
||||
!pip3 install openmim
|
||||
!mim install mmcv-full
|
||||
```
|
||||
|
||||
**Step 1.** Install MMSelfSup from the source.
|
||||
|
||||
```shell
|
||||
!git clone https://github.com/open-mmlab/mmselfsup.git
|
||||
%cd mmselfsup
|
||||
!pip install -e .
|
||||
```
|
||||
|
||||
**Step 2.** Verification.
|
||||
|
||||
```python
|
||||
import mmselfsup
|
||||
print(mmselfsup.__version__)
|
||||
# Example output: 0.9.0
|
||||
```
|
||||
|
||||
```{note}
|
||||
Within Jupyter, the exclamation mark `!` is used to call external executables and `%cd` is a [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd) to change the current working directory of Python.
|
||||
```
|
||||
|
||||
## Trouble shooting
|
||||
|
||||
If you have some issues during the installation, please first view the [FAQ](faq.md) page.
|
||||
You may [open an issue](https://github.com/open-mmlab/mmselfsup/issues/new/choose) on GitHub if no solution is found.
|
||||
|
||||
# Using multiple MMSelfSup versions
|
||||
|
||||
If there are more than one mmselfsup on your machine, and you want to use them alternatively, the recommended way is to create multiple conda environments and use different environments for different versions.
|
||||
|
||||
Another way is to insert the following code to the main scripts (`train.py`, `test.py` or any other scripts you run)
|
||||
|
||||
```python
|
||||
import os.path as osp
|
||||
import sys
|
||||
sys.path.insert(0, osp.join(osp.dirname(osp.abspath(__file__)), '../'))
|
||||
```
|
||||
|
||||
Or run the following command in the terminal of corresponding root folder to temporally use the current one.
|
||||
|
||||
```shell
|
||||
export PYTHONPATH="$(pwd)":$PYTHONPATH
|
||||
```
|
|
@ -1,4 +1,18 @@
|
|||
# Differences between MMSelfSup and OpenSelfSup
|
||||
# Migration
|
||||
|
||||
- [Migration](#migration)
|
||||
- [Migration from MMSelfSup 0.x](#migration-from-mmselfsup-0x)
|
||||
- [Differences between MMSelfSup and OpenSelfSup](#differences-between-mmselfsup-and-openselfsup)
|
||||
- [Modular Design](#modular-design)
|
||||
- [Datasets](#datasets)
|
||||
- [Models](#models)
|
||||
- [Codebase Conventions](#codebase-conventions)
|
||||
- [Configs](#configs)
|
||||
- [Scripts](#scripts)
|
||||
|
||||
## Migration from MMSelfSup 0.x
|
||||
|
||||
## Differences between MMSelfSup and OpenSelfSup
|
||||
|
||||
This file records the differences between the newest version of MMSelfSup, older versions, and OpenSelfSup.
|
||||
|
||||
|
@ -6,11 +20,11 @@ MMSelfSup goes through a refactoring and addresses many legacy issues. It is not
|
|||
|
||||
The major differences are in two folds: codebase conventions, modular design.
|
||||
|
||||
## Modular Design
|
||||
### Modular Design
|
||||
|
||||
In order to build a clearer directory structure, MMSelfSup redesigns some of the modules.
|
||||
|
||||
### Datasets
|
||||
#### Datasets
|
||||
|
||||
- MMSelfSup merges some datasets to reduce some redundant codes.
|
||||
|
||||
|
@ -22,7 +36,7 @@ In order to build more clear directory structure, MMSelfSup redesigns some of th
|
|||
|
||||
In addition, this part is still under refactoring and will be released in a following version.
|
||||
|
||||
### Models
|
||||
#### Models
|
||||
|
||||
- The registry mechanism is updated. Currently, the parts under the `models` folder are built with a parent called `MMCV_MODELS` that is imported from `MMCV`. Please check [mmselfsup/models/builder.py](https://github.com/open-mmlab/mmselfsup/blob/master/mmselfsup/models/builder.py) and refer to [mmcv/utils/registry.py](https://github.com/open-mmlab/mmcv/blob/master/mmcv/utils/registry.py) for more details.
|
||||
|
||||
|
@ -30,11 +44,11 @@ In addition, this part is still under refactoring, it will be released in follow
|
|||
|
||||
- In OpenSelfSup, the names of `necks` are kind of confused and all in one file. Now, the `necks` are refactored, managed with one folder and renamed for easier understanding. Please check `mmselfsup/models/necks` for more details.
|
||||
|
||||
## Codebase Conventions
|
||||
### Codebase Conventions
|
||||
|
||||
MMSelfSup renews codebase conventions, as OpenSelfSup has not been updated for some time.
|
||||
|
||||
### Configs
|
||||
#### Configs
|
||||
|
||||
- MMSelfSup renames all config files to follow the new naming convention. Please refer to [0_config](./tutorials/0_config.md) for more details.
|
||||
|
||||
|
@ -48,7 +62,7 @@ MMSelfSup renews codebase conventions as OpenSelfSup has not been updated for so
|
|||
|
||||
- The normalization layers are all built with arguments `norm_cfg`.
|
||||
|
||||
### Scripts
|
||||
#### Scripts
|
||||
|
||||
- The directory of `tools` is modified so that it has a clearer structure. It has several folders to manage different scripts. For example, it has two converter folders for models and data format. Besides, the benchmark related scripts are all in the `benchmarks` folder, which has the same structure as `configs/benchmarks`.
|
||||
|
|
@ -2,7 +2,17 @@
|
|||
|
||||
All models and part of the benchmark results are recorded below.
|
||||
|
||||
## Pre-trained models
|
||||
- [Model Zoo](#model-zoo)
|
||||
- [Statistics](#statistics)
|
||||
- [Benchmarks](#benchmarks)
|
||||
- [ImageNet Linear Evaluation](#imagenet-linear-evaluation)
|
||||
- [ImageNet Fine-tuning](#imagenet-fine-tuning)
|
||||
|
||||
## Statistics
|
||||
|
||||
- Number of papers: 17
|
||||
|
||||
- Number of checkpoints: xx ckpts
|
||||
|
||||
| Algorithm | Config | Download |
|
||||
| ------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
|
@ -68,36 +78,3 @@ If not specified, we use linear evaluation setting from [MoCo](http://openaccess
|
|||
| MAE | [mae_vit-base-p16_8xb512-coslr-400e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k.py) | | 83.1 |
|
||||
| SimMIM | [simmim_swin-base_16xb128-coslr-100e_in1k-192](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simmim/simmim_swin-base_16xb128-coslr-100e_in1k-192.py) | | 82.9 |
|
||||
| CAE | [cae_vit-base-p16_8xb256-fp16-coslr-300e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/cae/cae_vit-base-p16_8xb256-fp16-coslr-300e_in1k.py) | | 83.2 |
|
||||
|
||||
### COCO17 Object Detection and Instance Segmentation
|
||||
|
||||
In the COCO17 object detection and instance segmentation task, we choose the evaluation protocol from [MoCo](http://openaccess.thecvf.com/content_CVPR_2020/papers/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.pdf), with the Mask R-CNN FPN architecture. The results below are fine-tuned with the same [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmdetection/coco/mask_rcnn_r50_fpn_mstrain_1x_coco.py).
|
||||
|
||||
| Algorithm | Config | mAP (Box) | mAP (Mask) |
|
||||
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ---------- |
|
||||
| Relative Location | [relative-loc_resnet50_8xb64-steplr-70e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 37.5 | 33.7 |
|
||||
| Rotation Prediction | [rotation-pred_resnet50_8xb16-steplr-70e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 37.9 | 34.2 |
|
||||
| NPID | [npid_resnet50_8xb32-steplr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 38.5 | 34.6 |
|
||||
| SimCLR | [simclr_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 38.7 | 34.9 |
|
||||
| MoCo v2 | [mocov2_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 40.2 | 36.1 |
|
||||
| BYOL | [byol_resnet50_8xb32-accum16-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 40.9 | 36.8 |
|
||||
| SwAV | [swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 40.2 | 36.3 |
|
||||
| SimSiam | [simsiam_resnet50_8xb32-coslr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 38.6 | 34.6 |
|
||||
| | [simsiam_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 38.8 | 34.9 |
|
||||
|
||||
### Pascal VOC12 Aug Semantic Segmentation
|
||||
|
||||
In the Pascal VOC12 Aug semantic segmentation task, we choose the evaluation protocol from [MMSeg](https://github.com/open-mmlab/mmsegmentation), with the FCN architecture. The results below are fine-tuned with the same [config](https://github.com/open-mmlab/mmselfsup/blob/master/configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py).
|
||||
|
||||
| Algorithm | Config | mIOU |
|
||||
| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- |
|
||||
| Relative Location | [relative-loc_resnet50_8xb64-steplr-70e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/relative_loc/relative-loc_resnet50_8xb64-steplr-70e_in1k.py) | 63.49 |
|
||||
| Rotation Prediction | [rotation-pred_resnet50_8xb16-steplr-70e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/rotation_pred/rotation-pred_resnet50_8xb16-steplr-70e_in1k.py) | 64.31 |
|
||||
| NPID | [npid_resnet50_8xb32-steplr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/npid/npid_resnet50_8xb32-steplr-200e_in1k.py) | 65.45 |
|
||||
| SimCLR | [simclr_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simclr/simclr_resnet50_8xb32-coslr-200e_in1k.py) | 64.03 |
|
||||
| MoCo v2 | [mocov2_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/mocov2/mocov2_resnet50_8xb32-coslr-200e_in1k.py) | 67.55 |
|
||||
| BYOL | [byol_resnet50_8xb32-accum16-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/byol/byol_resnet50_8xb32-accum16-coslr-200e_in1k.py) | 67.16 |
|
||||
| SwAV | [swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/swav/swav_resnet50_8xb32-mcrop-2-6-coslr-200e_in1k-224-96.py) | 63.73 |
|
||||
| DenseCL | [densecl_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/densecl/densecl_resnet50_8xb32-coslr-200e_in1k.py) | 69.47 |
|
||||
| SimSiam | [simsiam_resnet50_8xb32-coslr-100e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-100e_in1k.py) | 48.35 |
|
||||
| | [simsiam_resnet50_8xb32-coslr-200e_in1k](https://github.com/open-mmlab/mmselfsup/blob/master/configs/selfsup/simsiam/simsiam_resnet50_8xb32-coslr-200e_in1k.py) | 46.27 |
|
||||
|
|
|
@ -2,6 +2,9 @@
|
|||
|
||||
We list some common troubles faced by many users and their corresponding solutions here. Feel free to enrich the list if you find any frequent issues and have ways to help others to solve them. If the contents here do not cover your issue, please create an issue using the [provided templates](https://github.com/open-mmlab/mmselfsup/tree/master/.github/ISSUE_TEMPLATE) and make sure you fill in all required information in the template.
|
||||
|
||||
- [FAQ](#faq)
|
||||
- [Installation](#installation)
|
||||
|
||||
## Installation
|
||||
|
||||
Compatible MMCV, MMClassification, MMDetection and MMSegmentation versions are shown below. Please install the correct version of them to avoid installation issues.
|
|
@ -0,0 +1,3 @@
|
|||
# Overview
|
||||
|
||||
- [Overview](#overview)
|
|
@ -1,210 +0,0 @@
|
|||
# Tutorial 4: Customize Schedule
|
||||
|
||||
- [Tutorial 4: Customize Schedule](#tutorial-4-customize-schedule)
|
||||
- [Customize optimizer supported by Pytorch](#customize-optimizer-supported-by-pytorch)
|
||||
- [Customize learning rate schedules](#customize-learning-rate-schedules)
|
||||
- [Learning rate decay](#learning-rate-decay)
|
||||
- [Warmup strategy](#warmup-strategy)
|
||||
- [Customize momentum schedules](#customize-momentum-schedules)
|
||||
- [Parameter-wise configuration](#parameter-wise-configuration)
|
||||
- [Gradient clipping and gradient accumulation](#gradient-clipping-and-gradient-accumulation)
|
||||
- [Gradient clipping](#gradient-clipping)
|
||||
- [Gradient accumulation](#gradient-accumulation)
|
||||
- [Customize self-implemented optimizer](#customize-self-implemented-optimizer)
|
||||
|
||||
In this tutorial, we will introduce some methods about how to construct optimizers, customize learning rate, momentum schedules, parameter-wise configuration, gradient clipping, gradient accumulation, and customize self-implemented methods for the project.
|
||||
|
||||
## Customize optimizer supported by Pytorch
|
||||
|
||||
We already support all the optimizers implemented by PyTorch. To use and modify them, please change the `optimizer` field of config files.
|
||||
|
||||
For example, if you want to use SGD, the modification could be as the following.
|
||||
|
||||
```python
|
||||
optimizer = dict(type='SGD', lr=0.0003, weight_decay=0.0001)
|
||||
```
|
||||
|
||||
To modify the learning rate of the model, just modify the `lr` in the config of optimizer. You can also directly set other arguments according to the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
|
||||
|
||||
For example, if you want to use `Adam` with the setting like `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` in PyTorch, the config should look like:
|
||||
|
||||
```python
|
||||
optimizer = dict(type='Adam', lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)
|
||||
```
|
||||
|
||||
In addition to the optimizers implemented by PyTorch, we also implement a customized [LARS](https://arxiv.org/abs/1708.03888) in `mmselfsup/core/optimizer/optimizers.py`.
|
||||
|
||||
## Customize learning rate schedules
|
||||
|
||||
### Learning rate decay
|
||||
|
||||
Learning rate decay is widely used to improve performance. To use learning rate decay, please set the `lr_config` field in config files.
|
||||
|
||||
For example, we use CosineAnnealing policy to train SimCLR, and the config is:
|
||||
|
||||
```python
|
||||
lr_config = dict(
|
||||
policy='CosineAnnealing',
|
||||
...)
|
||||
```
|
||||
|
||||
Then during training, the program will call [CosineAnnealingLrUpdaterHook](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L227) periodically to update the learning rate.
|
||||
|
||||
We also support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py), such as Poly schedule.
|
||||
|
||||
### Warmup strategy
|
||||
|
||||
In the early stage, training tends to be volatile, and warmup is a technique to reduce volatility. With warmup, the learning rate increases gradually from a small value to the expected value.
|
||||
|
||||
In MMSelfSup, we use `lr_config` to configure the warmup strategy; the main parameters are as follows:
|
||||
|
||||
- `warmup`: The warmup curve type. Please choose one from 'constant', 'linear', 'exp' and `None`; `None` means disabling warmup.
- `warmup_by_epoch`: Whether to warm up by epoch. Defaults to True; if set to False, warmup is counted by iteration.
- `warmup_iters`: The number of warm-up iterations; when `warmup_by_epoch=True`, the unit is epochs, and when `warmup_by_epoch=False`, the unit is iterations.
- `warmup_ratio`: The warm-up initial learning rate is calculated as `lr = lr * warmup_ratio`.
|
||||
|
||||
Here are some examples:
|
||||
|
||||
1.linear & warmup by iter
|
||||
|
||||
```python
|
||||
lr_config = dict(
|
||||
policy='CosineAnnealing',
|
||||
by_epoch=False,
|
||||
min_lr_ratio=1e-2,
|
||||
warmup='linear',
|
||||
warmup_ratio=1e-3,
|
||||
warmup_iters=20 * 1252,
|
||||
warmup_by_epoch=False)
|
||||
```
|
||||
|
||||
2.exp & warmup by epoch
|
||||
|
||||
```python
|
||||
lr_config = dict(
|
||||
policy='CosineAnnealing',
|
||||
min_lr=0,
|
||||
warmup='exp',
|
||||
warmup_iters=5,
|
||||
warmup_ratio=0.1,
|
||||
warmup_by_epoch=True)
|
||||
```
|
||||
|
||||
### Customize momentum schedules
|
||||
|
||||
We support the momentum scheduler to modify the model's momentum according to learning rate, which could make the model converge in a faster way.
|
||||
|
||||
Momentum scheduler is usually used with LR scheduler, for example, the following config is used to accelerate convergence. For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
|
||||
|
||||
Here is an example:
|
||||
|
||||
```python
|
||||
lr_config = dict(
|
||||
policy='cyclic',
|
||||
target_ratio=(10, 1e-4),
|
||||
cyclic_times=1,
|
||||
step_ratio_up=0.4,
|
||||
)
|
||||
momentum_config = dict(
|
||||
policy='cyclic',
|
||||
target_ratio=(0.85 / 0.95, 1),
|
||||
cyclic_times=1,
|
||||
step_ratio_up=0.4,
|
||||
)
|
||||
```
|
||||
|
||||
### Parameter-wise configuration
|
||||
|
||||
Some models may have parameter-specific settings for optimization, for example, no weight decay for the BatchNorm layers and the bias in each layer. To configure them finely, we can use `paramwise_options` in the optimizer.
|
||||
|
||||
For example, if we do not want to apply weight decay to the parameters of BatchNorm or GroupNorm, and the bias in each layer, we can use following config file:
|
||||
|
||||
```python
|
||||
optimizer = dict(
|
||||
type=...,
|
||||
lr=...,
|
||||
paramwise_options={
|
||||
'(bn|gn)(\\d+)?.(weight|bias)':
|
||||
dict(weight_decay=0.),
|
||||
'bias': dict(weight_decay=0.)
|
||||
})
|
||||
```
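
To see how such regex keys select parameters, the sketch below matches them against the parameter names of a small module; it only illustrates the matching idea and is not the optimizer builder used by MMSelfSup.

```python
import re

import torch.nn as nn


class Block(nn.Module):
    """A tiny module just to produce realistic parameter names."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn1 = nn.BatchNorm2d(8)


paramwise_options = {
    '(bn|gn)(\\d+)?.(weight|bias)': dict(weight_decay=0.),
    'bias': dict(weight_decay=0.),
}

for name, _ in Block().named_parameters():
    override = {}
    for pattern, options in paramwise_options.items():
        if re.search(pattern, name):
            override.update(options)
    print(name, override or 'default optimizer settings')
# conv.weight keeps the default settings; conv.bias, bn1.weight and bn1.bias
# are matched and get weight_decay=0.
```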
|
||||
|
||||
## Gradient clipping and gradient accumulation
|
||||
|
||||
### Gradient clipping
|
||||
|
||||
Besides the basic function of PyTorch optimizers, we also provide some enhancement functions, such as gradient clipping, gradient accumulation, etc. Please refer to [MMCV](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py) for more details.
|
||||
|
||||
Currently we support `grad_clip` option in `optimizer_config`, and you can refer to [PyTorch Documentation](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html) for more arguments .
|
||||
|
||||
Here is an example:
|
||||
|
||||
```python
|
||||
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
|
||||
# norm_type: type of the used p-norm, here norm_type is 2.
|
||||
```
|
||||
|
||||
When inheriting from base and modifying configs, if `grad_clip=None` in base, `_delete_=True` is needed.
|
||||
|
||||
### Gradient accumulation
|
||||
|
||||
When there is not enough computation resource, the batch size can only be set to a small value, which may degrade the performance of the model. Gradient accumulation can be used to solve this problem.
|
||||
|
||||
Here is an example:
|
||||
|
||||
```python
|
||||
data = dict(samples_per_gpu=64)
|
||||
optimizer_config = dict(type="DistOptimizerHook", update_interval=4)
|
||||
```
|
||||
|
||||
This indicates that during training, back-propagation is performed every 4 iterations. The above is equivalent to:
|
||||
|
||||
```python
|
||||
data = dict(samples_per_gpu=256)
|
||||
optimizer_config = dict(type="OptimizerHook")
|
||||
```
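
Conceptually, gradient accumulation sums gradients over several iterations and performs an optimizer step only every `update_interval` iterations. The plain PyTorch sketch below illustrates the idea and is not MMSelfSup's `DistOptimizerHook`.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
update_interval = 4  # back-propagate every iteration, step every 4 iterations

for it in range(16):
    x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
    # Scale the loss so the accumulated gradient matches a 4x larger batch.
    loss = nn.functional.cross_entropy(model(x), y) / update_interval
    loss.backward()                      # gradients accumulate in .grad
    if (it + 1) % update_interval == 0:
        optimizer.step()
        optimizer.zero_grad()
```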
|
||||
|
||||
## Customize self-implemented optimizer
|
||||
|
||||
In academic research and industrial practice, you may need some optimization methods not implemented by MMSelfSup, and you can add them through the following steps.
|
||||
|
||||
Implement your `CustomizedOptim` in `mmselfsup/core/optimizer/optimizers.py`
|
||||
|
||||
```python
|
||||
import torch
|
||||
from torch.optim import * # noqa: F401,F403
|
||||
from torch.optim.optimizer import Optimizer, required
|
||||
|
||||
from mmcv.runner.optimizer.builder import OPTIMIZERS
|
||||
|
||||
@OPTIMIZERS.register_module()
|
||||
class CustomizedOptim(Optimizer):
|
||||
|
||||
def __init__(self, *args, **kwargs):
|
||||
|
||||
## TODO
|
||||
|
||||
@torch.no_grad()
|
||||
def step(self):
|
||||
|
||||
## TODO
|
||||
```
|
||||
|
||||
Import it in `mmselfsup/core/optimizer/__init__.py`
|
||||
|
||||
```python
|
||||
from .optimizers import CustomizedOptim
|
||||
from .builder import build_optimizer
|
||||
|
||||
__all__ = ['CustomizedOptim', 'build_optimizer', ...]
|
||||
```
|
||||
|
||||
Use it in your config file
|
||||
|
||||
```python
|
||||
optimizer = dict(
|
||||
type='CustomizedOptim',
|
||||
...
|
||||
)
|
||||
```
|
|
@ -1,8 +1,8 @@
|
|||
# Tutorial 0: Learn about Configs
|
||||
# Tutorial 1: Learn about Configs
|
||||
|
||||
MMSelfSup mainly uses python files as configs. The design of our configuration file system integrates modularity and inheritance, facilitating users to conduct various experiments. All configuration files are placed in the `configs` folder. If you wish to inspect the config file in summary, you may run `python tools/misc/print_config.py` to see the complete config.
|
||||
|
||||
- [Tutorial 0: Learn about Configs](#tutorial-0-learn-about-configs)
|
||||
- [Tutorial 1: Learn about Configs](#tutorial-1-learn-about-configs)
|
||||
- [Config File and Checkpoint Naming Convention](#config-file-and-checkpoint-naming-convention)
|
||||
- [Algorithm information](#algorithm-information)
|
||||
- [Module information](#module-information)
|
|
@ -1,15 +1,16 @@
|
|||
# Prepare Datasets
|
||||
# Tutorial 2: Prepare Datasets
|
||||
|
||||
MMSelfSup supports multiple datasets. Please follow the corresponding guidelines for data preparation. It is recommended to symlink your dataset root to `$MMSELFSUP/data`. If your folder structure is different, you may need to change the corresponding paths in config files.
|
||||
|
||||
- [Prepare ImageNet](#prepare-imagenet)
|
||||
- [Prepare Place205](#prepare-place205)
|
||||
- [Prepare iNaturalist2018](#prepare-inaturalist2018)
|
||||
- [Prepare PASCAL VOC](#prepare-pascal-voc)
|
||||
- [Prepare CIFAR10](#prepare-cifar10)
|
||||
- [Prepare datasets for detection and segmentation](#prepare-datasets-for-detection-and-segmentation)
|
||||
- [Detection](#detection)
|
||||
- [Segmentation](#segmentation)
|
||||
- [Tutorial 2: Prepare Datasets](#tutorial-2-prepare-datasets)
|
||||
- [Prepare ImageNet](#prepare-imagenet)
|
||||
- [Prepare Place205](#prepare-place205)
|
||||
- [Prepare iNaturalist2018](#prepare-inaturalist2018)
|
||||
- [Prepare PASCAL VOC](#prepare-pascal-voc)
|
||||
- [Prepare CIFAR10](#prepare-cifar10)
|
||||
- [Prepare datasets for detection and segmentation](#prepare-datasets-for-detection-and-segmentation)
|
||||
- [Detection](#detection)
|
||||
- [Segmentation](#segmentation)
|
||||
|
||||
```
|
||||
mmselfsup
|
|

@ -0,0 +1,134 @@
# Tutorial 3: Pretrain with existing models

- [Tutorial 3: Pretrain with existing models](#tutorial-3-pretrain-with-existing-models)
  - [Start to Train](#start-to-train)
    - [Train with a single GPU](#train-with-a-single-gpu)
    - [Train with CPU](#train-with-cpu)
    - [Train with multiple GPUs](#train-with-multiple-gpus)
    - [Train with multiple machines](#train-with-multiple-machines)
    - [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)

This page provides basic usage about how to run algorithms and how to use some tools in MMSelfSup. For installation instructions and data preparation, please refer to [install.md](install.md) and [prepare_data.md](prepare_data.md).

## Start to Train

**Note**: The default learning rate in config files is for a specific number of GPUs, which is indicated in the config names. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`.
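
As a quick illustration of this rule (the numbers are made up, not taken from any shipped config), a config tuned for 8 GPUs with `lr=0.03` would be rescaled as follows when training on 4 GPUs:

```python
# Linear scaling rule: new_lr = old_lr * new_ngpus / old_ngpus
old_lr = 0.03    # learning rate written in the config (hypothetical value)
old_ngpus = 8    # number of GPUs the config was tuned for
new_ngpus = 4    # number of GPUs actually used

new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)    # 0.015
```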

### Train with a single GPU

```shell
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

A simple example to start training:

```shell
python tools/train.py configs/selfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k.py
```

### Train with CPU

```shell
export CUDA_VISIBLE_DEVICES=-1
python tools/train.py ${CONFIG_FILE} [optional arguments]
```

**Note**: We do not recommend using CPU for training because it is too slow. This feature is supported so that users can conveniently debug on machines without a GPU.

### Train with multiple GPUs

```shell
sh tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
```

Optional arguments:

- `--work-dir`: Indicate your custom work directory to save checkpoints and logs.
- `--resume`: Automatically find the latest checkpoint in your work directory, or set `--resume ${CHECKPOINT_PATH}` to load a specific checkpoint file.
- `--amp`: Enable automatic mixed precision training.
- `--cfg-options`: Setting `--cfg-options` will modify the original configs. For example, setting `--cfg-options randomness.seed=0` sets the seed for the random number generators.

An example to start training with 8 GPUs:

```shell
sh tools/dist_train.sh configs/selfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k.py 8
```

Alternatively, if you run MMSelfSup on a cluster managed with **[slurm](https://slurm.schedmd.com/)**:

```shell
GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} [optional arguments]
```

An example to start training with 8 GPUs:

```shell
# The default setting: GPUS_PER_NODE=8 GPUS=8
sh tools/slurm_train.sh Dummy Test_job configs/selfsup/mae/mae_vit-base-p16_8xb512-coslr-400e_in1k.py
```

### Train with multiple machines

If you launch with multiple machines simply connected via Ethernet, you can run the following commands:

On the first machine:

```shell
NNODES=2 NODE_RANK=0 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} sh tools/dist_train.sh ${CONFIG} ${GPUS}
```

On the second machine:

```shell
NNODES=2 NODE_RANK=1 PORT=${MASTER_PORT} MASTER_ADDR=${MASTER_ADDR} sh tools/dist_train.sh ${CONFIG} ${GPUS}
```

Usually it is slow if you do not have high-speed networking like InfiniBand.

If you launch with **slurm**, the command is the same as that on a single machine described above, but you need to refer to [slurm_train.sh](https://github.com/open-mmlab/mmselfsup/blob/master/tools/slurm_train.sh) to set appropriate parameters and environment variables.

### Launch multiple jobs on a single machine

If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflict.

If you use `dist_train.sh` to launch training jobs:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_1

CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/dist_train.sh ${CONFIG_FILE} 4 --work-dir tmp_work_dir_2
```

If you launch training jobs with slurm, you have two options to set different communication ports:

Option 1:

In `config1.py`:

```python
env_cfg = dict(dist_cfg=dict(backend='nccl', port=29500))
```

In `config2.py`:

```python
env_cfg = dict(dist_cfg=dict(backend='nccl', port=29501))
```

Then you can launch two jobs with `config1.py` and `config2.py`.

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py [optional arguments]

CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py [optional arguments]
```

Option 2:

You can set different communication ports without modifying the configuration file, but you have to set `--cfg-options` to overwrite the default port in the configuration file.

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py --work-dir tmp_work_dir_1 --cfg-options env_cfg.dist_cfg.port=29500

CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 sh tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py --work-dir tmp_work_dir_2 --cfg-options env_cfg.dist_cfg.port=29501
```

@ -0,0 +1 @@
# Analysis tools

@ -1,32 +1,16 @@
# Tutorial 6: Run Benchmarks
# Classification

In MMSelfSup, we provide many benchmarks, so the models can be evaluated on different downstream tasks. Here are comprehensive tutorials and examples to explain how to run all benchmarks with MMSelfSup.

- [Tutorial 6: Run Benchmarks](#tutorial-6-run-benchmarks)
  - [Classification](#classification)
    - [VOC SVM / Low-shot SVM](#voc-svm--low-shot-svm)
    - [Linear Evaluation](#linear-evaluation)
    - [ImageNet Semi-Supervised Classification](#imagenet-semi-supervised-classification)
    - [ImageNet Nearest-Neighbor Classification](#imagenet-nearest-neighbor-classification)
  - [Detection](#detection)
  - [Segmentation](#segmentation)

First, you are supposed to extract your backbone weights by `tools/model_converters/extract_backbone_weights.py`

```shell
python ./tools/model_converters/extract_backbone_weights.py {CHECKPOINT} {MODEL_FILE}
```

Arguments:

- `CHECKPOINT`: the checkpoint file of a selfsup method named as epoch\_\*.pth.
- `MODEL_FILE`: the output backbone weights file. If not mentioned, the `PRETRAIN` below uses this extracted model file.

## Classification
- [Classification](#classification)
  - [VOC SVM / Low-shot SVM](#voc-svm--low-shot-svm)
  - [Linear Evaluation](#linear-evaluation)
  - [ImageNet Semi-Supervised Classification](#imagenet-semi-supervised-classification)
  - [ImageNet Nearest-Neighbor Classification](#imagenet-nearest-neighbor-classification)

As for classification, we provide scripts in the folder `tools/benchmarks/classification/`, which contains 4 `.sh` files, 1 folder for the VOC SVM related classification task and 1 folder for the ImageNet nearest-neighbor classification task.

### VOC SVM / Low-shot SVM
## VOC SVM / Low-shot SVM

To run these benchmarks, you should first prepare your VOC datasets. Please refer to [prepare_data.md](https://github.com/open-mmlab/mmselfsup/blob/master/docs/en/prepare_data.md) for the details of data preparation.

@ -60,7 +44,7 @@ Remarks:
- if you want to change GPU numbers, you could add `GPUS_PER_NODE=4 GPUS=4` at the beginning of the command.
- `EPOCH` is the epoch number of the ckpt that you want to test

### Linear Evaluation
## Linear Evaluation

Linear evaluation is one of the most general benchmarks. We integrate several papers' config settings, including multi-head linear evaluation. We implement the classification model in our own codebase for the multi-head function, so to run linear evaluation, we still use a `.sh` script to launch training. The supported datasets are **ImageNet**, **Places205** and **iNaturalist18**.

@ -78,7 +62,7 @@ Remarks:
- `CONFIG`: Use config files under `configs/benchmarks/classification/`. Specifically, `imagenet` (excluding `imagenet_*percent` folders), `places205` and `inaturalist2018`.
- `PRETRAIN`: the pre-trained model file.

### ImageNet Semi-Supervised Classification
## ImageNet Semi-Supervised Classification

To run ImageNet semi-supervised classification, we still use a `.sh` script to launch training.

@ -96,7 +80,7 @@ Remarks:
- `CONFIG`: Use config files under `configs/benchmarks/classification/imagenet/`, named `imagenet_*percent` folders.
- `PRETRAIN`: the pre-trained model file.

### ImageNet Nearest-Neighbor Classification
## ImageNet Nearest-Neighbor Classification

To evaluate the pre-trained models using the nearest-neighbor benchmark, you can run the command below.

@ -126,67 +110,3 @@ Remarks:
- `PRETRAIN`: the pre-trained model file.
- if you want to change GPU numbers, you could add `GPUS_PER_NODE=4 GPUS=4` at the beginning of the command.
- `EPOCH` is the epoch number of the ckpt that you want to test

## Detection

Here, we prefer to use MMDetection for the detection task. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.

```shell
pip install openmim
```

It is very easy to install the package.

Besides, please refer to MMDet for [installation](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md) and [data preparation](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/1_exist_data_model.md).

After installation, you can run MMDet with a simple command.

```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```

Remarks:

- `CONFIG`: Use config files under `configs/benchmarks/mmdetection/` or write your own config files
- `PRETRAIN`: the pre-trained model file.

Alternatively, if you want to run the detection task with [detectron2](https://github.com/facebookresearch/detectron2), we also provide some config files.
Please refer to [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/main/INSTALL.md) for installation and follow the [directory structure](https://github.com/facebookresearch/detectron2/tree/main/datasets) to prepare the datasets required by detectron2.

```shell
conda activate detectron2  # use detectron2 environment here, otherwise use open-mmlab environment
cd benchmarks/detection
python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE}  # must use .pkl as the output extension.
bash run.sh ${DET_CFG} ${OUTPUT_FILE}
```

## Segmentation

For the semantic segmentation task, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.

```shell
pip install openmim
```

It is very easy to install the package.

Besides, please refer to MMSeg for [installation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/get_started.md) and [data preparation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md#prepare-datasets).

After installation, you can run MMSeg with a simple command.

```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```

Remarks:

- `CONFIG`: Use config files under `configs/benchmarks/mmsegmentation/` or write your own config files
- `PRETRAIN`: the pre-trained model file.

@ -0,0 +1,41 @@
# Detection

- [Detection](#detection)
  - [Train](#train)

Here, we prefer to use MMDetection for the detection task. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.

```shell
pip install openmim
```

It is very easy to install the package.

Besides, please refer to MMDet for [installation](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/get_started.md) and [data preparation](https://github.com/open-mmlab/mmdetection/blob/master/docs/en/1_exist_data_model.md).

## Train

After installation, you can run MMDet with a simple command.

```shell
# distributed version
bash tools/benchmarks/mmdetection/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmdetection/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```

Remarks:

- `CONFIG`: Use config files under `configs/benchmarks/mmdetection/` or write your own config files
- `PRETRAIN`: the pre-trained model file.

Alternatively, if you want to run the detection task with [detectron2](https://github.com/facebookresearch/detectron2), we also provide some config files.
Please refer to [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/main/INSTALL.md) for installation and follow the [directory structure](https://github.com/facebookresearch/detectron2/tree/main/datasets) to prepare the datasets required by detectron2.

```shell
conda activate detectron2  # use detectron2 environment here, otherwise use open-mmlab environment
cd benchmarks/detection
python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE}  # must use .pkl as the output extension.
bash run.sh ${DET_CFG} ${OUTPUT_FILE}
```

@ -0,0 +1,28 @@
Pretrain
**************

.. toctree::
   :maxdepth: 1

   1_config.md
   2_dataset_prepare.md
   3_pretrain.md

Downstream Tasks
****************

.. toctree::
   :maxdepth: 1

   classification.md
   detection.md
   segmentation.md

Useful Tools
************

.. toctree::
   :maxdepth: 1

   visualization.md
   analysis_tools.md

@ -0,0 +1,31 @@
# Segmentation

- [Segmentation](#segmentation)
  - [Train](#train)

For the semantic segmentation task, we use MMSegmentation. First, make sure you have installed [MIM](https://github.com/open-mmlab/mim), which is also a project of OpenMMLab.

```shell
pip install openmim
```

It is very easy to install the package.

Besides, please refer to MMSeg for [installation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/get_started.md) and [data preparation](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md#prepare-datasets).

## Train

After installation, you can run MMSeg with a simple command.

```shell
# distributed version
bash tools/benchmarks/mmsegmentation/mim_dist_train.sh ${CONFIG} ${PRETRAIN} ${GPUS}

# slurm version
bash tools/benchmarks/mmsegmentation/mim_slurm_train.sh ${PARTITION} ${CONFIG} ${PRETRAIN}
```

Remarks:

- `CONFIG`: Use config files under `configs/benchmarks/mmsegmentation/` or write your own config files
- `PRETRAIN`: the pre-trained model file.

@ -0,0 +1 @@
# Visualization