diff --git a/README.md b/README.md
index 3c2133d7..0c5016da 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ More algorithms are in our plan.
 
 ## Installation
 
-Please refer to [install.md](docs/install.md) for installation and [data_prepare.md](docs/data_prepare.md) for dataset preparation.
+Please refer to [install.md](docs/install.md) for installation and [prepare_data.md](docs/prepare_data.md) for dataset preparation.
 
 ## Get Started
 
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 6dd620e5..ffbb1838 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -71,7 +71,7 @@
 
 ## 安装
 
-请参考 [安装文档](docs_zh-CN/install.md) 进行安装和参考 [数据准备](docs_zh-CN/data_prepare.md) 准备数据集。
+请参考 [安装文档](docs_zh-CN/install.md) 进行安装和参考 [准备数据](docs_zh-CN/prepare_data.md) 准备数据集。
 
 ## 快速入门
 
diff --git a/configs/selfsup/byol/README.md b/configs/selfsup/byol/README.md
index 32fca192..ea80a3e4 100644
--- a/configs/selfsup/byol/README.md
+++ b/configs/selfsup/byol/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                              | mIOU  |
 | ----------------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/deepcluster/README.md b/configs/selfsup/deepcluster/README.md
index ff2f9eb1..c22a8762 100644
--- a/configs/selfsup/deepcluster/README.md
+++ b/configs/selfsup/deepcluster/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                                   | mIOU  |
 | ---------------------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/densecl/README.md b/configs/selfsup/densecl/README.md
index 634eb0c0..d5b781eb 100644
--- a/configs/selfsup/densecl/README.md
+++ b/configs/selfsup/densecl/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                 | mIOU  |
 | ---------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/moco/README.md b/configs/selfsup/moco/README.md
index 96db3a63..c22e9fdc 100644
--- a/configs/selfsup/moco/README.md
+++ b/configs/selfsup/moco/README.md
@@ -104,7 +104,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                       | mIOU  |
 | ---------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/npid/README.md b/configs/selfsup/npid/README.md
index a2745169..5aeecd92 100644
--- a/configs/selfsup/npid/README.md
+++ b/configs/selfsup/npid/README.md
@@ -87,7 +87,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                                      | mIOU  |
 | ------------------------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/odc/README.md b/configs/selfsup/odc/README.md
index e02a2bc4..e33e234c 100644
--- a/configs/selfsup/odc/README.md
+++ b/configs/selfsup/odc/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                               | mIOU  |
 | -------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/relative_loc/README.md b/configs/selfsup/relative_loc/README.md
index de85df96..25907988 100644
--- a/configs/selfsup/relative_loc/README.md
+++ b/configs/selfsup/relative_loc/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                      | mIOU  |
 | --------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/rotation_pred/README.md b/configs/selfsup/rotation_pred/README.md
index 69389090..0c58960f 100644
--- a/configs/selfsup/rotation_pred/README.md
+++ b/configs/selfsup/rotation_pred/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                       | mIOU  |
 | ---------------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/simclr/README.md b/configs/selfsup/simclr/README.md
index 3df79d16..086cb1b6 100644
--- a/configs/selfsup/simclr/README.md
+++ b/configs/selfsup/simclr/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                | mIOU  |
 | --------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/simsiam/README.md b/configs/selfsup/simsiam/README.md
index 87c55b3c..ee2d1ca0 100644
--- a/configs/selfsup/simsiam/README.md
+++ b/configs/selfsup/simsiam/README.md
@@ -85,7 +85,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                 | mIOU  |
 | ---------------------------------------------------------------------- | ----- |
diff --git a/configs/selfsup/swav/README.md b/configs/selfsup/swav/README.md
index cf76dc2b..40683059 100644
--- a/configs/selfsup/swav/README.md
+++ b/configs/selfsup/swav/README.md
@@ -80,7 +80,7 @@ The segmentation benchmarks includes 2 downstream task datasets, **Cityscapes**
 
 #### Pascal VOC 2012 + Aug
 
-Please refer to [config](configs/benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
+Please refer to [fcn_r50-d8_512x512_20k_voc12aug.py](../../benchmarks/mmsegmentation/voc12aug/fcn_r50-d8_512x512_20k_voc12aug.py) for details of config.
 
 | Self-Supervised Config                                                                                     | mIOU  |
 | ---------------------------------------------------------------------------------------------------------- | ----- |
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
deleted file mode 100644
index b3ff4668..00000000
--- a/docs/CHANGELOG.md
+++ /dev/null
@@ -1,37 +0,0 @@
-## Changelog
-
-### v0.3.0 (14/10/2020)
-
-#### Highlights
-* Support Mixed Precision Training
-* Improvement of GaussianBlur doubles the training speed
-* More benchmarking results
-
-#### Bug Fixes
-* Fix bugs in moco v2, now the results are reproducible.
-* Fix bugs in byol.
-
-#### New Features
-* Mixed Precision Training
-* Improvement of GaussianBlur doubles the training speed of MoCo V2, SimCLR, BYOL
-* More benchmarking results, including Places, VOC, COCO
-
-### v0.2.0 (26/6/2020)
-
-#### Highlights
-* Support BYOL
-* Support semi-supervised benchmarks
-
-#### Bug Fixes
-* Fix hash id in publish_model.py
-
-#### New Features
-
-* Support BYOL.
-* Separate train and test scripts in linear/semi evaluation.
-* Support semi-supervised benchmarks: benchmarks/dist_train_semi.sh.
-* Move benchmarks related configs into configs/benchmarks/.
-* Provide benchmarking results and model download links.
-* Support updating the network every several iterations.
-* Support LARS optimizer with nesterov.
-* Support excluding specific parameters from LARS adaptation and weight decay required in SimCLR and BYOL.
diff --git a/docs/GETTING_STARTED.md b/docs/GETTING_STARTED.md
deleted file mode 100644
index bb3a56fe..00000000
--- a/docs/GETTING_STARTED.md
+++ /dev/null
@@ -1,287 +0,0 @@
-# Getting Started
-
-This page provides basic tutorials about the usage of OpenSelfSup.
-For installation instructions, please see [INSTALL.md](INSTALL.md).
-
-## Train existing methods
-
-**Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
-
-### Train with single/multiple GPUs
-
-```shell
-bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
-```
-Optional arguments are:
-- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-- `--pretrained ${PRETRAIN_WEIGHTS}`: Load pretrained weights for the backbone.
-- `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-An example:
-```shell
-# checkpoints and logs saved in WORK_DIR=work_dirs/selfsup/odc/r50_v1/
-bash tools/dist_train.sh configs/selfsup/odc/r50_v1.py 8
-```
-**Note**: During training, checkpoints and logs are saved under `work_dirs/` in the same folder structure as the config file. A custom work directory is not recommended, since the evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
-
-```shell
-ln -s /DATA/xhzhan/openselfsup_workdirs ${OPENSELFSUP}/work_dirs
-```
-
-Alternatively, if you run OpenSelfSup on a cluster managed with [slurm](https://slurm.schedmd.com/):
-```shell
-SRUN_ARGS="${SRUN_ARGS}" bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} ${GPUS} [optional arguments]
-```
-
-An example:
-```shell
-SRUN_ARGS="-w xx.xx.xx.xx" bash tools/srun_train.sh Dummy configs/selfsup/odc/r50_v1.py 8 --resume_from work_dirs/selfsup/odc/r50_v1/epoch_100.pth
-```
-
-### Train with multiple machines
-
-If you launch with multiple machines connected only by Ethernet, you have to modify `tools/dist_train.sh` or create a new script; please refer to the PyTorch [Launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility). Training is usually slow without high-speed networking such as InfiniBand.
-
-If you launch with slurm, the command is the same as on a single machine, as described above. You only need to change `${GPUS}`, e.g., to 16 for two 8-GPU machines.
-
-### Launch multiple jobs on a single machine
-
-If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
-you need to specify different ports (29500 by default) for each job to avoid communication conflict.
-
-If you use `dist_train.sh` to launch training jobs:
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4
-CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4
-```
-
-If you launch training jobs with slurm:
-```shell
-GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29500
-GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29501
-```
-
-### What if I do not have so many GPUs?
-
-Assume that you only have 1 GPU that can hold 64 images in a batch, while you want a batch size of 256. You may add the following line to your config file, which performs a network update every 4 iterations, so the equivalent batch size is 256. Of course, this is about 4x slower than using 4 GPUs. Note that this workaround is not applicable to methods like SimCLR, which require intra-batch communication.
-
-```python
-optimizer_config = dict(update_interval=4)
-```
-
-### Mixed Precision Training (Optional)
-We use [Apex](https://github.com/NVIDIA/apex) to implement Mixed Precision Training.
-If you want to use Mixed Precision Training, add the following to the config file.
-```python
-use_fp16 = True
-optimizer_config = dict(use_fp16=use_fp16)
-```
-An example:
-```shell
-bash tools/dist_train.sh configs/selfsup/moco/r50_v1_fp16.py 8
-```
-
-### Speeding Up IO (Optional)
-1. Prefetching data helps speed up IO and makes better use of CUDA stream parallelization.
-If you want to use it, you can activate it in the config file (disabled by default).
-```python
-prefetch = True
-```
-2. The costly `ToTensor` operation is reimplemented along with prefetching.
-
-3. Replace Pillow with [Pillow-SIMD](https://github.com/uploadcare/pillow-simd.git) to make use of the SIMD instruction sets of modern CPUs.
-```shell
-pip uninstall pillow
-pip install pillow-simd  # or, if AVX2 is available: CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
-```
-We tested it with MoCo v2 using a total batch size of 256 on Tesla V100. The training time per step decreased from 0.23s to 0.17s.
-## Benchmarks
-
-We provide several standard benchmarks to evaluate representation learning. If you want to use this repo in your publications, we recommend NOT changing the config files or evaluation scripts mentioned below, so that all methods are compared fairly.
-
-### VOC07 Linear SVM & Low-shot Linear SVM
-
-```shell
-# test by epoch (only applicable to experiments trained with OpenSelfSup)
-bash benchmarks/dist_test_svm_epoch.sh ${CONFIG_FILE} ${EPOCH} ${FEAT_LIST} ${GPUS}
-# test a pretrained model (applicable to any pre-trained models)
-bash benchmarks/dist_test_svm_pretrain.sh ${CONFIG_FILE} ${PRETRAIN} ${FEAT_LIST} ${GPUS}
-```
-Arguments:
-- `${CONFIG_FILE}` is the config file of the self-supervised experiment.
-- `${FEAT_LIST}` is a string specifying which features from layer1 to layer5 to evaluate; e.g., if you want to evaluate layer5 only, then `FEAT_LIST` is `"feat5"`; if you want to evaluate all features, then `FEAT_LIST` is `"feat1 feat2 feat3 feat4 feat5"` (separated by spaces). If left empty, the default `FEAT_LIST` is `"feat5"`.
-- `${GPUS}` is the number of GPUs used to extract features.
-
-Working directories:
-The features, logs and intermediate files generated are saved in `$SVM_WORK_DIR/` as follows:
-- `dist_test_svm_epoch.sh`: `SVM_WORK_DIR=$WORK_DIR/` (the same as mentioned in `Train with single/multiple GPUs` above). Hence, the files will be overwritten to save space when evaluating with a new `$EPOCH`.
-- `dist_test_svm_pretrain.sh`: `SVM_WORK_DIR=$WORK_DIR/$PRETRAIN_NAME/`, e.g., if `PRETRAIN=pretrains/odc_r50_v1-5af5dd0c.pth`, then `PRETRAIN_NAME=odc_r50_v1-5af5dd0c.pth`; if `PRETRAIN=random`, then `PRETRAIN_NAME=random`.
-
-Notes:
-- The evaluation records are saved in `$SVM_WORK_DIR/logs/eval_svm.log`.
-- When using `benchmarks/dist_test_svm_epoch.sh`, DO NOT launch multiple tests of the same experiment with different epochs, since they share the same working directory.
-- Linear SVM takes 5 min, low-shot linear SVM takes about 1 hour with 32 CPU cores. If you want to save time, you may delete or comment the low-shot SVM testing command (the last line in the scripts).
-
-### ImageNet / Places205 Linear Classification
-
-**First**, extract backbone weights:
-```shell
-python tools/extract_backbone_weights.py ${CHECKPOINT} ${WEIGHT_FILE}
-```
-Arguments:
-- `CHECKPOINT`: the checkpoint file of a selfsup method named as `epoch_*.pth`.
-- `WEIGHT_FILE`: the output backbone weights file, e.g., `pretrains/moco_r50_v1-4ad89b5c.pth`.
-
-**Next**, train and test linear classification:
-```shell
-# train
-bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
-# test (unnecessary if validation is run during training)
-bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
-```
-Arguments:
-- `CONFIG_FILE`: Use config files under "configs/benchmarks/linear_classification/". Note that if you want to test DeepCluster that has a sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/linear_classification/imagenet/r50_multihead_sobel.py`.
-- Optional arguments include:
-    - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-    - `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-Working directories:
-Where are the checkpoints and logs? E.g., if you use `configs/benchmarks/linear_classification/imagenet/r50_multihead.py` to evaluate `pretrains/moco_r50_v1-4ad89b5c.pth`, then the working directory for this evaluation is `work_dirs/benchmarks/linear_classification/imagenet/r50_multihead/moco_r50_v1-4ad89b5c.pth/`.
-
-### ImageNet Semi-Supervised Classification
-
-```shell
-# train
-bash benchmarks/dist_train_semi.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
-# test (unnecessary if validation is run during training)
-bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
-```
-Arguments:
-- `CONFIG_FILE`: Use config files under "configs/benchmarks/semi_classification/". Note that if you want to test DeepCluster that has a sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/semi_classification/imagenet_1percent/r50_sobel.py`.
-- Optional arguments include:
-    - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-    - `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-### VOC07+12 / COCO17 Object Detection
-
-For more details on setting up the environment for detection, please refer to [this README](https://github.com/open-mmlab/OpenSelfSup/blob/master/benchmarks/detection/README.md).
-
-```shell
-conda activate detectron2 # use detectron2 environment here, otherwise use open-mmlab environment
-cd benchmarks/detection
-python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE} # must use .pkl as the output extension.
-bash run.sh ${DET_CFG} ${OUTPUT_FILE}
-```
-Arguments:
-- `WEIGHT_FILE`: The backbone weights extracted as described above.
-- `OUTPUT_FILE`: Converted backbone weights file, e.g., `odc_v1.pkl`.
-- `DET_CFG`: The detectron2 config file, usually we use `configs/pascal_voc_R_50_C4_24k_moco.yaml`.
-
-**Note**:
-- This benchmark must use 8 GPUs as the default setting from MoCo.
-- Please report the mean of 5 trials in your official paper, according to MoCo.
-- DeepCluster that uses Sobel layer is not supported by detectron2.
-
-## Tools and Tips
-
-### Count number of parameters
-
-```shell
-python tools/count_parameters.py ${CONFIG_FILE}
-```
-
-### Publish a model
-
-Compute the hash of the weight file and append the hash id to the filename. The output file is the input file name with a hash suffix.
-
-```shell
-python tools/publish_model.py ${WEIGHT_FILE}
-```
-Arguments:
-- `WEIGHT_FILE`: The backbone weights extracted as described above.
-
-### Reproducibility
-
-If you want to make your performance exactly reproducible, please switch on `--deterministic` to train the final model to be published. Note that this flag will switch off `torch.backends.cudnn.benchmark` and slow down the training speed.
-
-## How-to
-
-### Use a new dataset
-
-1. Write a data source file under `openselfsup/datasets/data_sources/`. You may refer to the existing ones.
-
-2. Create new config files for your experiments.
-
-### Design your own methods
-
-#### What you need to do
-
-    1. Create a dataset file under `openselfsup/datasets/` (better using existing ones);
-    2. Create a model file under `openselfsup/models/`. The model typically contains:
-      i) backbone (required): images to deep features from different depths of layers. Your model must contain a `self.backbone` module, otherwise the backbone weights cannot be extracted.
-      ii) neck (optional): deep features to compact feature vectors.
-      iii) head (optional): define loss functions.
-      iv) memory_bank (optional): define memory banks.
-    3. Create a config file under `configs/` and setup the configs;
-    4. [Optional] Create a hook file under `openselfsup/hooks/` if your method requires additional operations before the run, every several iterations, every several epochs, or after the run.
-    
-You may refer to existing modules under respective folders.
-
-#### Features that may facilitate your implementation
-
-* Decoupled data source and dataset.
-
-Since dataset is correlated to a specific task while data source is general, we decouple data source and dataset in OpenSelfSup.
-
-```python
-data = dict(
-    train=dict(type='ContrastiveDataset',
-               data_source=dict(type='ImageNet', list_file='xx', root='xx'),
-               pipeline=train_pipeline),
-    val=dict(...),
-    ...
-)
-```
-
-* Configure data augmentations in the config file.
-
-The augmentations are the same as `torchvision.transforms` except that `torchvision.transforms.RandomApply` corresponds to `RandomAppliedTrans`. `Lighting` and `GaussianBlur` are additionally implemented.
-
-```python
-img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-train_pipeline = [
-    dict(type='RandomResizedCrop', size=224),
-    dict(type='RandomAppliedTrans',
-        transforms=[
-            dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, kernel_size=23)],
-        p=0.5),
-    dict(type='ToTensor'),
-    dict(type='Normalize', **img_norm_cfg)
-]
-```
-
-* Parameter-wise optimization parameters.
-
-You may specify optimization parameters including lr, momentum and weight_decay for a certain group of parameters in the config file with `paramwise_options`. `paramwise_options` is a dict whose keys are regular expressions and whose values are options. Options include 7 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult, lars_exclude (only works with LARS optimizer).
-
-```python
-# this config sets all normalization layers with weight_decay_mult=0.1,
-# and the head with `lr_mult=10, momentum=0`.
-paramwise_options = {
-    r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
-    r'\Ahead.': dict(lr_mult=10, momentum=0)}
-optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
-                     weight_decay=0.0001,
-                     paramwise_options=paramwise_options)
-```
-
-* Configure custom hooks in the config file.
-
-The hooks will be called in order. For hook design, please refer to [odc_hook.py](https://github.com/open-mmlab/OpenSelfSup/blob/master/openselfsup/hooks/odc_hook.py) as an example.
-
-```python
-custom_hooks = [
-    dict(type='DeepClusterHook', ...),
-    dict(type='ODCHook', ...),
-]
-```
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
deleted file mode 100644
index 7f29c601..00000000
--- a/docs/INSTALL.md
+++ /dev/null
@@ -1,157 +0,0 @@
-## Installation
-
-### Requirements
-
-- Linux (Windows is not officially supported)
-- Python 3.5+
-- PyTorch 1.1 or higher
-- CUDA 9.0 or higher
-- NCCL 2
-- GCC 4.9 or higher
-- [mmcv](https://github.com/open-mmlab/mmcv)
-
-We have tested the following versions of OS and software:
-
-- OS: Ubuntu 16.04/18.04 and CentOS 7.2
-- CUDA: 9.0/9.2/10.0/10.1
-- NCCL: 2.1.15/2.2.13/2.3.7/2.4.2 (PyTorch-1.1 w/ NCCL-2.4.2 has a deadlock bug, see [here](https://github.com/open-mmlab/OpenSelfSup/issues/6))
-- GCC(G++): 4.9/5.3/5.4/7.3
-
-### Install openselfsup
-
-a. Create a conda virtual environment and activate it.
-
-```shell
-conda create -n open-mmlab python=3.7 -y
-conda activate open-mmlab
-```
-
-b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/), e.g.,
-
-```shell
-conda install pytorch torchvision -c pytorch
-```
-
-c. Install other third-party libraries.
-
-```shell
-conda install faiss-gpu cudatoolkit=10.0 -c pytorch # optional for DeepCluster and ODC, assuming CUDA=10.0
-```
-
-d. Clone the openselfsup repository.
-
-```shell
-git clone https://github.com/open-mmlab/openselfsup.git
-cd openselfsup
-```
-
-e. Install.
-
-```shell
-pip install -v -e .  # or "python setup.py develop"
-```
-
-f. Install Apex (optional), following the [official instructions](https://github.com/NVIDIA/apex), e.g.
-```shell
-git clone https://github.com/NVIDIA/apex
-cd apex
-pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
-```
-
-Note:
-
-1. The git commit id will be written to the version number with step d, e.g. 0.6.0+2e7045c. The version will also be saved in trained models.
-
-2. Following the above instructions, openselfsup is installed in `dev` mode: any local modifications made to the code will take effect without the need to reinstall it (unless you submit some commits and want to update the version number).
-
-3. If you would like to use `opencv-python-headless` instead of `opencv-python`,
-you can install it before installing MMCV.
-
-
-### Prepare datasets
-
-It is recommended to symlink your dataset root (assuming $YOUR_DATA_ROOT) to `$OPENSELFSUP/data`.
-If your folder structure is different, you may need to change the corresponding paths in config files.
-
-#### Prepare PASCAL VOC
-
-Assume that you usually store datasets in `$YOUR_DATA_ROOT` (e.g., `/home/xhzhan/data/`).
-The script below will automatically download PASCAL VOC 2007 into `$YOUR_DATA_ROOT`, prepare the required files, create a folder `data` under `$OPENSELFSUP` and make a symlink `VOCdevkit`.
-
-```shell
-cd $OPENSELFSUP
-bash tools/prepare_data/prepare_voc07_cls.sh $YOUR_DATA_ROOT
-```
-
-#### Prepare ImageNet and Places205
-
-Taking ImageNet as an example, you need to: 1) download ImageNet; 2) create the following list files, or download them [here](https://drive.google.com/drive/folders/1wYkJU_1qRHEt1LPVjBiG6ddUFV-t9hVJ?usp=sharing), under `$IMAGENET/meta/`: `train.txt` and `val.txt` contain an image file name in each line, `train_labeled.txt` and `val_labeled.txt` contain `filename[space]label\n` in each line, and `train_labeled_*percent.txt` are the down-sampled lists for semi-supervised evaluation; 3) create a symlink under `$OPENSELFSUP/data/`.
-
-In the end, the folder structure looks like:
-
-```
-OpenSelfSup
-├── openselfsup
-├── benchmarks
-├── configs
-├── data
-│   ├── VOCdevkit
-│   │   ├── VOC2007
-│   │   ├── VOC2012
-│   ├── imagenet
-│   │   ├── meta
-│   │   |   ├── train.txt (for self-sup training, "filename\n" in each line)
-│   │   |   ├── train_labeled.txt (for linear evaluation, "filename[space]label\n" in each line)
-│   │   |   ├── train_labeled_1percent.txt (for semi-supervised evaluation)
-│   │   |   ├── train_labeled_10percent.txt (for semi-supervised evaluation)
-│   │   |   ├── val.txt
-│   │   |   ├── val_labeled.txt (for evaluation)
-│   │   ├── train
-│   │   ├── val
-│   ├── places205
-│   │   ├── meta
-│   │   |   ├── train.txt
-│   │   |   ├── train_labeled.txt
-│   │   |   ├── val.txt
-│   │   |   ├── val_labeled.txt
-│   │   ├── train
-│   │   ├── val
-```
-
-### A from-scratch setup script
-
-Here is a full script for setting up openselfsup with conda and linking the dataset paths. The script does not download the ImageNet and Places datasets; you have to prepare them on your own.
-
-```shell
-conda create -n open-mmlab python=3.7 -y
-conda activate open-mmlab
-
-conda install -c pytorch pytorch torchvision -y
-git clone https://github.com/open-mmlab/OpenSelfSup.git
-cd OpenSelfSup
-pip install -v -e .
-
-bash tools/prepare_data/prepare_voc07_cls.sh $YOUR_DATA_ROOT
-ln -s $IMAGENET_ROOT data/imagenet
-ln -s $PLACES_ROOT data/places205
-```
-
-### Using multiple OpenSelfSup versions
-
-If there is more than one openselfsup installation on your machine and you want to switch between them, the recommended way is to create multiple conda environments and use different environments for different versions.
-
-Another way is to insert the following code to the main scripts (`train.py`, `test.py` or any other scripts you run)
-```python
-import os.path as osp
-import sys
-sys.path.insert(0, osp.join(osp.dirname(osp.abspath(__file__)), '../'))
-```
-
-Or run the following command in a terminal in the corresponding folder to temporarily use the current one.
-```shell
-export PYTHONPATH=`pwd`:$PYTHONPATH
-```
-
-## Common Issues
-
-1. The training hangs / deadlocks in some intermediate iteration. See this [issue](https://github.com/open-mmlab/OpenSelfSup/issues/6).
diff --git a/docs/MODEL_ZOO.md b/docs/MODEL_ZOO.md
deleted file mode 100644
index 882cb3ff..00000000
--- a/docs/MODEL_ZOO.md
+++ /dev/null
@@ -1,184 +0,0 @@
-# Model Zoo
-
-**OpenSelfSup needs your contribution!
-Since we don't have sufficient GPUs to run these large-scale experiments, your contributions, including parameter studies, reproduction of results, implementation of new methods, etc., are essential to making OpenSelfSup better. Your contributions will be recorded in the table below, and top contributors will be included in the author list of OpenSelfSup!**
-
-## Pre-trained model download links and speed test.
-**Note**
-* If not specifically indicated, the testing GPUs are NVIDIA Tesla V100.
-* The table records the implementors who implemented the methods (either by themselves or by refactoring from other repos), and the experimenters who performed experiments and reproduced the results. The experimenters are responsible for the evaluation results on all the benchmarks, and the implementors are responsible for the implementation as well as the results. If the experimenter is not indicated, the implementor is the experimenter by default.
-
-<table><thead><tr><th>Method (Implementer)</th><th>Config (Experimenter)</th><th>Remarks</th><th>Download link</th><th>Batch size</th><th>Epochs</th></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py">ImageNet</a></td><td>-</td><td>torchvision</td><td><a href="https://drive.google.com/file/d/11xA3TOcbD0qOrwpBfYonEDeseE1wMfBh/view?usp=sharing">imagenet_r50-21352794.pth</a></td><td>-</td><td>-</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td><a href="https://drive.google.com/file/d/1UaFTjd6sbKkZEE-f58Zv30bnx7C1qJBb/view?usp=sharing">random_r50-5d0fa71b.pth</a></td><td>-</td><td>-</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf">Relative-Loc</a> (<a href="https://github.com/Jiahao000">@Jiahao000</a>)</td><td>selfsup/relative_loc/r50.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1ibk1BI3PFQxZqcxuDfHs3n7JnWKCgl8x/view?usp=sharing">relative_loc_r50-342c9097.pth</a></td><td>512</td><td>70</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728">Rotation-Pred</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1t3oClmIvQ0p8RZ0V5yvQFltzjqBO823Y/view?usp=sharing">rotation_r50-cfab8ebb.pth</a></td><td>128</td><td>70</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520">DeepCluster</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/deepcluster/r50.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1GxgP7pI18JtFxDIC0hnHOanvUYajoLlg/view?usp=sharing">deepcluster_r50-bb8681e2.pth</a></td><td>512</td><td>200</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978">NPID</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/npid/r50.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1sm6I3Y5XnCWdbmeLSF4YupUtPe5nRQMI/view?usp=sharing">npid_r50-dec3df0c.pth</a></td><td>256</td><td>200</td></tr>
-<tr><td></td><td>selfsup/npid/r50_ensure_neg.py</td><td>ensure_neg=True</td><td><a href="https://drive.google.com/file/d/1FldDrb6kzF3CZ7737mwCXVI6HE2aCSaF/view?usp=sharing">npid_r50_ensure_neg-ce09b7ae.pth</a></td><td></td><td></td></tr>
-<tr><td><a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf">ODC</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/odc/r50_v1.py (<a href="https://github.com/Jiahao000">@Jiahao000</a>)</td><td>default</td><td><a href="https://drive.google.com/file/d/1EdhJeZAyMsD_pEW7uMhLzos5xZLdariN/view?usp=sharing">odc_r50_v1-5af5dd0c.pth</a></td><td>512</td><td>440</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722">MoCo</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/moco/r50_v1.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1ANXfnoT8yBQQBBqR_kQLQorK20l65KMy/view?usp=sharing">moco_r50_v1-4ad89b5c.pth</a></td><td>256</td><td>200</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297">MoCo v2</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/moco/r50_v2.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1ImO8A3uWbrTx21D1IqBDMUQvpN6wmv0d/view?usp=sharing">moco_r50_v2-e3b0c442.pth</a></td><td>256</td><td>200</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709">SimCLR</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td><a href="https://drive.google.com/file/d/1aZ43nSdivdNxHbM9DKVoZYVhZ8TNnmPp/view?usp=sharing">simclr_r50_bs256_ep200-4577e9a6.pth</a></td><td>256</td><td>200</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td><a href="https://drive.google.com/file/d/1AXpSKqgWfnj6jCgN65BXSTCKFfuIVELa/view?usp=sharing">simclr_r50_bs256_ep200_mocov2_neck-0d6e5ff2.pth</a></td><td></td><td></td></tr>
-<!--
-<tr><td><a href="https://arxiv.org/abs/2006.07733">BYOL</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/byol/r50_bs4096_ep200.py (<a href="https://github.com/xieenze">@xieenze</a>)</td><td>default</td><td><a href="https://drive.google.com/file/d/1Whj3j5E3ShQj_VufjrJSzWiq1xcZZCXN/view?usp=sharing">byol_r50-e3b0c442.pth</a></td><td>4096</td><td>200</td></tr>
--->
-<tr><td><a href="https://arxiv.org/abs/2006.07733">BYOL</a> (<a href="https://github.com/XiaohangZhan">@XiaohangZhan</a>)</td><td>selfsup/byol/r50_bs256_accumulate16_ep300.py (<a href="https://github.com/scnuhealthy">@scnuhealthy</a>)</td><td>default</td><td><a href="https://drive.google.com/file/d/12Zu9r3fE8qKF4OW6WQXa5Ec6VuA2m3j7/view?usp=sharing">byol_r50_bs256_accmulate16_ep300-5df46722.pth</a></td><td>256</td><td>300</td></tr>
-<tr><td></td><td>selfsup/byol/r50_bs2048_accumulate2_ep200_fp16.py (<a href="https://github.com/xieenze">@xieenze</a>)</td><td>default</td><td><a href="https://drive.google.com/file/d/1Poj-rxIebE1ykJhB8EeliSxWiCGP_rbD/view?usp=sharing">byol_r50_bs2048_accmulate2_ep200-e3b0c442.pth</a></td><td>2048</td><td>200</td></tr>
-</tbody></table>
-
-
-## Benchmarks
-
-### VOC07 SVM & SVM Low-shot
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th rowspan="2">Best layer</th><th rowspan="2">VOC07 SVM</th><th colspan="8">VOC07 SVM Low-shot</th></tr>
-<tr><td>1</td><td>2</td><td>4</td><td>8</td><td>16</td><td>32</td><td>64</td><td>96</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>feat5</td><td>87.17</td><td>52.99</td><td>63.55</td><td>73.7</td><td>78.79</td><td>81.76</td><td>83.75</td><td>85.18</td><td>85.97</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>feat2</td><td>30.54</td><td>9.15</td><td>9.39</td><td>11.09</td><td>12.3</td><td>14.3</td><td>17.41</td><td>21.32</td><td>23.77</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>feat4</td><td>64.78</td><td>18.17</td><td>22.08</td><td>29.37</td><td>35.58</td><td>41.8</td><td>48.73</td><td>55.55</td><td>58.33</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>feat4</td><td>67.38</td><td>18.91</td><td>23.33</td><td>30.57</td><td>38.22</td><td>45.83</td><td>52.23</td><td>58.08</td><td>61.11</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520" target="_blank" rel="noopener noreferrer">DeepCluster</a></td><td>selfsup/deepcluster/r50.py</td><td>default</td><td>feat5</td><td>74.26</td><td>29.73</td><td>37.66</td><td>45.85</td><td>55.57</td><td>62.48</td><td>66.15</td><td>70.0</td><td>71.37</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>feat5</td><td>74.50</td><td>24.19</td><td>31.24</td><td>39.69</td><td>50.99</td><td>59.03</td><td>64.4</td><td>68.69</td><td>70.84</td></tr>
-<tr><td></td><td>selfsup/npid/r50_ensure_neg.py</td><td>ensure_neg=True</td><td>feat5</td><td>75.70</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
-<tr><td><a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf" target="_blank" rel="noopener noreferrer">ODC</a></td><td>selfsup/odc/r50_v1.py</td><td>default</td><td>feat5</td><td>78.42</td><td>32.42</td><td>40.27</td><td>49.95</td><td>59.96</td><td>65.71</td><td>69.99</td><td>73.64</td><td>75.13</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>feat5</td><td>79.18</td><td>30.03</td><td>37.73</td><td>47.64</td><td>58.78</td><td>66.0</td><td>70.6</td><td>74.6</td><td>76.07</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>feat5</td><td>84.26</td><td>43.0</td><td>52.48</td><td>63.43</td><td>71.74</td><td>76.35</td><td>78.9</td><td>81.31</td><td>82.45</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>feat5</td><td>78.95</td><td>32.45</td><td>40.76</td><td>50.4</td><td>59.01</td><td>65.45</td><td>70.13</td><td>73.58</td><td>75.35</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td>feat5</td><td>77.65</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
-<tr><td><a href="https://arxiv.org/abs/2006.07733" target="_blank" rel="noopener noreferrer">BYOL</a></td><td>selfsup/byol/r50_bs256_accumulate16_ep300.py</td><td>default</td><td>feat5</td><td>86.58</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
-<tr><td></td><td>selfsup/byol/r50_bs2048_accumulate2_ep200_fp16.py</td><td>default</td><td>feat5</td><td>85.86</td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
-
-</tbody></table>
-
-
-### ImageNet Linear Classification
-
-**Note**
-* Config: `configs/benchmarks/linear_classification/imagenet/r50_multihead.py` for ImageNet (Multi) and `configs/benchmarks/linear_classification/imagenet/r50_last.py` for ImageNet (Last).
-* For DeepCluster, use the corresponding one with `_sobel`.
-* ImageNet (Multi) evaluates features in around 9k dimensions from different layers. Top-1 result of the last epoch is reported.
-* ImageNet (Last) evaluates the last feature after global average pooling, e.g., 2048 dimensions for resnet50. The best top-1 result among all epochs is reported.
-* Usually, we report the best result from ImageNet (Multi) and ImageNet (Last) to ensure fairness, since different methods achieve their best performance on different layers.
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th colspan="5">ImageNet (Multi)</th><th>ImageNet (Last)</th></tr>
-<tr><td>feat1</td><td>feat2</td><td>feat3</td><td>feat4</td><td>feat5</td><td>avgpool</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>15.18</td><td>33.96</td><td>47.86</td><td>67.56</td><td>76.17</td><td>74.12</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>11.37</td><td>16.21</td><td>13.47</td><td>9.07</td><td>6.54</td><td>4.35</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>14.76</td><td>31.29</td><td>45.77</td><td>49.31</td><td>40.20</td><td>38.83</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>12.89</td><td>34.30</td><td>44.91</td><td>54.99</td><td>49.09</td><td>47.01</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520" target="_blank" rel="noopener noreferrer">DeepCluster</a></td><td>selfsup/deepcluster/r50.py</td><td>default</td><td>12.78</td><td>30.81</td><td>43.88</td><td>57.71</td><td>51.68</td><td>46.92</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>14.28</td><td>31.20</td><td>40.68</td><td>54.46</td><td>56.61</td><td>56.60</td></tr>
-<tr><td><a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf" target="_blank" rel="noopener noreferrer">ODC</a></td><td>selfsup/odc/r50_v1.py</td><td>default</td><td>14.76</td><td>31.82</td><td>42.44</td><td>55.76</td><td>57.70</td><td>53.42</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>15.32</td><td>33.08</td><td>44.68</td><td>57.27</td><td>60.60</td><td>61.02</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>14.74</td><td>32.81</td><td>44.95</td><td>61.61</td><td>66.73</td><td>67.69</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>17.09</td><td>31.37</td><td>41.38</td><td>54.35</td><td>61.57</td><td>60.06</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td>16.97</td><td>31.88</td><td>41.73</td><td>54.33</td><td>59.94</td><td>58.00</td></tr>
-<!--
-<tr><td><a href="https://arxiv.org/abs/2006.07733" target="_blank" rel="noopener noreferrer">BYOL</a></td><td>selfsup/byol/r50_bs4096_ep200.py</td><td>default</td><td>16.70</td><td>34.22</td><td>46.61</td><td>60.78</td><td>69.14</td><td>67.10</td></tr>
--->
-<tr><td><a href="https://arxiv.org/abs/2006.07733" target="_blank" rel="noopener noreferrer">BYOL</a></td><td>selfsup/byol/r50_bs256_accumulate16_ep300.py</td><td>default</td><td>14.07</td><td>34.44</td><td>47.22</td><td>63.08</td><td>72.35</td><td></td></tr>
-<tr><td></td><td>selfsup/byol/r50_bs2048_accumulate2_ep200_fp16.py</td><td>default</td><td>15.52</td><td>34.50</td><td>47.22</td><td>62.78</td><td>71.61</td><td></td></tr>
-</tbody></table>
-
-### Places205 Linear Classification
-
-**Note**
-* Config: `configs/benchmarks/linear_classification/places205/r50_multihead.py`.
-* For DeepCluster, use the corresponding one with `_sobel`.
-* Places205 evaluates features in around 9k dimensions from different layers. Top-1 result of the last epoch is reported.
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th colspan="5">Places205</th></tr>
-<tr><td>feat1</td><td>feat2</td><td>feat3</td><td>feat4</td><td>feat5</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>21.27</td><td>36.10</td><td>43.03</td><td>51.38</td><td>53.05</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>17.19</td><td>21.70</td><td>19.23</td><td>14.59</td><td>11.73</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>21.07</td><td>34.86</td><td>42.84</td><td>45.71</td><td>41.45</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>18.65</td><td>35.71</td><td>42.28</td><td>45.98</td><td>43.72</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520" target="_blank" rel="noopener noreferrer">DeepCluster</a></td><td>selfsup/deepcluster/r50.py</td><td>default</td><td>18.80</td><td>33.93</td><td>41.44</td><td>47.22</td><td>42.61</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>20.53</td><td>34.03</td><td>40.48</td><td>47.13</td><td>47.73</td></tr>
-<tr><td><a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf" target="_blank" rel="noopener noreferrer">ODC</a></td><td>selfsup/odc/r50_v1.py</td><td>default</td><td>20.94</td><td>34.78</td><td>41.19</td><td>47.45</td><td>49.18</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>21.13</td><td>35.19</td><td>42.40</td><td>48.78</td><td>50.70</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>21.88</td><td>35.75</td><td>43.65</td><td>49.99</td><td>52.57</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>22.55</td><td>34.14</td><td>40.35</td><td>47.15</td><td>51.64</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td></td><td></td><td></td><td></td><td></td></tr>
-</tbody></table>
-
-### ImageNet Semi-Supervised Classification
-
-**Note**
-* In this benchmark, the necks or heads are removed and only the backbone CNN is evaluated by appending a linear classification head. All parameters are fine-tuned.
-* Config: under `configs/benchmarks/semi_classification/imagenet_1percent/` for 1% data, and `configs/benchmarks/semi_classification/imagenet_10percent/` for 10% data.
-* When training with 1% ImageNet, we find that hyper-parameters, especially the learning rate, greatly influence the performance. Hence, we prepare a list of settings with the base learning rate from \{0.001, 0.01, 0.1\} and the learning rate multiplier for the head from \{1, 10, 100\}. We choose the best-performing setting for each method.
-* Please use `--deterministic` in this benchmark.
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th rowspan="2">Optimal setting for ImageNet 1%</th><th colspan="2">ImageNet 1%</th></tr>
-<tr><td>top-1</td><td>top-5</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>r50_lr0_001_head100.py</td><td>68.68</td><td>88.87</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>r50_lr0_01_head1.py</td><td>1.56</td><td>4.99</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>16.48</td><td>40.37</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>18.98</td><td>44.05</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520" target="_blank" rel="noopener noreferrer">DeepCluster</a></td><td>selfsup/deepcluster/r50.py</td><td>default</td><td>r50_lr0_01_head1_sobel.py</td><td>33.44</td><td>58.62</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>27.95</td><td>54.37</td></tr>
-<tr><td><a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf" target="_blank" rel="noopener noreferrer">ODC</a></td><td>selfsup/odc/r50_v1.py</td><td>default</td><td>r50_lr0_1_head100.py</td><td>32.39</td><td>61.02</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>33.15</td><td>61.30</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>39.07</td><td>68.31</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>36.09</td><td>64.50</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td>r50_lr0_01_head100.py</td><td>36.31</td><td>64.68</td></tr>
-</tbody></table>
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th rowspan="2">Optimal setting for ImageNet 10%</th><th colspan="2">ImageNet 10%</th></tr>
-<tr><td>top-1</td><td>top-5</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>r50_lr0_001_head10.py</td><td>74.53</td><td>92.19</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>r50_lr0_01_head1.py</td><td>21.78</td><td>44.24</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>53.86</td><td>79.62</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>54.75</td><td>80.21</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1807.05520" target="_blank" rel="noopener noreferrer">DeepCluster</a></td><td>selfsup/deepcluster/r50.py</td><td>default</td><td>r50_lr0_01_head1_sobel.py</td><td>52.94</td><td>77.96</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>57.22</td><td>81.39</td></tr>
-<tr><td><a href="http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhan_Online_Deep_Clustering_for_Unsupervised_Representation_Learning_CVPR_2020_paper.pdf" target="_blank" rel="noopener noreferrer">ODC</a></td><td>selfsup/odc/r50_v1.py</td><td>default</td><td>r50_lr0_1_head10.py</td><td>58.15</td><td>82.55</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>60.08</td><td>84.02</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>61.80</td><td>85.11</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>r50_lr0_01_head100.py</td><td>58.46</td><td>82.60</td></tr>
-<tr><td></td><td>selfsup/simclr/r50_bs256_ep200_mocov2_neck.py</td><td>-&gt; MoCo v2 neck</td><td>r50_lr0_01_head100.py</td><td>58.38</td><td>82.53</td></tr>
-</tbody></table>
-
-### PASCAL VOC07+12 Object Detection
-
-**Note**
-* This benchmark follows the evaluation protocols set up by MoCo.
-* Config: `benchmarks/detection/configs/pascal_voc_R_50_C4_24k_moco.yaml`.
-* Please follow [here](GETTING_STARTED.md#voc0712--coco17-object-detection) to run the evaluation.
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th colspan="3">VOC07+12</th></tr>
-<tr><td>AP50</td><td>AP</td><td>AP75</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>81.58</td><td>54.19</td><td>59.80</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>59.02</td><td>32.83</td><td>31.60</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>80.36</td><td>55.13</td><td>61.18</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>80.91</td><td>55.52</td><td>61.39</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>80.03</td><td>54.11</td><td>59.50</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>81.38</td><td>55.95</td><td>62.23</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297" target="_blank" rel="noopener noreferrer">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>82.24</td><td>56.97</td><td>63.43</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709" target="_blank" rel="noopener noreferrer">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>79.41</td><td>51.54</td><td>55.63</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2006.07733" target="_blank" rel="noopener noreferrer">BYOL</a></td><td>selfsup/byol/r50_bs2048_accumulate2_ep200_fp16.py</td><td>default</td><td>79.60</td><td>49.00</td><td>52.80</td></tr>
-</tbody></table>
-
-### COCO2017 Object Detection
-
-**Note**
-* This benchmark follows the evaluation protocols set up by MoCo.
-* Config: `benchmarks/detection/configs/coco_R_50_C4_2x_moco.yaml`.
-* Please follow [here](GETTING_STARTED.md#voc0712--coco17-object-detection) to run the evaluation.
-
-<table><thead><tr><th rowspan="2">Method</th><th rowspan="2">Config</th><th rowspan="2">Remarks</th><th colspan="6">COCO2017</th></tr>
-<tr><td>AP50(Box)</td><td>AP(Box)</td><td>AP75(Box)</td><td>AP50(Mask)</td><td>AP(Mask)</td><td>AP75(Mask)</td></tr></thead><tbody>
-<tr><td><a href="https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py" target="_blank" rel="noopener noreferrer">ImageNet</a></td><td>-</td><td>torchvision</td><td>59.9</td><td>40.0</td><td>43.1</td><td>56.5</td><td>34.7</td><td>36.9</td></tr>
-<tr><td>Random</td><td>-</td><td>kaiming</td><td>54.6</td><td>35.6</td><td>38.2</td><td>51.5</td><td>31.4</td><td>33.5</td></tr>
-<tr><td><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf" target="_blank" rel="noopener noreferrer">Relative-Loc</a></td><td>selfsup/relative_loc/r50.py</td><td>default</td><td>59.6</td><td>40.0</td><td>43.5</td><td>56.5</td><td>35.0</td><td>37.3</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1803.07728" target="_blank" rel="noopener noreferrer">Rotation-Pred</a></td><td>selfsup/rotation_pred/r50.py</td><td>default</td><td>59.3</td><td>40.0</td><td>43.6</td><td>56.0</td><td>34.9</td><td>37.4</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1805.01978" target="_blank" rel="noopener noreferrer">NPID</a></td><td>selfsup/npid/r50.py</td><td>default</td><td>59.0</td><td>39.4</td><td>42.8</td><td>55.9</td><td>34.5</td><td>36.6</td></tr>
-<tr><td><a href="https://arxiv.org/abs/1911.05722" target="_blank" rel="noopener noreferrer">MoCo</a></td><td>selfsup/moco/r50_v1.py</td><td>default</td><td>60.5</td><td>40.9</td><td>44.2</td><td>57.1</td><td>35.5</td><td>37.7</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2003.04297">MoCo v2</a></td><td>selfsup/moco/r50_v2.py</td><td>default</td><td>60.6</td><td>41.0</td><td>44.5</td><td>57.2</td><td>35.6</td><td>38.0</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2002.05709">SimCLR</a></td><td>selfsup/simclr/r50_bs256_ep200.py</td><td>default</td><td>59.1</td><td>39.6</td><td>42.9</td><td>55.9</td><td>34.6</td><td>37.1</td></tr>
-<tr><td><a href="https://arxiv.org/abs/2006.07733">BYOL</a></td><td>selfsup/byol/r50_bs2048_accumulate2_ep200_fp16.py</td><td>default</td><td>60.6</td><td>40.2</td><td>43.3</td><td>57.0</td><td>34.9</td><td>36.7</td></tr>
-</tbody></table>
-
diff --git a/docs/relation.jpg b/docs/relation.jpg
deleted file mode 100644
index 00e039b2..00000000
Binary files a/docs/relation.jpg and /dev/null differ
diff --git a/docs_zh-CN/GETTING_STARTED.md b/docs_zh-CN/GETTING_STARTED.md
deleted file mode 100644
index eb7533cd..00000000
--- a/docs_zh-CN/GETTING_STARTED.md
+++ /dev/null
@@ -1,287 +0,0 @@
-# Getting Started
-
-This page provides basic tutorials about the usage of OpenSelfSup.
-For installation instructions, please see [INSTALL.md](INSTALL.md).
-
-## Train existing methods
-
-**Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
-
-### Train with single/multiple GPUs
-
-```shell
-bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
-```
-Optional arguments are:
-- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-- `--pretrained ${PRETRAIN_WEIGHTS}`: Load pretrained weights for the backbone.
-- `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-An example:
-```shell
-# checkpoints and logs saved in WORK_DIR=work_dirs/selfsup/odc/r50_v1/
-bash tools/dist_train.sh configs/selfsup/odc/r50_v1.py 8
-```
-**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
-
-```shell
-ln -s /DATA/xhzhan/openselfsup_workdirs ${OPENSELFSUP}/work_dirs
-```
-
-Alternatively, if you run OpenSelfSup on a cluster managed with [slurm](https://slurm.schedmd.com/):
-```shell
-SRUN_ARGS="${SRUN_ARGS}" bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} ${GPUS} [optional arguments]
-```
-
-An example:
-```shell
-SRUN_ARGS="-w xx.xx.xx.xx" bash tools/srun_train.sh Dummy configs/selfsup/odc/r50_v1.py 8 --resume_from work_dirs/selfsup/odc/r50_v1/epoch_100.pth
-```
-
-### Train with multiple machines
-
-If you launch with multiple machines simply connected with Ethernet, you have to modify `tools/dist_train.sh` or create a new script; please refer to the PyTorch [Launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility). Training is usually slow if you do not have high-speed networking like InfiniBand.
-
-If you launch with slurm, the command is the same as that on single machine described above. You only need to change ${GPUS}, e.g., to 16 for two 8-GPU machines.
-
-### Launch multiple jobs on a single machine
-
-If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
-you need to specify different ports (29500 by default) for each job to avoid communication conflict.
-
-If you use `dist_train.sh` to launch training jobs:
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4
-CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4
-```
-
-If you use launch training jobs with slurm:
-```shell
-GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29500
-GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29501
-```
-
-### What if I do not have so many GPUs?
-
-Assuming that you only have 1 GPU that can contain 64 images in a batch, while you expect the batch size to be 256, you may add the following line into your config file. It performs network update every 4 iterations. In this way, the equivalent batch size is 256. Of course, it is about 4x slower than using 4 GPUs. Note that the workaround is not applicable for methods like SimCLR which require intra-batch communication.
-
-```python
-optimizer_config = dict(update_interval=4)
-```
-
-### Mixed Precision Training (Optional)
-We use [Apex](https://github.com/NVIDIA/apex) to implement Mixed Precision Training.
-If you want to use Mixed Precision Training, you can add below in the config file.
-```python
-use_fp16 = True
-optimizer_config = dict(use_fp16=use_fp16)
-```
-An example:
-```python
-bash tools/dist_train.sh configs/selfsup/moco/r50_v1_fp16.py 8
-```
-
-### Speeding Up IO (Optional)
-1 . Prefetching data helps to speeding up IO and make better use of CUDA stream parallelization.
-If you want to use it, you can activate it in the config file (disabled by default).
-```python
-prefetch = True
-```
-2 . Costly operation ToTensor is reimplemented along with prefetch.
-
-3 . Replacing  Pillow with Pillow-SIMD (https://github.com/uploadcare/pillow-simd.git) to make use of SIMD command sets with modern CPU.
- ```shell
-pip uninstall pillow
-pip install Pillow-SIMD or CC="cc -mavx2" pip install -U --force-reinstall pillow-simd if AVX2 is available.
-```
-We test it using MoCoV2 using a total batch size of 256 on Tesla V100. The training time per step is decreased to 0.17s from 0.23s.
-## Benchmarks
-
-We provide several standard benchmarks to evaluate representation learning. The config files or scripts for evaluation mentioned below are NOT recommended to be changed if you want to use this repo in your publications. We hope that all methods are under a fair comparison.
-
-### VOC07 Linear SVM & Low-shot Linear SVM
-
-```shell
-# test by epoch (only applicable to experiments trained with OpenSelfSup)
-bash benchmarks/dist_test_svm_epoch.sh ${CONFIG_FILE} ${EPOCH} ${FEAT_LIST} ${GPUS}
-# test a pretrained model (applicable to any pre-trained models)
-bash benchmarks/dist_test_svm_pretrain.sh ${CONFIG_FILE} ${PRETRAIN} ${FEAT_LIST} ${GPUS}
-```
-Augments:
-- `${CONFIG_FILE}` the config file of the self-supervised experiment.
-- `${FEAT_LIST}` is a string to specify features from layer1 to layer5 to evaluate; e.g., if you want to evaluate layer5 only, then `FEAT_LIST` is `"feat5"`, if you want to evaluate all features, then then `FEAT_LIST` is `"feat1 feat2 feat3 feat4 feat5"` (separated by space). If left empty, the default `FEAT_LIST` is `"feat5"`.
-- `$GPUS` is the number of GPUs to extract features.
-
-Working directories:
-The features, logs and intermediate files generated are saved in `$SVM_WORK_DIR/` as follows:
-- `dist_test_svm_epoch.sh`: `SVM_WORK_DIR=$WORK_DIR/` (The same as that mentioned in `Train with single/multiple GPUs` above.) Hence, the files will be overridden to save space when evaluating with a new `$EPOCH`.
-- `dist_test_svm_pretrain.sh`: `SVM_WORK_DIR=$WORK_DIR/$PRETRAIN_NAME/`, e.g., if `PRETRAIN=pretrains/odc_r50_v1-5af5dd0c.pth`, then `PRETRAIN_NAME=odc_r50_v1-5af5dd0c.pth`; if `PRETRAIN=random`, then `PRETRAIN_NAME=random`.
-
-Notes:
-- The evaluation records are saved in `$SVM_WORK_DIR/logs/eval_svm.log`.
-- When using `benchmarks/dist_test_svm_epoch.sh`, DO NOT launch multiple tests of the same experiment with different epochs, since they share the same working directory.
-- Linear SVM takes 5 min, low-shot linear SVM takes about 1 hour with 32 CPU cores. If you want to save time, you may delete or comment the low-shot SVM testing command (the last line in the scripts).
-
-### ImageNet / Places205 Linear Classification
-
-**First**, extract backbone weights:
-```shell
-python tools/extract_backbone_weights.py ${CHECKPOINT} ${WEIGHT_FILE}
-```
-Arguments:
-- `CHECKPOINTS`: the checkpoint file of a selfsup method named as `epoch_*.pth`.
-- `WEIGHT_FILE`: the output backbone weights file, e.g., `pretrains/moco_r50_v1-4ad89b5c.pth`.
-
-**Next**, train and test linear classification:
-```shell
-# train
-bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
-# test (unnecessary if have validation in training)
-bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
-```
-Augments:
-- `CONFIG_FILE`: Use config files under "configs/benchmarks/linear_classification/". Note that if you want to test DeepCluster that has a sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/linear_classification/imagenet/r50_multihead_sobel.py`.
-- Optional arguments include:
-    - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-    - `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-Working directories:
-Where are the checkpoints and logs? E.g., if you use `configs/benchmarks/linear_classification/imagenet/r50_multihead.py` to evaluate `pretrains/moco_r50_v1-4ad89b5c.pth`, then the working directories for this evaluation is `work_dirs/benchmarks/linear_classification/imagenet/r50_multihead/moco_r50_v1-4ad89b5c.pth/`.
-
-### ImageNet Semi-Supervised Classification
-
-```shell
-# train
-bash benchmarks/dist_train_semi.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
-# test (unnecessary if have validation in training)
-bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
-```
-Augments:
-- `CONFIG_FILE`: Use config files under "configs/benchmarks/semi_classification/". Note that if you want to test DeepCluster that has a sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/semi_classification/imagenet_1percent/r50_sobel.py`.
-- Optional arguments include:
-    - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
-    - `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.
-
-### VOC07+12 / COCO17 Object Detection
-
-For more details to setup the environments for detection, please refer [here](https://github.com/open-mmlab/OpenSelfSup/blob/master/benchmarks/detection/README.md).
-
-```shell
-conda activate detectron2 # use detectron2 environment here, otherwise use open-mmlab environment
-cd benchmarks/detection
-python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE} # must use .pkl as the output extension.
-bash run.sh ${DET_CFG} ${OUTPUT_FILE}
-```
-Arguments:
-- `WEIGHT_FILE`: The extracted backbone weights extracted aforementioned.
-- `OUTPUT_FILE`: Converted backbone weights file, e.g., `odc_v1.pkl`.
-- `DET_CFG`: The detectron2 config file, usually we use `configs/pascal_voc_R_50_C4_24k_moco.yaml`.
-
-**Note**:
-- This benchmark must use 8 GPUs as the default setting from MoCo.
-- Please report the mean of 5 trials in your official paper, according to MoCo.
-- DeepCluster that uses Sobel layer is not supported by detectron2.
-
-## Tools and Tips
-
-### Count number of parameters
-
-```shell
-python tools/count_parameters.py ${CONFIG_FILE}
-```
-
-### Publish a model
-
-Compute the hash of the weight file and append the hash id to the filename. The output file is the input file name with a hash suffix.
-
-```shell
-python tools/publish_model.py ${WEIGHT_FILE}
-```
-Arguments:
-- `WEIGHT_FILE`: The extracted backbone weights extracted aforementioned.
-
-### Reproducibility
-
-If you want to make your performance exactly reproducible, please switch on `--deterministic` to train the final model to be published. Note that this flag will switch off `torch.backends.cudnn.benchmark` and slow down the training speed.
-
-## How-to
-
-### Use a new dataset
-
-1. Write a data source file under `openselfsup/datasets/data_sources/`. You may refer to the existing ones.
-
-2. Create new config files for your experiments.
-
-### Design your own methods
-
-#### What you need to do
-
-    1. Create a dataset file under `openselfsup/datasets/` (better using existing ones);
-    2. Create a model file under `openselfsup/models/`. The model typically contains:
-      i) backbone (required): images to deep features from differet depth of layers. Your model must contain a `self.backbone` module, otherwise the backbone weights cannot be extracted.
-      ii) neck (optional): deep features to compact feature vectors.
-      iii) head (optional): define loss functions.
-      iv) memory_bank (optional): define memory banks.
-    3. Create a config file under `configs/` and setup the configs;
-    4. [Optional] Create a hook file under `openselfsup/hooks/` if your method requires additional operations before run, every several iterations, every several epoch, or after run.
-
-You may refer to existing modules under respective folders.
-
-#### Features that may facilitate your implementation
-
-* Decoupled data source and dataset.
-
-Since dataset is correlated to a specific task while data source is general, we decouple data source and dataset in OpenSelfSup.
-
-```python
-data = dict(
-    train=dict(type='ContrastiveDataset',
-               data_source=dict(type='ImageNet', list_file='xx', root='xx'),
-               pipeline=train_pipeline),
-    val=dict(...),
-    ...
-)
-```
-
-* Configure data augmentations in the config file.
-
-The augmentations are the same as `torchvision.transforms` except that `torchvision.transforms.RandomAppy` corresponds to `RandomAppliedTrans`. `Lighting` and `GaussianBlur` is additionally implemented.
-
-```python
-img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-train_pipeline = [
-    dict(type='RandomResizedCrop', size=224),
-    dict(type='RandomAppliedTrans',
-        transforms=[
-            dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, kernel_size=23)],
-        p=0.5),
-    dict(type='ToTensor'),
-    dict(type='Normalize', **img_norm_cfg)
-]
-```
-
-* Parameter-wise optimization parameters.
-
-You may specify optimization parameters including lr, momentum and weight_decay for a certain group of parameters in the config file with `paramwise_options`. `paramwise_options` is a dict whose key is regular expressions and value is options. Options include 6 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult, lars_exclude (only works with LARS optimizer).
-
-```python
-# this config sets all normalization layers with weight_decay_mult=0.1,
-# and the head with `lr_mult=10, momentum=0`.
-paramwise_options = {
-    '(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
-    '\Ahead.': dict(lr_mult=10, momentum=0)}
-optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
-                     weight_decay=0.0001,
-                     paramwise_options=paramwise_options)
-```
-
-* Configure custom hooks in the config file.
-
-The hooks will be called in order. For hook design, please refer to [odc_hook.py](https://github.com/open-mmlab/OpenSelfSup/blob/master/openselfsup/hooks/odc_hook.py) as an example.
-
-```python
-custom_hooks = [
-    dict(type='DeepClusterHook', ...),
-    dict(type='ODCHook', ...),
-]
-```
diff --git a/docs_zh-CN/getting_started.md b/docs_zh-CN/getting_started.md
new file mode 100644
index 00000000..328c4f1b
--- /dev/null
+++ b/docs_zh-CN/getting_started.md
@@ -0,0 +1,149 @@
+# Getting Started
+
+- [Getting Started](#getting-started)
+  - [Train existing methods](#train-existing-methods)
+    - [Train with single/multiple GPUs](#train-with-singlemultiple-gpus)
+    - [Train with multiple machines](#train-with-multiple-machines)
+    - [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
+  - [Benchmarks](#benchmarks)
+  - [Tools and Tips](#tools-and-tips)
+    - [Count number of parameters](#count-number-of-parameters)
+    - [Publish a model](#publish-a-model)
+    - [Use t-SNE](#use-t-sne)
+    - [Reproducibility](#reproducibility)
+
+This page provides basic tutorials about the usage of MMSelfSup. For installation instructions, please see [install.md](install.md).
+
+## Train existing methods
+
+**Note**: The default learning rate in the config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
+
+### Train with single/multiple GPUs
+
+```shell
+bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} --work_dir ${YOUR_WORK_DIR} [optional arguments]
+```
+
+Optional arguments are:
+
+- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
+- `--deterministic`: Switch on "deterministic" mode, which slows down training but makes the results reproducible.
+
+An example:
+
+```shell
+# checkpoints and logs saved in WORK_DIR=work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
+bash tools/dist_train.sh configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py 8 --work_dir work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
+```
+
+**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended, since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, use a symlink, for example:
+
+```shell
+ln -s ${YOUR_WORK_DIRS} ${MMSELFSUP}/work_dirs
+```
+
+Alternatively, if you run MMSelfSup on a cluster managed with [slurm](https://slurm.schedmd.com/):
+
+```shell
+GPUS_PER_NODE=${GPUS_PER_NODE} GPUS=${GPUS} SRUN_ARGS=${SRUN_ARGS} bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${YOUR_WORK_DIR} [optional arguments]
+```
+
+An example:
+
+```shell
+GPUS_PER_NODE=8 GPUS=8 bash tools/slurm_train.sh Dummy Test_job configs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k.py work_dirs/selfsup/odc/odc_resnet50_8xb64-steplr-440e_in1k/
+```
+
+### Train with multiple machines
+
+If you launch with multiple machines connected only by Ethernet, you have to modify `tools/dist_train.sh` or create a new script; please refer to the PyTorch [Launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility). Training is usually slow without high-speed networking such as InfiniBand.
+
+If you launch with slurm, the command is the same as that on a single machine described above, but you need to refer to [slurm_train.sh](../tools/slurm_train.sh) to set appropriate parameters and environment variables.
+
+### Launch multiple jobs on a single machine
+
+If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify a different port (29500 by default) for each job to avoid communication conflicts.
+
+If you use `dist_train.sh` to launch training jobs:
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4 --work_dir tmp_work_dir_1
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4 --work_dir tmp_work_dir_2
+```
+
+If you launch training jobs with slurm, you have two options to set different communication ports:
+
+Option 1:
+
+In `config1.py`:
+
+```python
+dist_params = dict(backend='nccl', port=29500)
+```
+
+In `config2.py`:
+
+```python
+dist_params = dict(backend='nccl', port=29501)
+```
+
+Then you can launch two jobs with `config1.py` and `config2.py`:
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1
+CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2
+```
+
+Option 2:
+
+You can set different communication ports without modifying the configuration files by using `--cfg-options` to override the default port in the configuration file.
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 GPUS=4 bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config1.py tmp_work_dir_1 --cfg-options dist_params.port=29500
+CUDA_VISIBLE_DEVICES=4,5,6,7 GPUS=4 bash tools/slurm_train.sh ${PARTITION} ${JOB_NAME} config2.py tmp_work_dir_2 --cfg-options dist_params.port=29501
+```
+
+## Benchmarks
+
+We also provide commands to evaluate your pre-trained model on several downstream tasks; refer to [Benchmarks](./tutorials/6_benchmarks.md) for details.
+
+## Tools and Tips
+
+### Count number of parameters
+
+```shell
+python tools/analysis_tools/count_parameters.py ${CONFIG_FILE}
+```
+
+### Publish a model
+
+Before you publish a model, you may want to
+
+- Convert model weights to CPU tensors.
+- Delete the optimizer states.
+- Compute the hash of the checkpoint file and append the hash id to the filename.
+
+```shell
+python tools/model_converters/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
+```
+
+
+### Use t-SNE
+
+We provide an off-the-shelf tool to visualize the quality of image representations by t-SNE.
+
+```shell
+python tools/analysis_tools/visualize_tsne.py ${CONFIG_FILE} --checkpoint ${CKPT_PATH} --work_dir ${WORK_DIR} [optional arguments]
+```
+
+Arguments:
+
+- `CONFIG_FILE`: config file for the pre-trained model.
+- `CKPT_PATH`: the path to the model's checkpoint.
+- `WORK_DIR`: the directory to save the results of visualization.
+- `[optional arguments]`: for the available optional arguments, refer to [visualize_tsne.py](../tools/analysis_tools/visualize_tsne.py).
+
+
+### Reproducibility
+
+If you want your results to be exactly reproducible, switch on `--deterministic` when training the final model to be published. Note that this flag switches off `torch.backends.cudnn.benchmark` and slows down training.
diff --git a/docs_zh-CN/prepare_data.md b/docs_zh-CN/prepare_data.md
new file mode 100644
index 00000000..abe63dea
--- /dev/null
+++ b/docs_zh-CN/prepare_data.md
@@ -0,0 +1,87 @@
+# Prepare Datasets
+
+MMSelfSup supports multiple datasets. Please follow the corresponding guidelines for data preparation. It is recommended to symlink your dataset root to `$MMSELFSUP/data`. If your folder structure is different, you may need to change the corresponding paths in the config files.
+
+- [Prepare ImageNet](#prepare-imagenet)
+- [Prepare Place205](#prepare-place205)
+- [Prepare iNaturalist2018](#prepare-inaturalist2018)
+- [Prepare PASCAL VOC](#prepare-pascal-voc)
+- [Prepare CIFAR10](#prepare-cifar10)
+- [Prepare datasets for detection and segmentation](#prepare-datasets-for-detection-and-segmentation)
+  - [Detection](#detection)
+  - [Segmentation](#segmentation)
+
+```
+mmselfsup
+├── mmselfsup
+├── tools
+├── configs
+├── docs
+├── data
+│   ├── imagenet
+│   │   ├── meta
+│   │   ├── train
+│   │   ├── val
+│   ├── places205
+│   │   ├── meta
+│   │   ├── train
+│   │   ├── val
+│   ├── inaturalist2018
+│   │   ├── meta
+│   │   ├── train
+│   │   ├── val
+│   ├── VOCdevkit
+│   │   ├── VOC2007
+│   ├── cifar
+│   │   ├── cifar-10-batches-py
+
+```
+
+## Prepare ImageNet
+
+ImageNet has multiple versions, but the most commonly used one is [ILSVRC 2012](http://www.image-net.org/challenges/LSVRC/2012/). It can be obtained with the following steps:
+
+1. Register an account and log in to the [download page](http://www.image-net.org/download-images)
+2. Find download links for ILSVRC2012 and download the following two files
+    - ILSVRC2012_img_train.tar (~138GB)
+    - ILSVRC2012_img_val.tar (~6.3GB)
+3. Untar the downloaded files
+4. Download the metadata using this [script](https://github.com/BVLC/caffe/blob/master/data/ilsvrc12/get_ilsvrc_aux.sh)
+
+## Prepare Place205
+
+For Places205, you need to:
+
+1. Register an account and log in to the [download page](http://places.csail.mit.edu/downloadData.html)
+2. Download the resized images and the image lists for the training and validation sets of Places205
+3. Untar the downloaded files
+
+## Prepare iNaturalist2018
+
+For iNaturalist2018, you need to:
+
+1. Download the training and validation images and annotations from the [download page](https://github.com/visipedia/inat_comp/tree/master/2018)
+2. Untar the downloaded files
+3. Convert the original json annotation format to the list format using the script `tools/data_converters/convert_inaturalist.py`
+
+## Prepare PASCAL VOC
+
+Assuming that you usually store datasets in `$YOUR_DATA_ROOT`, the following command will automatically download PASCAL VOC 2007 into `$YOUR_DATA_ROOT`, prepare the required files, create a folder `data` under `$MMSELFSUP`, and make a symlink `VOCdevkit`.
+
+```shell
+bash tools/data_converters/prepare_voc07_cls.sh $YOUR_DATA_ROOT
+```
+
+## Prepare CIFAR10
+
+CIFAR10 will be downloaded automatically if it is not found. In addition, the `dataset` implemented by MMSelfSup will automatically structure CIFAR10 into the appropriate format.
+
+## Prepare datasets for detection and segmentation
+
+### Detection
+
+To prepare COCO, VOC2007 and VOC2012 for detection, you can refer to [mmdet](https://github.com/open-mmlab/mmdetection/blob/master/docs/1_exist_data_model.md).
+
+### Segmentation
+
+To prepare VOC2012AUG and Cityscapes for segmentation, you can refer to [mmseg](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/dataset_prepare.md#prepare-datasets).