# Getting Started
This page provides basic tutorials about the usage of OpenSelfSup.
For installation instructions, please see [INSTALL.md](INSTALL.md).
## Train existing methods
**Note**: The default learning rate in the config files is for 8 GPUs (except for those under `configs/linear_classification`, which use 1 GPU). If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
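
The scaling rule in the note can be written as a small helper. This is only a sketch of the formula; `scale_lr` is a hypothetical name, not part of OpenSelfSup, and the learning rate of `0.03` is an illustrative value rather than one taken from a specific config.

```python
def scale_lr(old_lr: float, old_ngpus: int, new_ngpus: int) -> float:
    """Linearly scale the learning rate with the number of GPUs."""
    if old_ngpus <= 0 or new_ngpus <= 0:
        raise ValueError("GPU counts must be positive")
    return old_lr * new_ngpus / old_ngpus


# Illustrative values: a config tuned for 8 GPUs with lr=0.03, run on 4 GPUs.
new_lr = scale_lr(old_lr=0.03, old_ngpus=8, new_ngpus=4)
print(new_lr)
```

Halving the number of GPUs halves the total batch size, so the learning rate is halved as well.
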
### Train with single/multiple GPUs
```shell
bash tools/dist_train.sh ${CONFIG_FILE} ${GPUS} [optional arguments]
```
Optional arguments are:
- `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--pretrained ${PRETRAIN_WEIGHTS}`: Load pretrained weights for the backbone.
An example:
```shell
bash tools/dist_train.sh configs/selfsup/odc/r50_v1.py 8
```
**Note**: During training, checkpoints and logs are saved under `work_dirs/` in the same folder structure as the config file. A custom work directory is not recommended, since the evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
```shell
ln -s /DATA/xhzhan/openselfsup_workdirs ${OPENSELFSUP}/work_dirs
```
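The mapping from config file to work directory can be sketched as below. It follows the pattern visible in the examples on this page (`configs/selfsup/odc/r50_v1.py` corresponds to `work_dirs/selfsup/odc/r50_v1`); `infer_work_dir` is a hypothetical helper, and the actual scripts may derive the path differently.

```python
from pathlib import PurePosixPath


def infer_work_dir(config_file: str, root: str = "work_dirs") -> str:
    """Mirror the config path under `root`, dropping `configs/` and `.py`."""
    parts = PurePosixPath(config_file).with_suffix("").parts
    if parts and parts[0] == "configs":
        parts = parts[1:]  # assumption: the leading `configs/` is not mirrored
    return str(PurePosixPath(root, *parts))


work_dir = infer_work_dir("configs/selfsup/odc/r50_v1.py")
print(work_dir)
```
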
Alternatively, if you run OpenSelfSup on a cluster managed with [slurm](https://slurm.schedmd.com/):
```shell
SRUN_ARGS="${SRUN_ARGS}" bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} ${GPUS} [optional arguments]
```
An example:
```shell
SRUN_ARGS="-w xx.xx.xx.xx" bash tools/srun_train.sh Dummy configs/selfsup/odc/r50_v1.py 8 --resume_from work_dirs/selfsup/odc/r50_v1/epoch_100.pth
```
### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
you need to specify different ports (29500 by default) for each job to avoid communication conflict.
If you use `dist_train.sh` to launch training jobs:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 bash tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 bash tools/dist_train.sh ${CONFIG_FILE} 4
```
If you launch training jobs with slurm:
```shell
GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29500
GPUS_PER_NODE=4 bash tools/srun_train.sh ${PARTITION} ${CONFIG_FILE} 4 --port 29501
```
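If you do not want to pick ports by hand, one option is to ask the operating system for a free one before launching each job. The helper below is a sketch using only the Python standard library; `find_free_port` is a hypothetical name and is not provided by OpenSelfSup. The port is only known to be free at the moment of the check, so another process could still claim it before training starts.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a TCP port that is unused at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS choose any free port
        return sock.getsockname()[1]


port = find_free_port()
# Print a launch command that uses the chosen port.
print(f"PORT={port} bash tools/dist_train.sh ${{CONFIG_FILE}} 4")
```
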
## Benchmarks
We provide several standard benchmarks to evaluate representation learning.
### VOC07 Linear SVM & Low-shot Linear SVM
```shell
# test by epoch
bash benchmarks/dist_test_svm.sh ${CONFIG_FILE} ${EPOCH} ${FEAT_LIST} ${GPUS}
# test pretrained model
bash benchmarks/dist_test_svm.sh ${CONFIG_FILE} ${PRETRAIN} ${FEAT_LIST} ${GPUS}
# test random init
bash benchmarks/dist_test_svm.sh ${CONFIG_FILE} "random" ${FEAT_LIST} ${GPUS}
```
Arguments:
- `${FEAT_LIST}` is a string that specifies which features from layer1 to layer5 to evaluate. For example, to evaluate layer5 only, set `FEAT_LIST` to `feat5`; to evaluate all features, set `FEAT_LIST` to `feat1 feat2 feat3 feat4 feat5` (separated by spaces). If left empty, `FEAT_LIST` defaults to `feat5`.
- `${GPUS}` is the number of GPUs used to extract features.
### ImageNet / Places205 Linear Classification
**First**, extract backbone weights:
```shell
python tools/extract_backbone_weights.py ${CHECKPOINT} ${WEIGHT_FILE}
```
Arguments:
- `CHECKPOINT`: the checkpoint file of a selfsup method, named `epoch_*.pth`.
- `WEIGHT_FILE`: the output backbone weights file, e.g., `pretrains/moco_v1_epoch200.pth`.
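
Conceptually, extracting backbone weights means keeping only the parameters that belong to the backbone. The sketch below illustrates the idea on a plain dictionary. It assumes the parameters are stored under a `backbone.` key prefix (consistent with the requirement that a model contains a `self.backbone` module) and does not reproduce the actual logic of `tools/extract_backbone_weights.py`. `extract_backbone` is a hypothetical name.

```python
def extract_backbone(state_dict, prefix="backbone."):
    """Keep only entries under `prefix` and strip the prefix from their keys."""
    backbone = {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
    if not backbone:
        raise KeyError(f"no parameters found under prefix {prefix!r}")
    return backbone


# Toy stand-in for a checkpoint's state dict (real values would be tensors).
checkpoint_state = {
    "backbone.conv1.weight": [0.1, 0.2],
    "neck.fc.weight": [0.3],
    "head.fc.bias": [0.4],
}
backbone_weights = extract_backbone(checkpoint_state)
print(sorted(backbone_weights))
```
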
**Next**, train and test linear classification:
```shell
bash benchmarks/dist_test_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
```
Arguments:
- `CONFIG_FILE`: Use config files under `configs/linear_classification/`. Note that if you want to test DeepCluster, which has a Sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/linear_classification/imagenet/r50_multihead_sobel.py`.
- Optional arguments include `--resume_from ${CHECKPOINT_FILE}`, which resumes from a previous checkpoint file.
### ImageNet Semi-Supervised Classification
```shell
bash benchmarks/dist_test_semi.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
```
Arguments:
- `CONFIG_FILE`: Use config files under `configs/classification/imagenet_*percent/`.
- `WEIGHT_FILE`: The backbone weights extracted as described above.
- Optional arguments: The same as above.
### VOC07+12 / COCO17 Object Detection
For details on setting up the environment for detection, please refer to [benchmarks/detection/README.md](benchmarks/detection/README.md).
```shell
conda activate detectron2
cd benchmarks/detection
python convert-pretrain-to-detectron2.py ${WEIGHT_FILE} ${OUTPUT_FILE} # must use .pkl as the output extension.
bash run.sh ${DET_CFG} ${OUTPUT_FILE}
```
Arguments:
- `WEIGHT_FILE`: The backbone weights extracted as described above.
- `OUTPUT_FILE`: The converted backbone weights file, e.g., `odc_v1.pkl`.
- `DET_CFG`: The detectron2 config file; we usually use `configs/pascal_voc_R_50_C4_24k_moco.yaml`.
**Note**:
- This benchmark must use 8 GPUs, following the default setting of MoCo.
- Please report the mean of 5 trials in your official paper, following MoCo.
- DeepCluster, which uses a Sobel layer, is not supported by detectron2.
### Count number of parameters
```shell
python tools/count_parameters.py ${CONFIG_FILE}
```
### Publish a model
Compute the hash of the weight file and append the hash id to the filename. The output file is the input file name with a hash suffix.
```shell
python tools/publish_model.py ${WEIGHT_FILE}
```
Arguments:
- `WEIGHT_FILE`: The backbone weights extracted as described above.
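
The idea of appending a hash to the file name can be sketched as follows. This is an illustration rather than the code of `tools/publish_model.py`: the choice of SHA-256, the 8-character prefix, the `-` separator, and the `publish` helper name are all assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def publish(weight_file, hash_len=8):
    """Copy `weight_file` to a sibling file whose name carries a hash suffix."""
    src = Path(weight_file)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dst = src.with_name(f"{src.stem}-{digest[:hash_len]}{src.suffix}")
    shutil.copyfile(src, dst)
    return dst


# Demonstrate on a throwaway file instead of real weights.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "odc_v1.pth"
    weights.write_bytes(b"dummy weights")
    published = publish(weights)
    print(published.name)
```

Because the suffix is derived from the file contents, two files with the same name but different weights get different published names.
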
## How-to
### Use a new dataset
1. Write a data source file under `openselfsup/datasets/data_sources/`. You may refer to the existing ones.
2. Create new config files for your experiments.
### Design your own methods
#### What you need to do
1. Create a dataset file under `openselfsup/datasets/` (preferably reuse an existing one);
2. Create a model file under `openselfsup/models/`. The model typically contains:
    i) backbone (required): maps images to deep features from different depths of layers. Your model must contain a `self.backbone` module, otherwise the backbone weights cannot be extracted.
    ii) neck (optional): maps deep features to compact feature vectors.
    iii) head (optional): defines loss functions.
    iv) memory_bank (optional): defines memory banks.
3. Create a config file under `configs/` and set up the configs;
4. [Optional] Create a hook file under `openselfsup/hooks/` if your method requires additional operations before the run, every several iterations, every several epochs, or after the run.
You may refer to existing modules under respective folders.
2020-06-16 00:57:17 +08:00
#### Features that may facilitate your implementation
* Decoupled data source and dataset.
Since a dataset is tied to a specific task while a data source is general, we decouple the data source from the dataset in OpenSelfSup.
```python
data = dict(
    train=dict(type='ContrastiveDataset',
        data_source=dict(type='ImageNet', list_file='xx', root='xx'),
        pipeline=train_pipeline),
    val=dict(...),
    ...
)
```
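The decoupling can be pictured with a toy example in plain Python. The classes and method names below (`ListDataSource`, `TwoViewDataset`, `get_length`, `get_sample`) are hypothetical and only illustrate the split of responsibilities; they are not the OpenSelfSup classes.

```python
class ListDataSource:
    """Hypothetical data source: only knows how to fetch raw samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def get_length(self):
        return len(self.samples)

    def get_sample(self, idx):
        return self.samples[idx]


class TwoViewDataset:
    """Hypothetical task-specific dataset: returns two processed views."""

    def __init__(self, data_source, pipeline):
        self.data_source = data_source
        self.pipeline = pipeline

    def __len__(self):
        return self.data_source.get_length()

    def __getitem__(self, idx):
        sample = self.data_source.get_sample(idx)
        # The pipeline is applied twice; with random augmentations the two
        # views would differ, but this toy pipeline is deterministic.
        return self.pipeline(sample), self.pipeline(sample)


source = ListDataSource([1, 2, 3])
dataset = TwoViewDataset(source, pipeline=lambda x: x * 10)
print(len(dataset), dataset[1])
```

The same data source could be wrapped by a different dataset class for another task without changing how samples are read.
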
* Configure data augmentations in the config file.
The augmentations are the same as `torchvision.transforms` except that `torchvision.transforms.RandomApply` corresponds to `RandomAppliedTrans`. `Lighting` and `GaussianBlur` are additionally implemented.
```python
img_norm_cfg = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_pipeline = [
    dict(type='RandomResizedCrop', size=224),
    dict(type='RandomAppliedTrans',
        transforms=[
            dict(type='GaussianBlur', sigma_min=0.1, sigma_max=2.0, kernel_size=23)],
        p=0.5),
    dict(type='ToTensor'),
    dict(type='Normalize', **img_norm_cfg)
]
```
* Parameter-wise optimization parameters.
You may specify optimization parameters, including lr, momentum and weight_decay, for a certain group of parameters in the config file with `paramwise_options`. `paramwise_options` is a dict whose keys are regular expressions and whose values are options. Options include 6 fields: lr, lr_mult, momentum, momentum_mult, weight_decay, weight_decay_mult.
```python
# this config sets all normalization layers with weight_decay_mult=0.1,
# and the head with `lr_mult=10, momentum=0`.
# Raw strings keep `\d` and `\A` as regex escapes instead of string escapes.
paramwise_options = {
    r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
    r'\Ahead.': dict(lr_mult=10, momentum=0)}
optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
                     weight_decay=0.0001,
                     paramwise_options=paramwise_options)
```
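To see which parameters a pattern would affect, you can test the regular expressions against parameter names. The sketch below assumes the patterns are applied with `re.search` to names such as `backbone.bn1.weight`; both the matching function and the example names are assumptions for illustration, and `options_for` is a hypothetical helper.

```python
import re

paramwise_options = {
    r'(bn|gn)(\d+)?.(weight|bias)': dict(weight_decay_mult=0.1),
    r'\Ahead.': dict(lr_mult=10, momentum=0),
}


def options_for(name, paramwise_options):
    """Collect the options of every pattern that matches the parameter name."""
    matched = {}
    for pattern, options in paramwise_options.items():
        if re.search(pattern, name):
            matched.update(options)
    return matched


# Illustrative parameter names, not taken from a real model.
names = ['backbone.bn1.weight', 'backbone.conv1.weight', 'head.fc.bias']
resolved = {name: options_for(name, paramwise_options) for name in names}
for name, options in resolved.items():
    print(name, options)
```

Under these assumptions, the normalization parameter picks up `weight_decay_mult`, the convolution weight keeps the optimizer defaults, and the head parameter picks up `lr_mult` and `momentum`.
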
* Configure custom hooks in the config file.
The hooks will be called in order. For hook design, please refer to [odc_hook.py](openselfsup/hooks/odc_hook.py) as an example.
```python
custom_hooks = [
    dict(type='DeepClusterHook', ...),
    dict(type='ODCHook', ...),
]
```