support MoCo 8GPU linear_cls

pull/16/head
xieenze 2020-07-19 17:52:47 +08:00
parent 79fdb8ce54
commit ebb4ae307a
3 changed files with 7 additions and 7 deletions


@@ -5,8 +5,8 @@ set -x
 CFG=$1 # use cfgs under "configs/benchmarks/linear_classification/"
 PRETRAIN=$2
-GPUS=$3 # in MoCo, GPUS=8
-PY_ARGS=${@:4} # --resume_from --deterministic
+PY_ARGS=${@:3} # --resume_from --deterministic
+GPUS=8 # When changing GPUS, please also change imgs_per_batch in the config file accordingly to ensure the total batch size is 256.
 PORT=${PORT:-29500}
 if [ "$CFG" == "" ] || [ "$PRETRAIN" == "" ]; then


@@ -8,8 +8,8 @@ CFG=$2
 PRETRAIN=$3
 PY_ARGS=${@:4}
 JOB_NAME="openselfsup"
-GPUS=1 # in the standard setting, GPUS=1
-GPUS_PER_NODE=${GPUS_PER_NODE:-1}
+GPUS=8 # When changing GPUS, please also change imgs_per_batch in the config file accordingly to ensure the total batch size is 256.
+GPUS_PER_NODE=${GPUS_PER_NODE:-8}
 CPUS_PER_TASK=${CPUS_PER_TASK:-5}
 SRUN_ARGS=${SRUN_ARGS:-""}
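Note that `GPUS_PER_NODE`, `CPUS_PER_TASK`, and `SRUN_ARGS` keep the `${VAR:-default}` form, so the new 8-GPU default can still be overridden from the environment, whereas `GPUS=8` is now hardcoded. A sketch, assuming this file is the Slurm launcher (the script name and positional arguments here are assumptions for illustration):

```shell
# Override the per-node defaults via the environment; GPUS itself is
# hardcoded to 8 inside the script, so 8 tasks are spread over 2 nodes
# here. Script path and arguments are assumed for illustration.
GPUS_PER_NODE=4 CPUS_PER_TASK=8 SRUN_ARGS="--exclusive" \
    bash benchmarks/srun_train_linear.sh ${PARTITION} ${CONFIG_FILE} ${WEIGHT_FILE}
```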


@@ -5,7 +5,8 @@ For installation instructions, please see [INSTALL.md](INSTALL.md).
 ## Train existing methods
-**Note**: The default learning rate in config files is for 8 GPUs (except for those under `configs/benchmarks/linear_classification` that use 1 GPU). If using differnt number GPUs, the total batch size will change in proportion, you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend to use `tools/dist_train.sh` even with 1 gpu, since some methods do not support non-distributed training.
+**Note**: The default learning rate in config files is for 8 GPUs.
+If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus` (see the worked example below). We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
 ### Train with single/multiple GPUs
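As a worked example of this linear scaling rule (the numbers are illustrative, not from any shipped config):

```shell
# Linear scaling rule: new_lr = old_lr * new_ngpus / old_ngpus.
# Suppose a config was tuned for 8 GPUs with lr=0.1 and we run on 4 GPUs:
OLD_LR=0.1; OLD_NGPUS=8; NEW_NGPUS=4
NEW_LR=$(awk -v lr="$OLD_LR" -v o="$OLD_NGPUS" -v n="$NEW_NGPUS" \
    'BEGIN { print lr * n / o }')
echo "$NEW_LR"   # 0.05
```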
@@ -101,13 +102,12 @@ Arguments:
 **Next**, train and test linear classification:
 ```shell
 # train
-bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} ${GPUS} [optional arguments]
+bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
 # test (unnecessary if validation is run during training)
 bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
 ```
 Arguments:
 - `CONFIG_FILE`: Use config files under "configs/benchmarks/linear_classification/". Note that if you want to test DeepCluster, which has a Sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/linear_classification/imagenet/r50_multihead_sobel.py`.
-- `GPUS`: `8` for MoCo and `1` for other methods.
 - Optional arguments include:
   - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
   - `--deterministic`: Switch on "deterministic" mode, which slows down training but makes the results reproducible.
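For example, evaluating the trained linear classifier with the test launcher shown above (the config and checkpoint paths are illustrative):

```shell
# Test the linear classification head on 8 GPUs; paths are illustrative.
bash tools/dist_test.sh \
    configs/benchmarks/linear_classification/imagenet/r50_multihead.py \
    8 work_dirs/linear_classification/latest.pth
```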