support MoCo 8GPU linear_cls
parent 79fdb8ce54
commit ebb4ae307a
@@ -5,8 +5,8 @@ set -x
 CFG=$1 # use cfgs under "configs/benchmarks/linear_classification/"
 PRETRAIN=$2
-GPUS=$3 # in MoCo, GPUS=8
-PY_ARGS=${@:4} # --resume_from --deterministic
+PY_ARGS=${@:3} # --resume_from --deterministic
+GPUS=8 # When changing GPUS, please also change imgs_per_batch in the config file accordingly to ensure the total batch size is 256.
 PORT=${PORT:-29500}

 if [ "$CFG" == "" ] || [ "$PRETRAIN" == "" ]; then
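The argument shuffle above (`PY_ARGS` moving from `${@:4}` to `${@:3}`) follows from dropping the positional `GPUS` argument, so the optional flags now start at position 3. A minimal bash sketch of the slicing — the argument values here are made up for illustration:

```shell
#!/usr/bin/env bash
# With the GPUS positional argument removed, optional flags start at $3,
# so PY_ARGS becomes ${@:3}. Hypothetical argument values for illustration:
set -- cfg.py weights.pth --resume_from ckpt.pth --deterministic
CFG=$1
PRETRAIN=$2
PY_ARGS=${@:3}
echo "$PY_ARGS"   # prints: --resume_from ckpt.pth --deterministic
```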
@@ -8,8 +8,8 @@ CFG=$2
 PRETRAIN=$3
 PY_ARGS=${@:4}
 JOB_NAME="openselfsup"
-GPUS=1 # in the standard setting, GPUS=1
-GPUS_PER_NODE=${GPUS_PER_NODE:-1}
+GPUS=8 # When changing GPUS, please also change imgs_per_batch in the config file accordingly to ensure the total batch size is 256.
+GPUS_PER_NODE=${GPUS_PER_NODE:-8}
 CPUS_PER_TASK=${CPUS_PER_TASK:-5}
 SRUN_ARGS=${SRUN_ARGS:-""}

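Both scripts now carry the comment that `imgs_per_batch` must keep the total batch size at 256. The implied arithmetic, sketched in shell — the variable names follow the comment, and the per-GPU value is derived, not taken from a config:

```shell
#!/usr/bin/env bash
# The comment requires GPUS * imgs_per_batch == 256, so the per-GPU batch
# for 8 GPUs works out to 256 / 8 = 32 (imgs_per_batch itself is set in
# the config file, not in these launch scripts).
GPUS=8
imgs_per_batch=$((256 / GPUS))
echo "$imgs_per_batch"              # prints: 32
echo "$((GPUS * imgs_per_batch))"   # prints: 256
```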
@@ -5,7 +5,8 @@ For installation instructions, please see [INSTALL.md](INSTALL.md).

 ## Train existing methods

-**Note**: The default learning rate in config files is for 8 GPUs (except for those under `configs/benchmarks/linear_classification` that use 1 GPU). If using differnt number GPUs, the total batch size will change in proportion, you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend to use `tools/dist_train.sh` even with 1 gpu, since some methods do not support non-distributed training.
+**Note**: The default learning rate in config files is for 8 GPUs.
+If using a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.

 ### Train with single/multiple GPUs
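The scaling rule `new_lr = old_lr * new_ngpus / old_ngpus` from the note can be computed directly in the shell via awk. A sketch — the 0.03 base value is an assumption for illustration, not a value taken from any config in the repository:

```shell
#!/usr/bin/env bash
# Scale the learning rate when the GPU count changes. 0.03 is an assumed
# 8-GPU base learning rate used only to make the example concrete.
old_lr=0.03
old_ngpus=8
new_ngpus=1
new_lr=$(awk -v lr="$old_lr" -v o="$old_ngpus" -v n="$new_ngpus" \
         'BEGIN { printf "%g", lr * n / o }')
echo "$new_lr"   # prints: 0.00375
```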
@@ -101,13 +102,12 @@ Arguments:
 **Next**, train and test linear classification:
 ```shell
 # train
-bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} ${GPUS} [optional arguments]
+bash benchmarks/dist_train_linear.sh ${CONFIG_FILE} ${WEIGHT_FILE} [optional arguments]
 # test (unnecessary if have validation in training)
 bash tools/dist_test.sh ${CONFIG_FILE} ${GPUS} ${CHECKPOINT}
 ```
 Arguments:
 - `CONFIG_FILE`: Use config files under "configs/benchmarks/linear_classification/". Note that if you want to test DeepCluster that has a sobel layer before the backbone, you have to use the config file named `*_sobel.py`, e.g., `configs/benchmarks/linear_classification/imagenet/r50_multihead_sobel.py`.
-- `GPUS`: `8` for MoCo and `1` for other methods.
 - Optional arguments include:
   - `--resume_from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
   - `--deterministic`: Switch on "deterministic" mode which slows down training but the results are reproducible.