mmrazor/tools/mmseg/dist_train.sh

20 lines
457 B
Bash
Raw Normal View History

#!/usr/bin/env bash
CONFIG=$1
GPUS=$2
Bump version to v0.3.0 (#135) * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Refactor] Delete redundant `set_random_seed` function (#104) * refactor set_random_seed * add unittests * fix unittests error * fix lint * avoid bc breaking * [Feature] Add diff seeds to diff ranks and set torch seed in worker_init_fn (#113) * add init_random_seed * Set diff seed to diff workers * [Feature] Add multi machine dist_train (#114) * support multi nodes * update training doc * fix lints * remove fixed seed * fix ddp wrapper registry (#128) * [Docs] Add brief installation steps in README(_zh-CN).md (#121) * Add brief installation * add brief installtion ref to mmediting pr#816 Co-authored-by: caoweihan <caoweihan@sensetime.com> * [BUG]Fix bugs in pruner (#126) * fix bugs in pruner when pruning models with shared modules * pruner can trace models with dilation conv2d * fix deploy_subnet * fix add_pruning_attrs * fix bugs in modify_forward * fix lint * fix StructurePruner * test tracing models with shared modules Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Docs]Add some more details to docs (#133) * add docs for dataset * add cfg-options for distillation * fix docs Co-authored-by: caoweihan <caoweihan@sensetime.com> * reset norm running status after prepare_from_supernet (#81) * [Improvement]Sync train api (#115) Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Feature]Support Relational Knowledge Distillation (#127) * add rkd * add rkd pytest * add rkd configs * fix readme * fix rkd * split rkd loss to distance-wise and angle-wise losses * rename rkd losses * add rkd metaflie * add rkd related links * rename rkd metafile and add to model index * delete cifar100 Co-authored-by: caoweihan <caoweihan@sensetime.com> Co-authored-by: pppppM <gjf_mail@126.com> Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com> Co-authored-by: wutongshenqiu <690364065@qq.com> Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com> Co-authored-by: caoweihan <caoweihan@sensetime.com>
2022-04-02 19:30:50 +08:00
NNODES=${NNODES:-1}
NODE_RANK=${NODE_RANK:-0}
PORT=${PORT:-29500}
Bump version to v0.3.0 (#135) * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Refactor] Delete redundant `set_random_seed` function (#104) * refactor set_random_seed * add unittests * fix unittests error * fix lint * avoid bc breaking * [Feature] Add diff seeds to diff ranks and set torch seed in worker_init_fn (#113) * add init_random_seed * Set diff seed to diff workers * [Feature] Add multi machine dist_train (#114) * support multi nodes * update training doc * fix lints * remove fixed seed * fix ddp wrapper registry (#128) * [Docs] Add brief installation steps in README(_zh-CN).md (#121) * Add brief installation * add brief installtion ref to mmediting pr#816 Co-authored-by: caoweihan <caoweihan@sensetime.com> * [BUG]Fix bugs in pruner (#126) * fix bugs in pruner when pruning models with shared modules * pruner can trace models with dilation conv2d * fix deploy_subnet * fix add_pruning_attrs * fix bugs in modify_forward * fix lint * fix StructurePruner * test tracing models with shared modules Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Docs]Add some more details to docs (#133) * add docs for dataset * add cfg-options for distillation * fix docs Co-authored-by: caoweihan <caoweihan@sensetime.com> * reset norm running status after prepare_from_supernet (#81) * [Improvement]Sync train api (#115) Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Feature]Support Relational Knowledge Distillation (#127) * add rkd * add rkd pytest * add rkd configs * fix readme * fix rkd * split rkd loss to distance-wise and angle-wise losses * rename rkd losses * add rkd metaflie * add rkd related links * rename rkd metafile and add to model index * delete cifar100 Co-authored-by: caoweihan <caoweihan@sensetime.com> Co-authored-by: pppppM <gjf_mail@126.com> Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com> Co-authored-by: wutongshenqiu <690364065@qq.com> Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com> Co-authored-by: caoweihan <caoweihan@sensetime.com>
2022-04-02 19:30:50 +08:00
MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
PYTHONPATH="$(dirname $0)/../..":$PYTHONPATH \
Bump version to v0.3.0 (#135) * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Feature] Add function to meet mmdeploy support (#102) * add init_model function for mmdeploy * fix lint * add unittest for init_xxx_model * fix lint * mv test_inference.py to test_apis directory * [Refactor] Delete redundant `set_random_seed` function (#104) * refactor set_random_seed * add unittests * fix unittests error * fix lint * avoid bc breaking * [Feature] Add diff seeds to diff ranks and set torch seed in worker_init_fn (#113) * add init_random_seed * Set diff seed to diff workers * [Feature] Add multi machine dist_train (#114) * support multi nodes * update training doc * fix lints * remove fixed seed * fix ddp wrapper registry (#128) * [Docs] Add brief installation steps in README(_zh-CN).md (#121) * Add brief installation * add brief installtion ref to mmediting pr#816 Co-authored-by: caoweihan <caoweihan@sensetime.com> * [BUG]Fix bugs in pruner (#126) * fix bugs in pruner when pruning models with shared modules * pruner can trace models with dilation conv2d * fix deploy_subnet * fix add_pruning_attrs * fix bugs in modify_forward * fix lint * fix StructurePruner * test tracing models with shared modules Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Docs]Add some more details to docs (#133) * add docs for dataset * add cfg-options for distillation * fix docs Co-authored-by: caoweihan <caoweihan@sensetime.com> * reset norm running status after prepare_from_supernet (#81) * [Improvement]Sync train api (#115) Co-authored-by: caoweihan <caoweihan@sensetime.com> * [Feature]Support Relational Knowledge Distillation (#127) * add rkd * add rkd pytest * add rkd configs * fix readme * fix rkd * split rkd loss to distance-wise and angle-wise losses * rename rkd losses * add rkd metaflie * add rkd related links * rename rkd metafile and add to model index * delete cifar100 Co-authored-by: caoweihan <caoweihan@sensetime.com> Co-authored-by: pppppM <gjf_mail@126.com> Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com> Co-authored-by: wutongshenqiu <690364065@qq.com> Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com> Co-authored-by: caoweihan <caoweihan@sensetime.com>
2022-04-02 19:30:50 +08:00
python -m torch.distributed.launch \
--nnodes=$NNODES \
--node_rank=$NODE_RANK \
--master_addr=$MASTER_ADDR \
--nproc_per_node=$GPUS \
--master_port=$PORT \
$(dirname "$0")/mmseg/train_mmseg.py \
$CONFIG \
--launcher pytorch ${@:3}