This tutorial provides instructions for using the models from the [Model Zoo](../model_zoo.md) on other datasets to obtain better performance.
MMSegmentation also provides out-of-the-box tools for training models.
This section will show how to train and test models on standard datasets.
```python
load_from = 'https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'  # model path can be found in model zoo
```
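The options described below are passed to the training script. Assuming you run it from the repository root, a minimal single-GPU training command looks like:

```shell
# Train a model with a single GPU; optional arguments are described below
python tools/train.py ${CONFIG_FILE} [optional arguments]
```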
- `--resume ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file. If no checkpoint is specified, training tries to automatically resume from the latest checkpoint in the work directory.
- `--cfg-options ${OVERRIDE_CONFIGS}`: Override some settings in the config used; key-value pairs in `xxx=yyy` format will be merged into the config file. For example, `--cfg-options model.encoder.in_channels=6`. Please see this [guide](./1_config.md#Modify-config-through-script-arguments) for more details.
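As a concrete sketch of the override syntax, the flag is appended to the training command (the dotted key below is only illustrative; use a key that actually exists in your config):

```shell
# Merge a key-value override into the loaded config at launch time
python tools/train.py ${CONFIG_FILE} --cfg-options model.backbone.with_cp=True
```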
Below are the optional arguments for multi-GPU training:
- `--launcher`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, the job runs in non-distributed mode.
- `--local_rank`: ID of the local rank. If not specified, it defaults to 0.
Training on the CPU follows the same procedure as single-GPU training when the machine has no GPU. If the machine has GPUs but you do not want to use them, you only need to disable the GPUs before starting the training process.
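For example, in a bash-like shell you can hide all GPUs from PyTorch for the current session and then launch the usual training command:

```shell
# Hide all GPUs so training falls back to the CPU
export CUDA_VISIBLE_DEVICES=-1
python tools/train.py ${CONFIG_FILE}
```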
**Note**: During training, checkpoints and logs are saved in the same folder structure as the config file under `work_dirs/`. A custom work directory is not recommended since evaluation scripts infer work directories from the config file name. If you want to save your weights somewhere else, please use a symlink, for example:
```shell
ln -s ${YOUR_WORK_DIRS} ${MMSEG}/work_dirs
```
#### Launch multiple jobs on a single machine
If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs, you need to specify different ports (29500 by default) for each job to avoid communication conflicts. Otherwise, you will get an error message saying `RuntimeError: Address already in use`.
If you use `dist_train.sh` to launch training jobs, you can set the port in commands with environment variable `PORT`.
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 sh tools/dist_train.sh ${CONFIG_FILE} 4
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 sh tools/dist_train.sh ${CONFIG_FILE} 4
```
MMSegmentation relies on the `torch.distributed` package for distributed training.
Thus, as a basic usage, one can launch distributed training via PyTorch's [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
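As a sketch, a two-node training run can be launched as follows. This assumes your copy of `tools/dist_train.sh` reads `NNODES`, `NODE_RANK`, `MASTER_ADDR`, and `PORT` from the environment (recent versions of the script do; check its header to confirm):

```shell
# On the first machine (node rank 0)
NNODES=2 NODE_RANK=0 MASTER_ADDR=${MASTER_ADDR} PORT=29500 sh tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}

# On the second machine (node rank 1)
NNODES=2 NODE_RANK=1 MASTER_ADDR=${MASTER_ADDR} PORT=29500 sh tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
```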
`tools/dist_test.sh` also supports multi-node testing, but relies on PyTorch's [launch utility](https://pytorch.org/docs/stable/distributed.html#launch-utility).
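For reference, the wrapper script takes the config, the checkpoint, and the number of GPUs as positional arguments (verify the order against the script in your checkout):

```shell
# Distributed testing on a single node with GPU_NUM GPUs
sh tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}
```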
[Slurm](https://slurm.schedmd.com/) is a good job scheduling system for computing clusters.
On a cluster managed by Slurm, you can use `slurm_test.sh` to spawn testing jobs. It supports both single-node and multi-node testing.
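A typical invocation looks like the following; the argument order is taken from the script shipped in `tools/`, so verify it against your checkout:

```shell
# Run a testing job through Slurm; GPUS can be overridden via the environment
GPUS=4 sh tools/slurm_test.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${CHECKPOINT_FILE}
```

For single-GPU testing, `tools/test.py` can be called directly:

```shell
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
```

This tool accepts several optional arguments: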
- `--work-dir`: If specified, results will be saved in this directory. If not specified, the results are automatically saved to `work_dirs/{CONFIG_NAME}`.
- `--show`: Show prediction results at runtime; available when `--show-dir` is not specified.
- `--show-dir`: If specified, the visualized segmentation masks will be saved in the specified directory.
- `--wait-time`: The display interval in seconds, which takes effect when `--show` is activated. Defaults to 2.
- `--cfg-options`: If specified, key-value pairs in `xxx=yyy` format will be merged into the config file. For example, to trade speed for GPU memory, you may pass `--cfg-options model.backbone.with_cp=True` to enable checkpointing in the backbone.
Below are the optional arguments for multi-GPU testing:
- `--launcher`: The launcher for distributed job initialization. Allowed choices are `none`, `pytorch`, `slurm`, and `mpi`. In particular, if set to `none`, the test runs in non-distributed mode.
- `--local_rank`: ID of the local rank. If not specified, it defaults to 0.
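If you prefer not to use the wrapper script, the same flags can be passed when launching the test script through PyTorch's launcher. A sketch for 4 GPUs on a single node (newer PyTorch versions replace `torch.distributed.launch` with `torchrun`):

```shell
# Launch distributed testing directly; --launcher pytorch tells the script
# to initialize the process group from the launcher's environment variables
python -m torch.distributed.launch --nproc_per_node=4 tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --launcher pytorch
```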
Examples:
Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
1. Test PSPNet on PASCAL VOC (without saving the test results) and evaluate the mIoU.
Since `--work-dir` is not specified, the folder `work_dirs/pspnet_r50-d8_4xb4-20k_voc12aug-512x512` will be created automatically to save the evaluation results.
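   A sketch of the corresponding command, assuming the config lives under `configs/pspnet/` (replace the checkpoint placeholder with the file you actually downloaded):

   ```shell
   python tools/test.py configs/pspnet/pspnet_r50-d8_4xb4-20k_voc12aug-512x512.py checkpoints/${CHECKPOINT_FILE}
   ```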