- [Train with multiple GPUs](#train-with-multiple-gpus)
- [Train with multiple machines](#train-with-multiple-machines)
- [Multiple machines in the same network](#multiple-machines-in-the-same-network)
- [Multiple machines managed with slurm](#multiple-machines-managed-with-slurm)
## Train with your PC
You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
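For example, a single-machine run might look like the following (the config file path is illustrative; substitute any config shipped with MMPretrain):

```shell
# Train with the default device (GPU if one is visible).
python tools/train.py configs/resnet/resnet18_8xb32_in1k.py --work-dir ./work_dirs/resnet18

# Force CPU training by hiding all GPUs from the program.
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/resnet/resnet18_8xb32_in1k.py
```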
````{note}
By default, MMPretrain prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to `-1` to make the GPU invisible to the program.
````
| Arguments | Description |
| --------- | ----------- |
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name as the config file under `./work_dirs`. |
| `--resume [RESUME]` | Resume training. If a path is specified, resume from it; if not, try to automatically resume from the latest checkpoint. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--no-pin-memory` | Whether to disable the pin_memory option in dataloaders. |
| `--no-persistent-workers` | Whether to disable the persistent_workers option in dataloaders. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
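As an illustration of `--cfg-options`, the command below overrides two config values from the command line; the exact dotted keys depend on your config file, so the ones shown here are assumptions:

```shell
# Override the learning rate and the number of training epochs
# (keys such as optim_wrapper.optimizer.lr depend on your config).
python tools/train.py configs/resnet/resnet18_8xb32_in1k.py \
    --cfg-options optim_wrapper.optimizer.lr=0.01 train_cfg.max_epochs=50
```

Note that list or tuple values must be quoted as a whole, e.g. `--cfg-options key="[(a,b),(c,d)]"`, with no whitespace inside the value.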
## Train with multiple GPUs
We provide a shell script to start a multi-GPU task with `torch.distributed.launch`.
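Assuming the script lives at `tools/dist_train.sh` as in the MMPretrain repository, a sketch of launching on several GPUs on one machine:

```shell
# Launch training on 4 GPUs on the current machine
# (config path and GPU count are illustrative).
bash tools/dist_train.sh configs/resnet/resnet18_8xb32_in1k.py 4
```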