You can use `tools/train.py` to train a model on a single machine with a CPU and optionally a GPU.
Here is the full usage of the script:
```shell
python tools/train.py ${CONFIG_FILE} [ARGS]
```
````{note}
By default, MMClassification prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `--work-dir WORK_DIR` | The target folder to save logs and checkpoints. Defaults to a folder with the same name of the config file under `./work_dirs`. |
| `--resume [RESUME]` | Resume training. If specify a path, resume from it, while if not specify, try to auto resume from the latest checkpoint. |
| `--auto-scale-lr` | Auto scale the learning rate according to the actual batch size and the original batch size. |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
### Training with multiple GPUs
We provide a shell script to start a multi-GPUs task with `torch.distributed.launch`.
By default, MMClassification prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
| `CHECKPOINT_FILE` | The path to the checkpoint file (It can be a http link, and you can find checkpoints [here](https://mmclassification.readthedocs.io/en/1.x/modelzoo_statistics.html)). |
| `--work-dir WORK_DIR` | The directory to save the file containing evaluation metrics. |
| `--out OUT` | The path to save the file containing test results. |
| `--out-item OUT_ITEM` | To specify the content of the test results file, and it can be "pred" or "metrics". If "pred", save the outputs of the model for offline evaluation. If "metrics", save the evaluation metrics. Defaults to "pred". |
| `--cfg-options CFG_OPTIONS` | Override some settings in the used config, the key-value pair in xxx=yyy format will be merged into the config file. If the value to be overwritten is a list, it should be of the form of either `key="[a,b]"` or `key=a,b`. The argument also allows nested list/tuple values, e.g. `key="[(a,b),(c,d)]"`. Note that the quotation marks are necessary and that no white space is allowed. |
| `--show-dir SHOW_DIR` | The directory to save the result visualization images. |
| `--show` | Visualize the prediction result in a window. |
| `--interval INTERVAL` | The interval of samples to visualize. |
| `--wait-time WAIT_TIME` | The display time of every window (in seconds). Defaults to 1. |
| `--launcher {none,pytorch,slurm,mpi}` | Options for job launcher. |
### Test with multiple GPUs
We provide a shell script to start a multi-GPUs task with `torch.distributed.launch`.
| `CHECKPOINT_FILE` | The path to the checkpoint file (It can be a http link, and you can find checkpoints [here](https://mmclassification.readthedocs.io/en/1.x/modelzoo_statistics.html)). |
| `GPU_NUM` | The number of GPUs to be used. |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
You can also specify extra arguments of the launcher by environment variables. For example, change the
communication port of the launcher to 29666 by the below command:
| `PARTITION` | The partition to use in your cluster. |
| `JOB_NAME` | The name of your job, you can name it as you like. |
| `CONFIG_FILE` | The path to the config file. |
| `CHECKPOINT_FILE` | The path to the checkpoint file (It can be a http link, and you can find checkpoints [here](https://mmclassification.readthedocs.io/en/1.x/modelzoo_statistics.html)). |
| `[PY_ARGS]` | The other optional arguments of `tools/test.py`, see [here](#test-with-your-pc). |
Here are the environment variables can be used to configure the slurm job.