[Feature] support cpu training (#188)
* [Fix] modify non-dist training algorithm list
* [Feature] support cpu training
* [Docs] modify description
parent c39bb83a7c
commit a807b38e4c
@@ -2,6 +2,7 @@
 - [Getting Started](#getting-started)
 - [Train existing methods](#train-existing-methods)
+- [Training with CPU](#training-with-cpu)
 - [Train with single/multiple GPUs](#train-with-singlemultiple-gpus)
 - [Train with multiple machines](#train-with-multiple-machines)
 - [Launch multiple jobs on a single machine](#launch-multiple-jobs-on-a-single-machine)
@@ -18,6 +19,15 @@ This page provides basic tutorials about the usage of MMSelfSup. For installation
 **Note**: The default learning rate in config files is for 8 GPUs. If you use a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate linearly: `new_lr = old_lr * new_ngpus / old_ngpus`. We recommend using `tools/dist_train.sh` even with 1 GPU, since some methods do not support non-distributed training.
 
+### Training with CPU
+
+```shell
+export CUDA_VISIBLE_DEVICES=-1
+python tools/train.py ${CONFIG_FILE}
+```
+
+**Note**: We do not recommend training with CPU because it is too slow, and some algorithms use `SyncBN`, which requires distributed training. This feature is supported so that users can conveniently debug on machines without a GPU.
+
 ### Train with single/multiple GPUs
 
 ```shell
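As a quick worked example of the linear scaling rule in the note above (the base learning rate and GPU counts here are hypothetical, not taken from any shipped config):

```python
# Linear scaling rule: new_lr = old_lr * new_ngpus / old_ngpus
old_lr, old_ngpus = 0.03, 8   # hypothetical base setting tuned for 8 GPUs
new_ngpus = 1                 # e.g. debugging with a single GPU
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                 # 0.00375
```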
@@ -2,6 +2,7 @@
 - [Getting Started](#基础教程)
 - [Train existing methods](#训练已有的算法)
+- [Training with CPU](#使用-cpu-训练)
 - [Train with single/multiple GPUs](#使用-单张多张-显卡训练)
 - [Train with multiple machines](#使用多台机器训练)
 - [Launch multiple jobs on a single machine](#在一台机器上启动多个任务)
@@ -18,6 +19,15 @@
 **Note**: When you launch a job, 8 GPUs are used by default. If you use fewer or more than 8 GPUs, the total batch size scales in proportion, and the learning rate follows a linear scaling rule, so you can adjust it with `new_lr = old_lr * new_ngpus / old_ngpus`. Besides, we recommend launching training with `tools/dist_train.sh` even if you only use one GPU, because some algorithms in MMSelfSup do not support non-distributed training.
 
+### Training with CPU
+
+```shell
+export CUDA_VISIBLE_DEVICES=-1
+python tools/train.py ${CONFIG_FILE}
+```
+
+**Note**: We do not recommend training with CPU because it is too slow, and some algorithms only support distributed training, e.g. those using `SyncBN`. This feature is supported so that users can debug on machines without a GPU.
+
 ### Train with single/multiple GPUs
 
 ```shell
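To sanity-check what `export CUDA_VISIBLE_DEVICES=-1` in the sections above does, here is a short illustrative snippet (not part of the repository): PyTorch then reports no usable GPU, so the model and its parameters stay on CPU.

```python
import os

# Hide all GPUs before torch initializes CUDA, mirroring the docs above.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import torch

print(torch.cuda.is_available())        # False, so training falls back to CPU
model = torch.nn.Linear(8, 2)           # parameters stay on CPU
print(next(model.parameters()).device)  # cpu
```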
@@ -103,8 +103,8 @@ def train_model(model,
             broadcast_buffers=False,
             find_unused_parameters=find_unused_parameters)
     else:
-        model = MMDataParallel(
-            model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
+        model = MMDataParallel(model, device_ids=cfg.gpu_ids)
 
     # build optimizer
     optimizer = build_optimizer(model, cfg.optimizer)
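A minimal PyTorch sketch of the idea behind the hunk above, using a hypothetical `wrap_model` helper (this is not the MMSelfSup implementation): the model is no longer moved to GPU unconditionally before being wrapped, so a CPU-only machine can still run training.

```python
import torch
from torch.nn.parallel import DataParallel

def wrap_model(model, gpu_ids):
    # Hypothetical helper: wrap with DataParallel only when GPUs are usable,
    # otherwise keep the plain module on CPU.
    if torch.cuda.is_available() and gpu_ids:
        model = model.cuda(gpu_ids[0])
        return DataParallel(model, device_ids=gpu_ids)
    return model  # CPU mode: no parallel wrapper needed
```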
@@ -113,7 +113,8 @@ def main():
     if args.launcher == 'none':
         distributed = False
         assert cfg.model.type not in [
-            'DeepCluster', 'MoCo', 'SimCLR', 'ODC', 'NPID', 'DenseCL'
+            'DeepCluster', 'MoCo', 'SimCLR', 'ODC', 'NPID', 'SimSiam',
+            'DenseCL', 'BYOL'
         ], f'{cfg.model.type} does not support non-dist training.'
     else:
         distributed = True
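A small sketch of the check this hunk extends, with a hypothetical helper name; the algorithm list comes from the diff above, everything else is illustrative only.

```python
# Algorithms that require distributed training (per the assert above).
NON_DIST_UNSUPPORTED = [
    'DeepCluster', 'MoCo', 'SimCLR', 'ODC', 'NPID', 'SimSiam',
    'DenseCL', 'BYOL'
]

def check_non_dist_support(model_type, launcher='none'):
    # Hypothetical helper mirroring the assert in the hunk above.
    if launcher == 'none':  # non-distributed launch
        assert model_type not in NON_DIST_UNSUPPORTED, (
            f'{model_type} does not support non-dist training.')

check_non_dist_support('SimSiam', launcher='pytorch')  # fine: distributed launch
check_non_dist_support('SomeOtherAlgo')                # fine: not in the list
```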