# NPU (HUAWEI Ascend)

## Usage

### General Usage

Please install MMCV with NPU device support according to {external+mmcv:doc}`the tutorial <get_started/build>`.

The following command trains the model with 8 NPUs on your machine:

```shell
bash ./tools/dist_train.sh configs/resnet/resnet50_8xb32_in1k.py 8 --device npu
```
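`dist_train.sh` launches one training process per device. If you run several jobs on the same machine, the master port can usually be overridden through the `PORT` environment variable, as in other OpenMMLab launch scripts (treat this as an assumption and check the script if it fails):

```shell
# Assumption: like other OpenMMLab dist_train.sh scripts, the launcher reads
# the PORT environment variable (default 29500); override it to avoid port
# clashes between concurrent jobs on one machine.
PORT=29510 bash ./tools/dist_train.sh configs/resnet/resnet50_8xb32_in1k.py 8 --device npu
```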

You can also train the model with a single NPU:

```shell
python ./tools/train.py configs/resnet/resnet50_8xb32_in1k.py --device npu
```
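Other `tools/train.py` options work together with `--device npu` as usual. For example, a smaller per-device batch size can be set through `--cfg-options` (the `data.samples_per_gpu` key is the same one referenced in the benchmark below; the value here is purely illustrative):

```shell
# Train on one NPU with a reduced per-device batch size (illustrative value).
python ./tools/train.py configs/resnet/resnet50_8xb32_in1k.py --device npu \
    --cfg-options data.samples_per_gpu=16
```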

### High-performance Usage on ARM Servers

During multi-card training, ARM CPUs handle resource contention less efficiently than x86 CPUs, so we provide a high-performance startup script to accelerate training:

```shell
# Example for 8 NPUs on a single machine
bash tools/dist_train_arm.sh configs/resnet/resnet50_8xb32_in1k.py 8 --device npu --cfg-options data.workers_per_gpu=$(($(nproc)/8))
```
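In this command, `data.workers_per_gpu=$(($(nproc)/8))` simply splits the machine's CPU cores evenly across the 8 training processes for data loading. The core idea of such a launcher is to pin every training process to its own block of CPU cores so that data-loading threads are not migrated between cores. Below is a minimal sketch of that idea, assuming PyTorch's `env://` process-group initialization; it is illustrative only and not the exact contents of `tools/dist_train_arm.sh`:

```shell
# Minimal core-binding launcher sketch (illustrative; NOT the shipped
# tools/dist_train_arm.sh). Each worker is pinned to a dedicated block of
# CPU cores with taskset, and the process group is created from the
# standard PyTorch env:// variables.
CONFIG=$1
NPUS=$2
CORES_PER_PROC=$(($(nproc) / NPUS))
for ((i = 0; i < NPUS; i++)); do
  FIRST=$((i * CORES_PER_PROC))
  LAST=$((FIRST + CORES_PER_PROC - 1))
  RANK=$i LOCAL_RANK=$i WORLD_SIZE=$NPUS \
  MASTER_ADDR=127.0.0.1 MASTER_PORT=${PORT:-29500} \
    taskset -c ${FIRST}-${LAST} \
    python ./tools/train.py $CONFIG --launcher pytorch "${@:3}" &
done
wait
```

Invoked as `bash tools/dist_train_arm.sh CONFIG 8 --device npu ...`, the remaining arguments (`"${@:3}"`) are forwarded to `tools/train.py`.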

For ResNet-50 training on 8 NPUs with batch_size (`data.samples_per_gpu`) = 512, the performance is as follows:

| CPU                   | Start Script                | IterTime (s)     |
| :-------------------- | :-------------------------- | :--------------- |
| ARM (Kunpeng 920 ×4)  | `./tools/dist_train.sh`     | ~0.9 (0.85-1.0)  |
| ARM (Kunpeng 920 ×4)  | `./tools/dist_train_arm.sh` | ~0.8 (0.78-0.85) |

## Model Results

| Model                             | Top-1 (%) | Top-5 (%) | Config | Download     |
| :-------------------------------- | :-------: | :-------: | :----: | :----------: |
| ResNet-50                         | 76.38     | 93.22     | config | model \| log |
| ResNeXt-32x4d-50                  | 77.55     | 93.75     | config | model \| log |
| HRNet-W18                         | 77.01     | 93.46     | config | model \| log |
| ResNetV1D-152                     | 79.11     | 94.54     | config | model \| log |
| SE-ResNet-50                      | 77.64     | 93.76     | config | model \| log |
| VGG-11                            | 68.92     | 88.83     | config | model \| log |
| ShuffleNetV2 1.0x                 | 69.53     | 88.82     | config | model \| log |
| MobileNetV2                       | 71.758    | 90.394    | config | model \| log |
| MobileNetV3-Small                 | 67.522    | 87.316    | config | model \| log |
| \*CSPResNeXt50                    | 77.10     | 93.55     | config | model \| log |
| \*EfficientNet-B4 (AA + AdvProp)  | 75.55     | 92.86     | config | model \| log |
| \*\*DenseNet121                   | 72.62     | 91.04     | config | model \| log |

Notes:

- Unless otherwise marked, the results on NPU are almost identical to those on GPU with FP32.
- (\*) The training results of these models are lower than those on the README page of the corresponding model, mainly because the README results are obtained by directly evaluating timm weights, while the results here come from retraining with the mmcls config. Training the same config on GPU gives results consistent with the NPU results.
- (\*\*) The accuracy of this model is slightly lower because the config was written for 4 cards while we trained with 8; users can adjust the hyperparameters to obtain the best accuracy.

All of the above models are provided by the Huawei Ascend group.