From 02571fe4b849f23e64f57c6bc04b318a1c245820 Mon Sep 17 00:00:00 2001
From: Yixiao Fang <36138628+fangyixiao18@users.noreply.github.com>
Date: Fri, 14 Apr 2023 13:58:10 +0800
Subject: [PATCH] [Docs] Add NPU support page (#1481)

* add npu docs
* fix lint
---
 docs/en/device/npu.md    | 47 ++++++++++++++++++++++++++++++++++++++++
 docs/en/index.rst        |  6 +++++
 docs/zh_CN/device/npu.md | 41 +++++++++++++++++++++++++++++++++++
 docs/zh_CN/index.rst     |  6 +++++
 4 files changed, 100 insertions(+)
 create mode 100644 docs/en/device/npu.md
 create mode 100644 docs/zh_CN/device/npu.md

diff --git a/docs/en/device/npu.md b/docs/en/device/npu.md
new file mode 100644
index 00000000..5503b7e5
--- /dev/null
+++ b/docs/en/device/npu.md
@@ -0,0 +1,47 @@
+# NPU (HUAWEI Ascend)
+
+## Usage
+
+### General Usage
+
+Please refer to the [building documentation of MMCV](https://mmcv.readthedocs.io/en/latest/get_started/build.html#build-mmcv-full-on-ascend-npu-machine) to install MMCV, and to the [MMEngine installation guide](https://mmengine.readthedocs.io/en/latest/get_started/installation.html#build-from-source) to install MMEngine on NPU devices.
+
+For example, use the following command to train a model with 8 NPUs on your machine:
+
+```shell
+bash ./tools/dist_train.sh configs/resnet/resnet50_8xb32_in1k.py 8
+```
+
+Alternatively, you can train the model with a single NPU:
+
+```shell
+python ./tools/train.py configs/resnet/resnet50_8xb32_in1k.py
+```
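+
+To confirm that the NPU devices are visible to PyTorch before launching a run, you can use the optional check below. This is only a minimal sketch and assumes the `torch_npu` Ascend plugin is installed in your environment; it is not part of this repository.
+
+```shell
+# Optional sanity check (assumes the torch_npu plugin is installed):
+# prints whether Ascend NPUs are available and how many devices are visible.
+python -c "import torch, torch_npu; print(torch.npu.is_available(), torch.npu.device_count())"
+```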
+
+## Model Results
+
+| Model | Top-1 (%) | Top-5 (%) | Config | Download |
+| :---------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------: | :-------------------------------------------------------------: |
+| [ResNet-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/README.md) | 76.40 | 93.21 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnet50_8xb32_in1k.log) |
+| [ResNeXt-32x4d-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnext/README.md) | 77.48 | 93.75 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnext/resnext50-32x4d_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnext50-32x4d_8xb32_in1k.log) |
+| [HRNet-W18](https://github.com/open-mmlab/mmclassification/blob/master/configs/hrnet/README.md) | 77.06 | 93.57 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/hrnet/hrnet-w18_4xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/hrnet-w18_4xb32_in1k.log) |
+| [ResNetV1D-152](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/README.md) | 79.41 | 94.48 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnetv1d152_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnetv1d152_8xb32_in1k.log) |
+| [SE-ResNet-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/seresnet/README.md) | 77.65 | 93.74 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/seresnet/seresnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/seresnet50_8xb32_in1k.log) |
+| [ShuffleNetV2 1.0x](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/shufflenet_v2/README.md) | 69.52 | 88.79 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/shufflenet-v2-1x_16xb64_in1k.log) |
+| [MobileNetV2](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/mobilenet_v2) | 71.74 | 90.28 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/mobilenet-v2_8xb32_in1k.log) |
+| [MobileNetV3-Small](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v3/README.md) | 67.09 | 87.17 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v3/mobilenet-v3-small_8xb128_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/mobilenet-v3-small.log) |
+| [\*CSPResNeXt50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/cspnet/README.md) | 77.25 | 93.46 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/cspnet/cspresnext50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/cspresnext50_8xb32_in1k.log) |
+| [\*EfficientNet-B4](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/efficientnet/README.md) | 75.73 | 92.91 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/efficientnet/efficientnet-b4_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/efficientnet-b4_8xb32_in1k.log) |
+| [\*\*DenseNet121](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/densenet/README.md) | 72.53 | 90.85 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/densenet/densenet121_4xb256_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/densenet121_4xb256_in1k.log) |
+
+**Notes:**
+
+- Unless otherwise noted, the results on the NPU are almost the same as the FP32 results on the GPU.
+- (\*) The accuracy of these models is lower than that reported in the corresponding README, mainly because the README numbers come from evaluating weights converted from timm, whereas the numbers here are retrained from scratch with the MMClassification configs. Training the same configs on the GPU gives results consistent with the NPU results.
+- (\*\*) The accuracy of this model is slightly lower because the config is designed for 4 devices while we ran it on 8; users can adjust the hyperparameters to obtain the best accuracy, as sketched in the example below.
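+
+For instance, here is a minimal sketch of such an adjustment. It assumes the standard `--cfg-options` override mechanism of `tools/train.py`, and the learning-rate value is purely illustrative:
+
+```shell
+# Hypothetical example: retrain the 4-card DenseNet config on 8 NPUs while
+# overriding the learning rate; tune the value for your own setup.
+bash ./tools/dist_train.sh configs/densenet/densenet121_4xb256_in1k.py 8 \
+    --cfg-options optim_wrapper.optimizer.lr=0.2
+```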
+
+**All of the above models are provided by the Huawei Ascend team.**
diff --git a/docs/en/index.rst b/docs/en/index.rst
index b8a00932..a3d2a851 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -141,6 +141,12 @@ We always welcome *PRs* and *Issues* for the betterment of MMPretrain.
    notes/pretrain_custom_dataset.md
    notes/finetune_custom_dataset.md
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Device Support
+
+   device/npu.md
+
 Indices and tables
 ==================
diff --git a/docs/zh_CN/device/npu.md b/docs/zh_CN/device/npu.md
new file mode 100644
index 00000000..b81c1751
--- /dev/null
+++ b/docs/zh_CN/device/npu.md
@@ -0,0 +1,41 @@
+# NPU (HUAWEI Ascend)
+
+## Usage
+
+First, please follow the [MMCV build guide](https://mmcv.readthedocs.io/zh_CN/latest/get_started/build.html#npu-mmcv-full) to install MMCV with NPU support, and the [MMEngine installation guide](https://mmengine.readthedocs.io/en/latest/get_started/installation.html#build-from-source) to install MMEngine.
+
+Use the following command to train a model (taking ResNet as an example) with 8 NPUs on your machine:
+
+```shell
+bash tools/dist_train.sh configs/resnet/resnet50_8xb32_in1k.py 8
+```
+
+Alternatively, use the following command to train the model on a single NPU:
+
+```shell
+python tools/train.py configs/resnet/resnet50_8xb32_in1k.py
+```
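+
+After training, you may also want to evaluate a checkpoint on the NPU. The following is only a sketch, assuming the usual MMClassification test entry point and a hypothetical checkpoint path produced by your own run:
+
+```shell
+# Hypothetical example: evaluate a trained ResNet-50 checkpoint on the NPU.
+# Replace the checkpoint path with the one written to your work directory.
+python tools/test.py configs/resnet/resnet50_8xb32_in1k.py work_dirs/resnet50_8xb32_in1k/epoch_100.pth
+```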
+
+## Verified Models
+
+| Model | Top-1 (%) | Top-5 (%) | Config | Download |
+| :---------------------------------------------------------: | :-------: | :-------: | :----------------------------------------------------------: | :-------------------------------------------------------------: |
+| [ResNet-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/README.md) | 76.40 | 93.21 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnet50_8xb32_in1k.log) |
+| [ResNeXt-32x4d-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnext/README.md) | 77.48 | 93.75 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnext/resnext50-32x4d_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnext50-32x4d_8xb32_in1k.log) |
+| [HRNet-W18](https://github.com/open-mmlab/mmclassification/blob/master/configs/hrnet/README.md) | 77.06 | 93.57 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/hrnet/hrnet-w18_4xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/hrnet-w18_4xb32_in1k.log) |
+| [ResNetV1D-152](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/README.md) | 79.41 | 94.48 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/resnet/resnetv1d152_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/resnetv1d152_8xb32_in1k.log) |
+| [SE-ResNet-50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/seresnet/README.md) | 77.65 | 93.74 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/seresnet/seresnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/seresnet50_8xb32_in1k.log) |
+| [ShuffleNetV2 1.0x](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/shufflenet_v2/README.md) | 69.52 | 88.79 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/shufflenet-v2-1x_16xb64_in1k.log) |
+| [MobileNetV2](https://github.com/open-mmlab/mmclassification/tree/1.x/configs/mobilenet_v2) | 71.74 | 90.28 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v2/mobilenet-v2_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/mobilenet-v2_8xb32_in1k.log) |
+| [MobileNetV3-Small](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v3/README.md) | 67.09 | 87.17 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/mobilenet_v3/mobilenet-v3-small_8xb128_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/mobilenet-v3-small.log) |
+| [\*CSPResNeXt50](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/cspnet/README.md) | 77.25 | 93.46 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/cspnet/cspresnext50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/cspresnext50_8xb32_in1k.log) |
+| [\*EfficientNet-B4](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/efficientnet/README.md) | 75.73 | 92.91 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/efficientnet/efficientnet-b4_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/efficientnet-b4_8xb32_in1k.log) |
+| [\*\*DenseNet121](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/densenet/README.md) | 72.53 | 90.85 | [config](https://github.com/open-mmlab/mmclassification/blob/1.x/configs/densenet/densenet121_4xb256_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v1/device/npu/densenet121_4xb256_in1k.log) |
+
+**Notes:**
+
+- Unless otherwise noted, the results on the NPU are the same as the FP32 results on the GPU.
+- (\*) The training accuracy of these models is lower than the numbers in the corresponding README, mainly because the README numbers come from evaluating weights converted from timm, whereas the numbers here are retrained with the mmcls configs. Training the same configs on the GPU gives results consistent with the NPU results.
+- (\*\*) The accuracy of this model is slightly lower because the config is designed for 4 cards while we ran it on 8; users can adjust the hyperparameters to obtain the best accuracy.
+
+**All of the above model weights and training logs are provided by the Huawei Ascend team.**
diff --git a/docs/zh_CN/index.rst b/docs/zh_CN/index.rst
index cceff12b..7865da8e 100644
--- a/docs/zh_CN/index.rst
+++ b/docs/zh_CN/index.rst
@@ -127,6 +127,12 @@ MMPretrain 上手路线
    notes/pretrain_custom_dataset.md
    notes/finetune_custom_dataset.md
 
+.. toctree::
+   :maxdepth: 1
+   :caption: Device Support
+
+   device/npu.md
+
 .. toctree::
    :caption: 切换语言