Merge branch 'dev'

pull/1198/head v0.24.1
mzr1996 2022-11-01 14:19:49 +08:00
commit 8c63bb55a5
21 changed files with 156 additions and 25 deletions


@@ -33,6 +33,8 @@
[🆕 Update News](https://mmclassification.readthedocs.io/en/latest/changelog.html) |
[🤔 Reporting Issues](https://github.com/open-mmlab/mmclassification/issues/new/choose)
:point_right: **MMClassification 1.0 branch is in trial, welcome everyone to [try it](https://github.com/open-mmlab/mmclassification/tree/1.x) and [discuss with us](https://github.com/open-mmlab/mmclassification/discussions)!** :point_left:
</div>
## Introduction
@@ -62,6 +64,11 @@ MMClassification 1.0 has been released! It's still unstable and in release candid
to [the 1.x branch](https://github.com/open-mmlab/mmclassification/tree/1.x) and discuss it with us in
[the discussion](https://github.com/open-mmlab/mmclassification/discussions).
v0.24.1 was released on 31/10/2022.
Highlights of the new version:
- Support HUAWEI Ascend device.
v0.24.0 was released on 30/9/2022.
Highlights of the new version:


@@ -33,6 +33,10 @@
[🆕 Update News](https://mmclassification.readthedocs.io/en/latest/changelog.html) |
[🤔 Reporting Issues](https://github.com/open-mmlab/mmclassification/issues/new/choose)
:point_right: **The MMClassification 1.0 release is coming soon. Everyone is welcome to [try it](https://github.com/open-mmlab/mmclassification/tree/1.x) and [join the discussion](https://github.com/open-mmlab/mmclassification/discussions)!** :point_left:
</div>
</div>
## Introduction
@@ -59,6 +63,10 @@ MMClassification is an open-source image classification toolbox based on PyTorch, and a part of the [O
MMClassification 1.0 has been released! It is still in public beta. If you want to try it, please switch to [the 1.x branch](https://github.com/open-mmlab/mmclassification/tree/1.x) and join the development discussion on [the discussion board](https://github.com/open-mmlab/mmclassification/discussions)!
v0.24.1 was released on 31/10/2022.
- Support HUAWEI Ascend NPU devices.
v0.24.0 was released on 30/9/2022.
- Support backbones including **HorNet**, **EfficientFormer**, **SwinTransformer V2** and **MViT**.
@@ -66,8 +74,6 @@ MMClassification 1.0 has been released! It is still in public beta. If you want to try it
v0.23.0 was released on 1/5/2022.
Highlights of the new version:
- Support **DenseNet**, **VAN** and **PoolFormer**, and provide pre-trained models.
- Support training on IPU.
- Update the API documentation style for easier reference, [check it out](https://mmclassification.readthedocs.io/en/master/api/models.html).


@@ -3,8 +3,8 @@ ARG CUDA="10.2"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
ARG MMCV="1.6.2"
ARG MMCLS="0.24.0"
ARG MMCV="1.7.0"
ARG MMCLS="0.24.1"
ENV PYTHONUNBUFFERED TRUE


@@ -9,12 +9,19 @@ pre {
white-space: pre;
}
article.pytorch-article .section :not(dt) > code {
article.pytorch-article section code {
padding: .2em .4em;
background-color: #f3f4f7;
border-radius: 5px;
}
table.colwidths-auto td {
/* Disable the change in tables */
article.pytorch-article section table code {
padding: unset;
background-color: unset;
border-radius: unset;
}
table.autosummary td {
width: 50%
}


@@ -1,5 +1,15 @@
# Changelog
## v0.24.1(31/10/2022)
### New Features
- Support mmcls with NPU backend. ([#1072](https://github.com/open-mmlab/mmclassification/pull/1072))
### Bug Fixes
- Fix performance issue in convnext DDP train. ([#1098](https://github.com/open-mmlab/mmclassification/pull/1098))
## v0.24.0(30/9/2022)
### Highlights


@@ -48,7 +48,6 @@ extensions = [
'sphinx.ext.intersphinx',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'myst_parser',
'sphinx_copybutton',
]


@@ -0,0 +1,34 @@
# NPU (HUAWEI Ascend)
## Usage
Please install MMCV with NPU device support according to {external+mmcv:doc}`the tutorial <get_started/build>`.
Run the following command to train the model with 8 NPUs on your machine:
```shell
bash tools/dist_train.sh configs/cspnet/resnet50_8xb32_in1k.py 8 --device npu
```
Alternatively, you can train the model on a single NPU with the following command:
```shell
python tools/train.py configs/cspnet/resnet50_8xb32_in1k.py --device npu
```
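Beyond training, inference through the high-level Python API should also run on the NPU once the device-enabled MMCV build is installed. The snippet below is a minimal sketch rather than an official example: the checkpoint path is a placeholder for a locally trained model, and it assumes `torch_npu` registers the `npu` device string with PyTorch.

```python
from mmcls.apis import inference_model, init_model

# Build the model from a config and move it to the Ascend device.
# The checkpoint path is a placeholder, not an official download.
config = 'configs/resnet/resnet50_8xb32_in1k.py'
checkpoint = 'work_dirs/resnet50_8xb32_in1k/latest.pth'
model = init_model(config, checkpoint, device='npu')

# Classify the demo image shipped with the repository.
result = inference_model(model, 'demo/demo.JPEG')
print(result)
```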
## Verified Models
| Model | Top-1 (%) | Top-5 (%) | Config | Download |
| :--------------------------------------------------------: | :-------: | :-------: | :-----------------------------------------------------------: | :-------------------------------------------------------------: |
| [CSPResNeXt50](../papers/cspnet.md) | 77.10 | 93.55 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/cspnet/cspresnext50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/cspresnext50_8xb32_in1k.log.json) |
| [DenseNet121](../papers/densenet.md) | 72.62 | 91.04 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/densenet/densenet121_4xb256_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/densenet121_4xb256_in1k.log.json) |
| [EfficientNet-B4(AA + AdvProp)](../papers/efficientnet.md) | 75.55 | 92.86 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/efficientnet/efficientnet-b4_8xb32-01norm_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/efficientnet-b4_8xb32-01norm_in1k.log.json) |
| [HRNet-W18](../papers/hrnet.md) | 77.01 | 93.46 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hrnet/hrnet-w18_4xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/hrnet-w18_4xb32_in1k.log.json) |
| [ResNetV1D-152](../papers/resnet.md) | 77.11 | 94.54 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnetv1d152_8xb32_in1k.py) | [model](<>) \| [log](<>) |
| [ResNet-50](../papers/resnet.md) | 76.40 | - | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb32_in1k.py) | [model](<>) \| [log](<>) |
| [ResNeXt-32x4d-50](../papers/resnext.md) | 77.55 | 93.75 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnext/resnext50-32x4d_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/resnext50-32x4d_8xb32_in1k.log.json) |
| [SE-ResNet-50](../papers/seresnet.md) | 77.64 | 93.76 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/seresnet/seresnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/seresnet50_8xb32_in1k.log.json) |
| [VGG-11](../papers/vgg.md) | 68.92 | 88.83 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/vgg/vgg11_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/vgg11_8xb32_in1k.log.json) |
| [ShuffleNetV2 1.0x](../papers/shufflenet_v2.md) | 69.53 | 88.82 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/shufflenet-v2-1x_16xb64_in1k.json) |
**All of the above models are provided by the Huawei Ascend group.**


@@ -17,8 +17,8 @@ and make sure you fill in all required information in the template.
| MMClassification version | MMCV version |
| :----------------------: | :--------------------: |
| dev | mmcv>=1.6.0, \<1.7.0 |
| 0.24.0 (master) | mmcv>=1.4.2, \<1.7.0 |
| dev | mmcv>=1.7.0, \<1.9.0 |
| 0.24.1 (master) | mmcv>=1.4.2, \<1.9.0 |
| 0.23.2 | mmcv>=1.4.2, \<1.7.0 |
| 0.22.1 | mmcv>=1.4.2, \<1.6.0 |
| 0.21.0 | mmcv>=1.4.2, \<=1.5.0 |


@@ -78,6 +78,13 @@ You can switch between Chinese and English documentation in the lower-left corne
compatibility.md
faq.md
.. toctree::
:maxdepth: 1
:caption: Device Support
device/npu.md
.. toctree::
:caption: Language Switch


@@ -9,12 +9,19 @@ pre {
white-space: pre;
}
article.pytorch-article .section :not(dt) > code {
article.pytorch-article section code {
padding: .2em .4em;
background-color: #f3f4f7;
border-radius: 5px;
}
table.colwidths-auto td {
/* Disable the change in tables */
article.pytorch-article section table code {
padding: unset;
background-color: unset;
border-radius: unset;
}
table.autosummary td {
width: 50%
}


@@ -48,7 +48,6 @@ extensions = [
'sphinx.ext.intersphinx',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'myst_parser',
'sphinx_copybutton',
]
@@ -214,7 +213,7 @@ intersphinx_mapping = {
'python': ('https://docs.python.org/3', None),
'numpy': ('https://numpy.org/doc/stable', None),
'torch': ('https://pytorch.org/docs/stable/', None),
'mmcv': ('https://mmcv.readthedocs.io/en/master/', None),
'mmcv': ('https://mmcv.readthedocs.io/zh_CN/latest/', None),
}


@@ -0,0 +1,34 @@
# NPU (HUAWEI Ascend)
## Usage
First, please install MMCV with NPU device support according to {external+mmcv:doc}`the tutorial <get_started/build>`.
Run the following command to train the model with 8 NPUs on your machine (taking ResNet as an example):
```shell
bash tools/dist_train.sh configs/cspnet/resnet50_8xb32_in1k.py 8 --device npu
```
Alternatively, run the following command to train the model on a single NPU (taking ResNet as an example):
```shell
python tools/train.py configs/cspnet/resnet50_8xb32_in1k.py --device npu
```
## Verified Models
| Model | Top-1 (%) | Top-5 (%) | Config | Download |
| :--------------------------------------------------------: | :-------: | :-------: | :------------------------------------------------------------: | :------------------------------------------------------------: |
| [CSPResNeXt50](../papers/cspnet.md) | 77.10 | 93.55 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/cspnet/cspresnext50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/cspresnext50_8xb32_in1k.log.json) |
| [DenseNet121](../papers/densenet.md) | 72.62 | 91.04 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/densenet/densenet121_4xb256_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/densenet121_4xb256_in1k.log.json) |
| [EfficientNet-B4(AA + AdvProp)](../papers/efficientnet.md) | 75.55 | 92.86 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/efficientnet/efficientnet-b4_8xb32-01norm_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/efficientnet-b4_8xb32-01norm_in1k.log.json) |
| [HRNet-W18](../papers/hrnet.md) | 77.01 | 93.46 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hrnet/hrnet-w18_4xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/hrnet-w18_4xb32_in1k.log.json) |
| [ResNetV1D-152](../papers/resnet.md) | 77.11 | 94.54 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnetv1d152_8xb32_in1k.py) | [model](<>) \| [log](<>) |
| [ResNet-50](../papers/resnet.md) | 76.40 | - | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb32_in1k.py) | [model](<>) \| [log](<>) |
| [ResNeXt-32x4d-50](../papers/resnext.md) | 77.55 | 93.75 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnext/resnext50-32x4d_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/resnext50-32x4d_8xb32_in1k.log.json) |
| [SE-ResNet-50](../papers/seresnet.md) | 77.64 | 93.76 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/seresnet/seresnet50_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/seresnet50_8xb32_in1k.log.json) |
| [VGG-11](../papers/vgg.md) | 68.92 | 88.83 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/vgg/vgg11_8xb32_in1k.py) | [model](<>) \| [log](https://download.openmmlab.com/mmclassification/v0/device/npu/vgg11_8xb32_in1k.log.json) |
| [ShuffleNetV2 1.0x](../papers/shufflenet_v2.md) | 69.53 | 88.82 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/shufflenet_v2/shufflenet-v2-1x_16xb64_in1k.py) | [model](<>) \| [log](<>) |
**All of the above model weights and training logs are provided by the HUAWEI Ascend team.**


@@ -15,8 +15,8 @@
| MMClassification version | MMCV version |
| :----------------------: | :--------------------: |
| dev | mmcv>=1.6.0, \<1.7.0 |
| 0.24.0 (master) | mmcv>=1.4.2, \<1.7.0 |
| dev | mmcv>=1.7.0, \<1.9.0 |
| 0.24.1 (master) | mmcv>=1.4.2, \<1.9.0 |
| 0.23.2 | mmcv>=1.4.2, \<1.7.0 |
| 0.22.1 | mmcv>=1.4.2, \<1.6.0 |
| 0.21.0 | mmcv>=1.4.2, \<=1.5.0 |


@@ -78,6 +78,13 @@ You can switch between Chinese and English documentation in the lower-left corne
faq.md
.. toctree::
:maxdepth: 1
:caption: Device Support
device/npu.md
.. toctree::
:caption: Language Switch


@@ -48,7 +48,7 @@ def digit_version(version_str: str, length: int = 4):
mmcv_minimum_version = '1.4.2'
mmcv_maximum_version = '1.7.0'
mmcv_maximum_version = '1.9.0'
mmcv_version = digit_version(mmcv.__version__)
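Raising `mmcv_maximum_version` to `'1.9.0'` widens the MMCV range accepted by the import-time guard. The guard itself is outside this hunk; the following self-contained sketch shows how such bounds are typically checked (reconstructed from the names above, not copied from the repository):

```python
import mmcv
from mmcv.utils import digit_version  # plays the same role as the digit_version defined above

mmcv_minimum_version = '1.4.2'
mmcv_maximum_version = '1.9.0'
mmcv_version = digit_version(mmcv.__version__)

# Fail fast if the installed MMCV falls outside the supported range.
assert (mmcv_version >= digit_version(mmcv_minimum_version)
        and mmcv_version < digit_version(mmcv_maximum_version)), \
    f'MMCV=={mmcv.__version__} is incompatible; please install ' \
    f'mmcv-full>={mmcv_minimum_version}, <{mmcv_maximum_version}.'
```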


@@ -131,7 +131,6 @@ def train_model(model,
model = wrap_distributed_model(
model,
cfg.device,
device_ids=[torch.cuda.current_device()],
broadcast_buffers=False,
find_unused_parameters=find_unused_parameters)
else:
@@ -173,6 +172,10 @@ def train_model(model,
# fp16 setting
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is None and device == 'npu':
fp16_cfg = {'loss_scale': 'dynamic'}
if fp16_cfg is not None:
if device == 'ipu':
from mmcv.device.ipu import IPUFp16OptimizerHook

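The added lines make dynamic loss scaling the default on NPU whenever the config has no `fp16` section. A minimal sketch of the equivalent explicit setting, assuming the usual mmcls config convention for mixed-precision training:

```python
# In a config file: enable mixed-precision training with dynamic loss scaling.
# On NPU, the code above now falls back to exactly this when `fp16` is unset.
fp16 = dict(loss_scale='dynamic')
```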

@@ -4,6 +4,7 @@ from torch.utils.data import DistributedSampler as _DistributedSampler
from mmcls.core.utils import sync_random_seed
from mmcls.datasets import SAMPLERS
from mmcls.utils import auto_select_device
@SAMPLERS.register_module()
@@ -30,7 +31,7 @@ class DistributedSampler(_DistributedSampler):
# in the same order based on the same seed. Then different ranks
# could use different indices to select non-overlapped data from the
# same data list.
self.seed = sync_random_seed(seed)
self.seed = sync_random_seed(seed, device=auto_select_device())
def __iter__(self):
# deterministically shuffle based on epoch

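The seed synchronization no longer hard-codes CUDA: the device used to broadcast the seed now comes from `auto_select_device()`, so NPU-only machines can still agree on a shared seed across ranks. As a rough illustration of the idea (not the actual `mmcls.utils` implementation), such a device probe could look like this:

```python
import torch


def pick_device() -> str:
    # Illustrative probe: prefer an Ascend NPU when torch_npu is installed,
    # then CUDA, and finally fall back to CPU.
    if hasattr(torch, 'npu') and torch.npu.is_available():
        return 'npu'
    if torch.cuda.is_available():
        return 'cuda'
    return 'cpu'
```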

@@ -36,8 +36,8 @@ class LayerNorm2d(nn.LayerNorm):
assert x.dim() == 4, 'LayerNorm2d only supports inputs with shape ' \
f'(N, C, H, W), but got tensor with shape {x.shape}'
return F.layer_norm(
x.permute(0, 2, 3, 1), self.normalized_shape, self.weight,
self.bias, self.eps).permute(0, 3, 1, 2)
x.permute(0, 2, 3, 1).contiguous(), self.normalized_shape,
self.weight, self.bias, self.eps).permute(0, 3, 1, 2).contiguous()
class ConvNeXtBlock(BaseModule):

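These `.contiguous()` calls are the fix for the ConvNeXt DDP performance issue noted in the changelog: `permute` only returns a strided view, and later kernels and gradient handling can be noticeably slower on non-contiguous memory for some backends. A small illustration of the difference:

```python
import torch

x = torch.randn(2, 64, 56, 56)       # (N, C, H, W) feature map
y = x.permute(0, 2, 3, 1)            # (N, H, W, C) strided view, no copy
print(y.is_contiguous())             # False
z = y.contiguous()                   # materialize a compactly laid-out copy
print(z.is_contiguous())             # True
```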

@@ -16,7 +16,10 @@ def wrap_non_distributed_model(model, device='cuda', dim=0, *args, **kwargs):
Returns:
model(nn.Module): the model to be parallelized.
"""
if device == 'cuda':
if device == 'npu':
from mmcv.device.npu import NPUDataParallel
model = NPUDataParallel(model.npu(), dim=dim, *args, **kwargs)
elif device == 'cuda':
from mmcv.parallel import MMDataParallel
model = MMDataParallel(model.cuda(), dim=dim, *args, **kwargs)
elif device == 'cpu':
@@ -49,9 +52,16 @@ def wrap_distributed_model(model, device='cuda', *args, **kwargs):
.. [1] https://pytorch.org/docs/stable/generated/torch.nn.parallel.
DistributedDataParallel.html
"""
if device == 'cuda':
if device == 'npu':
from mmcv.device.npu import NPUDistributedDataParallel
from torch.npu import current_device
model = NPUDistributedDataParallel(
model.npu(), *args, device_ids=[current_device()], **kwargs)
elif device == 'cuda':
from mmcv.parallel import MMDistributedDataParallel
model = MMDistributedDataParallel(model.cuda(), *args, **kwargs)
from torch.cuda import current_device
model = MMDistributedDataParallel(
model.cuda(), *args, device_ids=[current_device()], **kwargs)
else:
raise RuntimeError(f'Unavailable device "{device}"')

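With the NPU branches added above, training scripts can stay device-agnostic and let these helpers pick the right wrapper. A minimal usage sketch, assuming an already-built `model`, a `cfg` whose `device` field is one of `'npu'`, `'cuda'` or `'cpu'`, and that both helpers are exported from `mmcls.utils` as in the 0.x codebase:

```python
from mmcls.utils import wrap_distributed_model, wrap_non_distributed_model

# Single-process runs: a DataParallel-style wrapper for the chosen device.
model = wrap_non_distributed_model(model, device=cfg.device)

# Distributed runs: a DistributedDataParallel-style wrapper; device_ids are
# now filled in from the current NPU/CUDA device inside the helper.
model = wrap_distributed_model(
    model,
    cfg.device,
    broadcast_buffers=False,
    find_unused_parameters=False)
```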

@@ -1,6 +1,6 @@
# Copyright (c) OpenMMLab. All rights reserved
__version__ = '0.24.0'
__version__ = '0.24.1'
def parse_version_info(version_str):


@@ -1 +1 @@
mmcv-full>=1.4.2,<1.7.0
mmcv-full>=1.4.2,<1.9.0