Compare commits

...

65 Commits

Author SHA1 Message Date
mzr1996 367e00a820 Update isort hook 2023-06-01 17:03:14 +08:00
Yinlei Sun dd657320a4
[Fix] Fix calculation errors on ARM chip. (#1592) 2023-06-01 16:53:32 +08:00
Ezra-Yu f2adad2729
[DOC] Update recommended branch from MMClassification to MMPreTrain in README (#1443)
* update README

* fix typo
2023-04-03 15:57:45 +08:00
mzr1996 fb16bdc6a2 Fix docs 2023-02-15 11:14:51 +08:00
Magnus_Cheng 3d4f80d63c
[Docs] Fix the wrong VGG-16-BN config link in Model-Page. 2022-12-30 15:21:11 +08:00
Ma Zerun 48fdcdbf76
[Bot] Update assignee schedule. (#1262) 2022-12-13 17:23:23 +08:00
Ma Zerun 7bf42eef2d
[CI] Update CI of the master branch. (#1252) 2022-12-13 14:30:53 +08:00
mzr1996 2495400a98 Merge branch 'dev' 2022-12-06 18:25:47 +08:00
Ma Zerun c737e65164
Bump version to v0.25.0. (#1244) 2022-12-06 18:18:13 +08:00
Ma Zerun 3e99395c29
[Fix] Fix a bug that caused MMClsWandbHook to get stuck. (#1242) 2022-12-06 16:50:41 +08:00
Ma Zerun 838735820c
[Docs] Add version banner and version warning in master docs. (#1216) 2022-12-05 14:13:25 +08:00
wangjiangben-hw 578c035d5c
[Enhance] Add `dist_train_arm.sh` for ARM device and update NPU results. (#1218)
* update npu results

* add dist_train_arm.sh & update docs

* del content
2022-11-28 11:12:19 +08:00
Ma Zerun 96bb2dd219
[Fix] Fix the redundant `device_ids` in `tools/test.py`. (#1215) 2022-11-22 18:57:52 +08:00
TangYueran dc8691e889
[Feature] Support MLU backend. (#1159)
* Training on MLU is available
2022-11-15 17:02:16 +08:00
Ma Zerun 0eb3b61fc5
[Docs] Update NPU support doc. (#1198) 2022-11-15 16:07:08 +08:00
Hakjin Lee 05e4bc17b2
[Feature] Support Activation Checkpointing for ConvNeXt. (#1152)
* Support Activation Checkpointing for ConvNeXt

* Add test case

* Lint

* Add docstring
2022-11-14 17:08:31 +08:00
unseenme 4a3ad4f652
[Docs] Fixed typo in pytorch2torchscript.md (#1173)
* [Docs] Fixed typo in pytorch2torchscript.md

* [Docs] Fixed W293 blank line contains whitespace
2022-11-09 11:03:12 +08:00
ganghe74 9b1cd5c3b1
[Docs] Fix typo in miscellaneous.md. (#1137) 2022-11-04 15:10:38 +08:00
JayChen aacaa7316c
[Docs] Add further detail to the doc for `ClassBalancedDataset`. (#901)
* further detail for the doc for datasets/dataset_wrappers/ClassBalancedDataset

* fix
2022-11-02 17:52:58 +08:00
mzr1996 8c63bb55a5 Merge branch 'dev' 2022-11-01 14:19:49 +08:00
Ma Zerun 29c54dd9ac
Bump version to v0.24.1 (#1150) 2022-11-01 14:17:38 +08:00
wangjiangben-hw dd664ffcd4
[Docs] Add NPU support page. (#1149)
* init readme

* [Docs] Finish the HUAWEI Ascend device support docs.

Co-authored-by: mzr1996 <mzr1996@163.com>
2022-11-01 14:10:18 +08:00
wangjiangben-hw 17ed870fd1
[Feature] Support mmcls with NPU backend. (#1072)
* init npu

* Avoid importing the latest MMCV code to be compatible with old versions.

Co-authored-by: mzr1996 <mzr1996@163.com>
2022-10-24 11:45:14 +08:00
Ma Zerun a9489f6bd0
[GitHub] Update issue template and remove general question template. (#1087)
* [CI] Fix CI error from timm and PyTorch version. (#1076)

* [GitHub] Update issue template and remove general question template.

* Add branch check dropdown options.
2022-10-20 16:32:34 +08:00
790475019 38040d5e05
[Fix] Fix performance issue in convnext DDP train. (#1098)
fix the performance issue in ConvNeXt DDP training
2022-10-17 10:10:19 +08:00
Ma Zerun bcadb74d5b
[CI] Fix CI error from timm and PyTorch version. (#1076) 2022-10-10 11:46:49 +08:00
mzr1996 91b85bb4a5 Merge remote-tracking branch 'origin/dev' 2022-09-30 18:06:17 +08:00
Ma Zerun 7b45eb10cd
Bump version to v0.24.0 (#1067) 2022-09-30 18:03:53 +08:00
Mengyang Liu c5bcd4801a
[Docs] Fix typo in config.md. (#827) 2022-09-30 15:02:24 +08:00
Philipp Allgeuer 7dca27dd57
[Fix] Fix warning with `torch.meshgrid`. (#860)
* Fix warning with torch.meshgrid

* Add torch_meshgrid_ij wrapper

* Use `digit_version` instead of packaging package.

Co-authored-by: mzr1996 <mzr1996@163.com>
2022-09-30 15:01:36 +08:00
Hakjin Lee 1b4e9cd22a
[Improve] replace loop of progressbar in api/test. (#878) 2022-09-30 14:41:07 +08:00
HinGwenWoong 4eaaf89618
[Docs] Add version for torchvision to avoid error. (#903)
* Add version for torchvision

* Add version for torchvision
2022-09-30 14:31:45 +08:00
JongYoon Lim 2102d09dfc
[Docs] Fixed typo for `--out-dir` option of analyze_results.py. (#898) 2022-09-30 14:30:45 +08:00
tpoisonooo 27b0bd5a72
[Fix] Add matplotlib minimum version requirements. (#909) 2022-09-30 14:22:42 +08:00
takuoko 8c7b7b15a3
[Enhance] RepVGG for YOLOX-PAI. (#1025)
* repvgg add ppf for yoloxpai

* fix by review

* update stem_channels

* fix doc

Co-authored-by: Ezra-Yu <18586273+Ezra-Yu@users.noreply.github.com>
2022-09-30 14:20:53 +08:00
Fei Wang 0143e5fdb0
[Fix] val loader should not drop last by default. (#857) 2022-09-28 08:22:23 +08:00
Ezra-Yu 4d73607fb8
[Fix] Fix config.device bug in tutorial. (#1059) 2022-09-28 08:17:26 +08:00
takuoko 1047daa28e
[Feature] Support HorNet Backbone. (#1013)
* add hornet

* add hornet

* add hornet

* add hornet

* add hornet

* add hornet

* add hornet

* fix test for torch before 1.7.0

* del timm

* fix readme

* fix readme

* Update mmcls/models/backbones/hornet.py

Co-authored-by: Ezra-Yu <18586273+Ezra-Yu@users.noreply.github.com>

* fix docs

* fix docs

* s -> scale

* fix dims and dpr impl

* fix layer scale

* refactor gnconv

* add dw_cfg

* add convert tools

* update code

* update docs

* update readme

* update URLs

Co-authored-by: Ezra-Yu <18586273+Ezra-Yu@users.noreply.github.com>
2022-09-27 10:37:49 +08:00
takuoko 56589ee280
[Enhancement] Update VAN. (#1017)
* update van

* fix init

* b4 result

* update van

* keep old config

* keep old config

* fix metafile

* update VAN configs

* update example

Co-authored-by: Ezra-Yu <18586273+Ezra-Yu@users.noreply.github.com>
2022-09-27 09:44:40 +08:00
Hubert 6ebb3f77ad
[Fix] Fix attention clamp max params (#1034) 2022-09-26 14:12:51 +08:00
Songyang Zhang c94e9b3669
[Feature] Update the issue template with more links and emoji. (#1032)
* [Feature] update the issue template with more links and emoji

* fix lint error

* Use yaml format issue templates.

* Update template

Co-authored-by: mzr1996 <mzr1996@163.com>
2022-09-26 14:05:26 +08:00
WRH 75ae8453ac
[Docs] Fix a typo in ImageClassifier (#1050) 2022-09-22 09:24:23 +08:00
Lei Lei a1b644bc75
[Doc] Fix typo in tutorial 2 (#1043) 2022-09-19 13:46:45 +08:00
Lei Lei 8d1bc557ab
[Docs] Fix typo for wrong reference. (#1036) 2022-09-16 14:24:47 +08:00
Kai Hu 0b4a67dd31
[Refactor] Re-write get_sinusoid_encoding from third-party implementation. (#965) 2022-09-13 15:24:29 +08:00
daquexian 6d8c91892c
[Improve] Upgrade onnxsim to v0.4.0. (#915) 2022-09-13 15:13:20 +08:00
mzr1996 982cab4138 Update README 2022-09-07 17:13:33 +08:00
Andrey Moskalenko 517bd3d34b
[Fix] Fix device mismatch in Swin-v2. (#976) 2022-09-01 18:03:49 +08:00
Jiahao Wang ec71d071d8
[Improve] Fixed typo in `RepVGG`. (#985)
* [Improve] Use `forward_dummy` to calculate FLOPS. (#953)

* fixed

Co-authored-by: Ming-Hsuan-Tu <alec.tu@acer.com>
2022-08-22 10:28:33 +08:00
mzr1996 5ad3bed2cd Merge remote-tracking branch 'origin/master' into dev 2022-08-22 10:12:24 +08:00
Ezra-Yu 6474ea2fc0
[Feature] Support EfficientFormer. (#954)
* add efficient backbone

* Update Readme and metafile

* Add unit tests

* fix conflict

* fix lint

* update efficientformer head unit tests

* update README

* fix unit test

* fix Readme

* fix example

* fix typo

* recover api modification

* Update EfficientFormer Backbone

* fix unit tests

* add efficientformer to readme and model zoo
2022-08-16 23:38:08 +08:00
zzc98 7b16bcdd9b
[Feature] Support Stanford Cars dataset. (#893)
* feat: add stanford car dataset

* feat: add stanford car dataset

* feat: add stanford car dataset

* feat: add stanford car dataset

* feat: add stanford car dataset

* feat: add stanford car dataset

* Update links and use cars instead of car

* Move the scipy dependency from runtime to optional

* Fix docstring

Co-authored-by: Ezra-Yu <1105212286@qq.com>
Co-authored-by: mzr1996 <mzr1996@163.com>
2022-08-16 11:14:17 +08:00
Ezra-Yu e54cfd6951
[Improve] Use `train_step` instead of `forward` in PreciseBNHook (#964)
* fix precise BN hook when using MLU

* fix unit tests
2022-08-11 15:02:25 +08:00
Timothy Lim b366897889
[Docs] Refine the docstring of RegNet (#935)
* Update regnet.py

In the docstring example that prints the outputs of the different stages, `out_indices` must be set to (0, 1, 2, 3) to see the outputs of all backbone stages, since the default argument is (3,) (see the usage sketch after this entry)

* Update regnet.py

follow the changes proposed by the maintainer

* fix linting

* fix blank space for docs

* fix blank space for docs

* fix blank space for docs
2022-08-10 18:17:36 +08:00
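Below is a minimal usage sketch of the point made in the commit above, assuming the mmcls 0.x `RegNet` backbone API (`arch` accepting a preset name such as `'regnetx_400mf'`): with the default `out_indices=(3,)` only the last stage's feature map is returned, while `(0, 1, 2, 3)` exposes all four stages.

```python
import torch
from mmcls.models import RegNet  # assumes mmcls 0.x is installed

inputs = torch.rand(1, 3, 224, 224)

# Request all four stages explicitly; the default out_indices=(3,)
# would return only the last stage's feature map.
model = RegNet(arch='regnetx_400mf', out_indices=(0, 1, 2, 3))
model.eval()

with torch.no_grad():
    outputs = model(inputs)

for i, out in enumerate(outputs):
    print(f'stage {i}: {tuple(out.shape)}')
```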
Ming-Hsuan-Tu 90254a8455
[Improve] Use `forward_dummy` to calculate FLOPS. (#953) 2022-08-08 18:34:09 +08:00
JiayuXu 1a3d51acc2
[Feature] Support CSRA head. (#881)
* Support CSRA head.

* Add CSRA config.

* Improve training scheduler and Update cfg, ckpt, log

* Update metafile

* Rename config files and checkpoints

Co-authored-by: Ezra-Yu <1105212286@qq.com>
Co-authored-by: mzr1996 <mzr1996@163.com>
2022-08-04 18:15:51 +08:00
Ma Zerun b5bb86a357
[Fix] Fix the output position of Swin-Transformer. (#947)
* [Fix] Fix the output position of Swin-Transformer.

* Rename `downsample` argument to `do_downsample`.
2022-08-03 19:32:29 +08:00
Hubert 6ec38fe742
[Feature] Support Swin Transform V2. (#799)
* init rough try for modify

* Init swin transform v2

* lint

* reformat

* init config

* refactor

* update config

* fix test

* add doc

* refact

* add model meta

* rename config

* add doc

* fix meta model name

* restruct

* rename embed_bims to out_channels

* fix ut and update model
2022-08-03 17:33:08 +08:00
Ma Zerun 556fa567a8
[Feature] Support MViT and add checkpoints. (#924)
* [Feature] Support MViT.

* Add MViT configs and docs

* Add unit test

* Fix unit tests.
2022-08-02 15:20:16 +08:00
mzr1996 71ef7bae85 Merge remote-tracking branch 'origin/dev' 2022-07-28 14:15:52 +08:00
Ma Zerun 9300cc4e3f
Bump version to 0.23.2. (#937) 2022-07-28 14:15:23 +08:00
HinGwenWoong 00f0e0d0be
[Fix] Fix Albu crash bug. (#918)
* Fix Albu bug: using Albu changes the label from array(x) to array([x]) and crashes the training

* Fix common

* Use copy to avoid a potential bug in multi-label tasks

* Improve coding

* Improve code logic

* Add unit test

* Fix typo

* Fix yapf
2022-07-28 14:10:34 +08:00
Ma Zerun c03efeeea4
[Feature] Support MPS device. (#894)
* [Feature] Support MPS device.

* Add `auto_select_device`

* Add unit tests
2022-07-28 12:28:51 +08:00
mzr1996 11df205e39 [Fix] Remove duplicated wide-resnet metafile. 2022-07-05 11:35:44 +08:00
Ma Zerun 812f3d4536
[CI] Add test mim CI. (#879) 2022-06-22 17:28:02 +08:00
194 changed files with 9042 additions and 2219 deletions

View File

@ -187,31 +187,12 @@ workflows:
env: LRU_CACHE_CAPACITY=1
requires:
- lint
- build:
name: build_py36_torch1.6
torch: 1.6.0
torchvision: 0.7.0
requires:
- lint
- build:
name: build_py36_torch1.7
torch: 1.7.0
torchvision: 0.8.1
requires:
- lint
- build:
name: build_py36_torch1.8
torch: 1.8.0
torchvision: 0.9.0
requires:
- lint
- build:
name: build_py39_torch1.8
torch: 1.8.0
torchvision: 0.9.0
python: "3.9.0"
requires:
- lint
- build:
name: build_py39_torch1.9
torch: 1.9.0
@ -219,21 +200,10 @@ workflows:
python: "3.9.0"
requires:
- lint
- build:
name: build_py39_torch1.10
torch: 1.10.0
torchvision: 0.11.1
python: "3.9.0"
requires:
- lint
- build_with_cuda:
name: build_py36_torch1.6_cu101
requires:
- build_with_timm
- build_py36_torch1.5
- build_py36_torch1.6
- build_py36_torch1.7
- build_py36_torch1.8
- build_py39_torch1.8
- build_py39_torch1.9
- build_py39_torch1.10

View File

@ -1,33 +0,0 @@
---
name: 寻求帮助
about: 遇到问题并寻求帮助
title: ''
labels: help wanted
assignees: ''
---
推荐使用英语模板 General question以便你的问题帮助更多人。
### 首先确认以下内容
- 我已经查询了相关的 issue但没有找到需要的帮助。
- 我已经阅读了相关文档,但仍不知道如何解决。
### 描述你遇到的问题
\[填写这里\]
### 相关信息
1. `pip list | grep "mmcv\|mmcls\|^torch"` 命令的输出
\[填写这里\]
2. 如果你修改了,或者使用了新的配置文件,请在这里写明
```python
[填写这里]
```
3. 如果你是在训练过程中遇到的问题,请填写完整的训练日志和报错信息
\[填写这里\]
4. 如果你对 `mmcls` 文件夹下的代码做了其他相关的修改,请在这里写明
\[填写这里\]

View File

@ -1,34 +0,0 @@
---
name: 新功能
about: 为项目提一个建议
title: '[Feature]'
labels: enhancement
assignees: ''
---
推荐使用英语模板 Feature request以便你的问题帮助更多人。
### 描述这个功能
\[填写这里\]
### 动机
请简要说明以下为什么需要添加这个新功能
例 1. 现在进行 xxx 的时候不方便
例 2. 最近的论文中提出了有一个很有帮助的 xx
\[填写这里\]
### 相关资源
是否有相关的官方实现或者第三方实现?这些会很有参考意义。
\[填写这里\]
### 其他相关信息
其他和这个功能相关的信息或者截图,请放在这里。
另外如果你愿意参与实现这个功能并提交 PR请在这里说明我们将非常欢迎。
\[填写这里\]

View File

@ -1,44 +0,0 @@
---
name: 报告 Bug
about: 报告问题以帮助我们提升
title: '[Bug]'
labels: bug
assignees: ''
---
推荐使用英语模板 Bug report以便你的问题帮助更多人。
### 描述 bug
简单地描述一下遇到了什么 bug
\[填写这里\]
### 复现流程
在命令行中执行的详细操作
```shell
[填写这里]
```
### 相关信息
1. `pip list | grep "mmcv\|mmcls\|^torch"` 命令的输出
\[填写这里\]
2. 如果你修改了,或者使用了新的配置文件,请在这里写明
```python
[填写这里]
```
3. 如果你是在训练过程中遇到的问题,请填写完整的训练日志和报错信息
\[填写这里\]
4. 如果你对 `mmcls` 文件夹下的代码做了其他相关的修改,请在这里写明
\[填写这里\]
### 附加内容
任何其他有关该 bug 的信息、截图等
\[填写这里\]

View File

@ -0,0 +1,68 @@
name: 🐞 Bug report
description: Create a report to help us improve
labels: ["bug"]
title: "[Bug] "
body:
- type: markdown
attributes:
value: |
If you have already identified the reason, we strongly appreciate you creating a new PR according to [the tutorial](https://mmclassification.readthedocs.io/en/master/community/CONTRIBUTING.html)!
If you need our help, please fill in the following form to help us to identify the bug.
- type: dropdown
id: version
attributes:
label: Branch
description: Which branch/version are you using?
options:
- master branch (0.24 or other 0.x version)
- 1.x branch (1.0.0rc2 or other 1.x version)
validations:
required: true
- type: textarea
id: describe
validations:
required: true
attributes:
label: Describe the bug
description: |
Please provide a clear and concise description of what the bug is.
Preferably provide a simple and minimal code snippet so that we can reproduce the error by running it.
placeholder: |
A clear and concise description of what the bug is.
```python
# Sample code to reproduce the problem
```
```shell
The command or script you run.
```
```
The error message or logs you got, with the full traceback.
```
- type: textarea
id: environment
validations:
required: true
attributes:
label: Environment
description: |
Please run `python -c "import mmcls.utils;import pprint;pprint.pp(dict(mmcls.utils.collect_env()))"` to collect necessary environment information and paste it here.
placeholder: |
```python
# The output the above command
```
- type: textarea
id: other
attributes:
label: Other information
description: |
Tell us anything else you think we should know.
1. Did you make any modifications on the code or config?
2. What do you think might be the reason?

View File

@ -0,0 +1,40 @@
name: 🚀 Feature request
description: Suggest an idea for this project
labels: ["enhancement"]
title: "[Feature] "
body:
- type: markdown
attributes:
value: |
If you have already implemented the feature, we strongly appreciate you creating a new PR according to [the tutorial](https://mmclassification.readthedocs.io/en/master/community/CONTRIBUTING.html)!
- type: dropdown
id: version
attributes:
label: Branch
description: Which branch/version are you using?
options:
- master branch (0.24 or other 0.x version)
- 1.x branch (1.0.0rc2 or other 1.x version)
validations:
required: true
- type: textarea
id: describe
validations:
required: true
attributes:
label: Describe the feature
description: |
What kind of feature do you want MMClassification to add? If there is an official code release or third-party implementation, please also provide the information here, which would be very helpful.
placeholder: |
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when \[....\].
Ex2. There is a recent paper \[....\], which is very helpful for \[....\].
- type: checkboxes
id: pr
attributes:
label: Will you implement it?
options:
- label: I would like to implement this feature and create a PR!

View File

@ -0,0 +1,69 @@
name: 🐞 报告 Bug
description: 报告你在使用中遇到的不合预期的情况
labels: ["bug"]
title: "[Bug] "
body:
- type: markdown
attributes:
value: |
我们推荐使用英语模板 Bug report以便你的问题帮助更多人。
如果你已经有了解决方案,我们非常欢迎你直接创建一个新的 PR 来解决这个问题。创建 PR 的流程可以参考[文档](https://mmclassification.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html)。
如果你需要我们的帮助,请填写以下内容帮助我们定位 Bug。
- type: dropdown
id: version
attributes:
label: 分支
description: 你正在使用的分支/版本是哪个?
options:
- master 分支 (0.24 或其他 0.x 版本)
- 1.x 分支 (1.0.0rc2 或其他 1.x 版本)
validations:
required: true
- type: textarea
id: describe
validations:
required: true
attributes:
label: 描述该错误
description: |
请简要说明你遇到的错误。如果可以的话,请提供一个简短的代码片段帮助我们复现这一错误。
placeholder: |
问题的简要说明
```python
# 复现错误的代码片段
```
```shell
# 发生错误时你的运行命令
```
```
错误信息和日志,请展示全部的错误日志和 traceback
```
- type: textarea
id: environment
validations:
required: true
attributes:
label: 环境信息
description: |
请运行指令 `python -c "import mmcls.utils;import pprint;pprint.pp(dict(mmcls.utils.collect_env()))"` 来收集必要的环境信息,并贴在下方。
placeholder: |
```python
# 上述命令的输出
```
- type: textarea
id: other
attributes:
label: 其他信息
description: |
告诉我们其他有价值的信息。
1. 你是否对代码或配置文件做了任何改动?
2. 你认为可能的原因是什么?

View File

@ -0,0 +1,42 @@
name: 🚀 功能建议
description: 建议一项新的功能
labels: ["enhancement"]
title: "[Feature] "
body:
- type: markdown
attributes:
value: |
推荐使用英语模板 Feature request以便你的问题帮助更多人。
如果你已经实现了该功能,我们非常欢迎你直接创建一个新的 PR 来解决这个问题。创建 PR 的流程可以参考[文档](https://mmclassification.readthedocs.io/zh_CN/master/community/CONTRIBUTING.html)。
- type: dropdown
id: version
attributes:
label: 分支
description: 你正在使用的分支/版本是哪个?
options:
- master 分支 (0.24 或其他 0.x 版本)
- 1.x 分支 (1.0.0rc2 或其他 1.x 版本)
validations:
required: true
- type: textarea
id: describe
validations:
required: true
attributes:
label: 描述该功能
description: |
你希望 MMClassification 添加什么功能?如果存在相关的论文、官方实现或者第三方实现,请同时贴出链接,这将非常有帮助。
placeholder: |
简要说明该功能,及为什么需要该功能
例 1. 现在进行 xxx 的时候不方便
例 2. 最近的论文中提出了有一个很有帮助的 xx
- type: checkboxes
id: pr
attributes:
label: 是否希望自己实现该功能?
options:
- label: 我希望自己来实现这一功能,并向 MMClassification 贡献代码!

View File

@ -1,42 +0,0 @@
---
name: Bug report
about: Create a report to help us improve
title: '[Bug]'
labels: bug
assignees: ''
---
### Describe the bug
A clear and concise description of what the bug is.
\[here\]
### To Reproduce
The command you executed.
```shell
[here]
```
### Post related information
1. The output of `pip list | grep "mmcv\|mmcls\|^torch"`
\[here\]
2. Your config file if you modified it or created a new one.
```python
[here]
```
3. Your train log file if you meet the problem during training.
\[here\]
4. Other code you modified in the `mmcls` folder.
\[here\]
### Additional context
Add any other context about the problem here.
\[here\]

View File

@ -1,6 +1,12 @@
blank_issues_enabled: false
contact_links:
- name: MMClassification Documentation
- name: 📚 MMClassification Documentation (官方文档)
url: https://mmclassification.readthedocs.io/en/latest/
about: Check if your question is answered in docs
- name: 💬 General questions (寻求帮助)
url: https://github.com/open-mmlab/mmclassification/discussions
about: Ask general usage questions and discuss with other MMClassification community members
- name: 🌐 Explore OpenMMLab (官网)
url: https://openmmlab.com/
about: Get to know more about OpenMMLab

View File

@ -1,32 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: '[Feature]'
labels: enhancement
assignees: ''
---
### Describe the feature
\[here\]
### Motivation
A clear and concise description of the motivation of the feature.
Ex1. It is inconvenient when \[....\].
Ex2. There is a recent paper \[....\], which is very helpful for \[....\].
\[here\]
### Related resources
If there is an official code release or third-party implementation, please also provide the information here, which would be very helpful.
\[here\]
### Additional context
Add any other context or screenshots about the feature request here.
If you would like to implement the feature and create a PR, please leave a comment here and that would be much appreciated.
\[here\]

View File

@ -1,31 +0,0 @@
---
name: General questions
about: 'Ask general questions to get help '
title: ''
labels: help wanted
assignees: ''
---
### Checklist
- I have searched related issues but cannot get the expected help.
- I have read related documents and don't know what to do.
### Describe the question you meet
\[here\]
### Post related information
1. The output of `pip list | grep "mmcv\|mmcls\|^torch"`
\[here\]
2. Your config file if you modified it or created a new one.
```python
[here]
```
3. Your train log file if you meet the problem during training.
\[here\]
4. Other code you modified in the `mmcls` folder.
\[here\]

View File

@ -29,91 +29,18 @@ concurrency:
cancel-in-progress: true
jobs:
build_without_timm:
build_with_timm:
runs-on: ubuntu-latest
env:
UBUNTU_VERSION: ubuntu1804
strategy:
matrix:
python-version: [3.6]
torch: [1.5.0, 1.8.0, 1.9.0]
python-version: [3.8]
torch: [1.8.0]
include:
- torch: 1.5.0
torchvision: 0.6.0
torch_major: 1.5.0
- torch: 1.8.0
torchvision: 0.9.0
torch_major: 1.8.0
- torch: 1.9.0
torchvision: 0.10.0
torch_major: 1.9.0
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install PyTorch
run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install MMCV
run: |
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cpu/torch${{matrix.torch_major}}/index.html
python -c 'import mmcv; print(mmcv.__version__)'
- name: Install mmcls dependencies
run: |
pip install -r requirements.txt
- name: Build and install
run: |
rm -rf .eggs
pip install -e . -U
- name: Run unittests
run: |
pytest tests/ --ignore tests/test_models/test_backbones/test_timm_backbone.py
build:
runs-on: ubuntu-latest
env:
UBUNTU_VERSION: ubuntu1804
strategy:
matrix:
python-version: [3.7]
torch: [1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0]
include:
- torch: 1.5.0
torchvision: 0.6.0
torch_major: 1.5.0
- torch: 1.6.0
torchvision: 0.7.0
torch_major: 1.6.0
- torch: 1.7.0
torchvision: 0.8.1
torch_major: 1.7.0
- torch: 1.8.0
torchvision: 0.9.0
torch_major: 1.8.0
- torch: 1.8.0
torchvision: 0.9.0
torch_major: 1.8.0
python-version: 3.8
- torch: 1.8.0
torchvision: 0.9.0
torch_major: 1.8.0
python-version: 3.9
- torch: 1.9.0
torchvision: 0.10.0
torch_major: 1.9.0
- torch: 1.10.0
torchvision: 0.11.1
torch_major: 1.10.0
- torch: 1.10.0
torchvision: 0.11.1
torch_major: 1.10.0
python-version: 3.8
- torch: 1.10.0
torchvision: 0.11.1
torch_major: 1.10.0
python-version: 3.9
steps:
- uses: actions/checkout@v2
@ -137,7 +64,7 @@ jobs:
run: |
rm -rf .eggs
pip install -e . -U
- name: Run unittests and generate coverage report
- name: Run unittests
run: |
coverage run --branch --source mmcls -m pytest tests/
coverage xml
@ -151,6 +78,64 @@ jobs:
name: codecov-umbrella
fail_ci_if_error: false
build:
runs-on: ubuntu-latest
env:
UBUNTU_VERSION: ubuntu1804
strategy:
matrix:
torch: [1.5.0, 1.8.0, 1.10.0, 1.13.0]
include:
- torch: 1.5.0
torchvision: 0.6.0
torch_major: 1.5.0
python-version: '3.7'
- torch: 1.8.0
torchvision: 0.9.0
torch_major: 1.8.0
python-version: '3.8'
- torch: 1.10.0
torchvision: 0.11.1
torch_major: 1.10.0
python-version: '3.9'
- torch: 1.13.0
torchvision: 0.14.0
torch_major: 1.13.0
python-version: '3.10'
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install PyTorch
run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install MMCV
run: |
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cpu/torch${{matrix.torch_major}}/index.html
python -c 'import mmcv; print(mmcv.__version__)'
- name: Install mmcls dependencies
run: |
pip install -r requirements.txt
- name: Build and install
run: |
rm -rf .eggs
pip install -e . -U
- name: Run unittests and generate coverage report
run: |
coverage run --branch --source mmcls -m pytest tests/ -k "not timm"
coverage xml
coverage report -m --omit="mmcls/utils/*","mmcls/apis/*"
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v2
with:
file: ./coverage.xml
flags: unittests
env_vars: OS,PYTHON
name: codecov-umbrella
fail_ci_if_error: false
build-windows:
runs-on: windows-2022
strategy:

44
.github/workflows/test-mim.yml vendored 100644
View File

@ -0,0 +1,44 @@
name: test-mim
on:
push:
paths:
- 'model-index.yml'
- 'configs/**'
pull_request:
paths:
- 'model-index.yml'
- 'configs/**'
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
build_cpu:
runs-on: ubuntu-18.04
strategy:
matrix:
python-version: [3.7]
torch: [1.8.0]
include:
- torch: 1.8.0
torch_version: torch1.8
torchvision: 0.9.0
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Upgrade pip
run: pip install pip --upgrade
- name: Install PyTorch
run: pip install torch==${{matrix.torch}}+cpu torchvision==${{matrix.torchvision}}+cpu -f https://download.pytorch.org/whl/torch_stable.html
- name: Install openmim
run: pip install openmim
- name: Build and install
run: rm -rf .eggs && mim install -e .
- name: test commands of mim
run: mim search mmcls

18
.owners.yml 100644
View File

@ -0,0 +1,18 @@
# assign issues to owners automatically
assign:
  issues: enabled  # or disabled
  pull_requests: enabled  # or disabled
  # assign strategy, both issues and pull requests follow the same strategy
  strategy:
    # random
    # round-robin
    daily-shift-based
  schedule:
    '*/1 * * * *'
  # assignees
  assignees:
    - mzr1996
    - tonysy
    - Ezra-Yu
    - techmonsterwang
    - yingfhu

View File

@ -5,7 +5,7 @@ repos:
hooks:
- id: flake8
- repo: https://github.com/PyCQA/isort
rev: 5.10.1
rev: 5.11.5
hooks:
- id: isort
- repo: https://github.com/pre-commit/mirrors-yapf
@ -29,9 +29,9 @@ repos:
rev: 0.7.9
hooks:
- id: mdformat
args: ["--number", "--table-width", "200"]
args: ["--number", "--table-width", "200", '--disable-escape', 'backslash', '--disable-escape', 'link-enclosure']
additional_dependencies:
- mdformat-openmmlab
- "mdformat-openmmlab>=0.0.4"
- mdformat_frontmatter
- linkify-it-py
- repo: https://github.com/codespell-project/codespell

View File

@ -33,6 +33,8 @@
[🆕 Update News](https://mmclassification.readthedocs.io/en/latest/changelog.html) |
[🤔 Reporting Issues](https://github.com/open-mmlab/mmclassification/issues/new/choose)
:point_right: **MMPreTrain 1.0 branch is in trial, welcome everyone to [try it](https://github.com/open-mmlab/mmclassification/tree/pretrain) and [discuss with us](https://github.com/open-mmlab/mmclassification/discussions)!** :point_left:
</div>
## Introduction
@ -58,20 +60,26 @@ The master branch works with **PyTorch 1.5+**.
## What's new
v0.23.0 was released in 1/5/2022.
MMClassification 1.0 has been released! It's still unstable and in the release candidate stage. If you want to try it, go
to [the 1.x branch](https://github.com/open-mmlab/mmclassification/tree/1.x) and discuss it with us in
[the discussion](https://github.com/open-mmlab/mmclassification/discussions).
v0.25.0 was released in 06/12/2022.
Highlights of the new version:
- Support **DenseNet**, **VAN** and **PoolFormer**, and provide pre-trained models.
- Support training on IPU.
- New-style API docs, welcome to [view them](https://mmclassification.readthedocs.io/en/master/api/models.html).
v0.22.0 was released in 30/3/2022.
- Support MLU backend.
- Add `dist_train_arm.sh` for ARM device.
v0.24.1 was released in 31/10/2022.
Highlights of the new version:
- Support a series of **CSP Network**, such as CSP-ResNet, CSP-ResNeXt and CSP-DarkNet.
- A new `CustomDataset` class to help you **build dataset of yourself**!
- Support new backbones - **ConvMixer**, **RepMLP** and new dataset - **CUB dataset**.
- Support HUAWEI Ascend device.
v0.24.0 was released in 30/9/2022.
Highlights of the new version:
- Support **HorNet**, **EfficientFormer**, **SwinTransformer V2** and **MViT** backbones.
- Support Stanford Cars dataset.
Please refer to [changelog.md](docs/en/changelog.md) for more details and other release history.
@ -80,7 +88,7 @@ Please refer to [changelog.md](docs/en/changelog.md) for more details and other
Below are quick steps for installation:
```shell
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision==0.11.0 -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
@ -142,6 +150,9 @@ Results and models are available in the [model zoo](https://mmclassification.rea
- [x] [ConvMixer](https://github.com/open-mmlab/mmclassification/tree/master/configs/convmixer)
- [x] [CSPNet](https://github.com/open-mmlab/mmclassification/tree/master/configs/cspnet)
- [x] [PoolFormer](https://github.com/open-mmlab/mmclassification/tree/master/configs/poolformer)
- [x] [MViT](https://github.com/open-mmlab/mmclassification/tree/master/configs/mvit)
- [x] [EfficientFormer](https://github.com/open-mmlab/mmclassification/tree/master/configs/efficientformer)
- [x] [HorNet](https://github.com/open-mmlab/mmclassification/tree/master/configs/hornet)
</details>

View File

@ -33,6 +33,10 @@
[🆕 更新日志](https://mmclassification.readthedocs.io/en/latest/changelog.html) |
[🤔 报告问题](https://github.com/open-mmlab/mmclassification/issues/new/choose)
:point_right: **MMPreTrain 1.0 版本即将正式发布,欢迎大家 [试用](https://github.com/open-mmlab/mmclassification/tree/pretrain) 并 [参与讨论](https://github.com/open-mmlab/mmclassification/discussions)** :point_left:
</div>
</div>
## Introduction
@ -57,22 +61,28 @@ MMClassification 是一款基于 PyTorch 的开源图像分类工具箱,是 [O
## 更新日志
2022/5/1 发布了 v0.23.0 版本
MMClassification 1.0 已经发布!目前仍在公测中,如果希望试用,请切换到 [1.x 分支](https://github.com/open-mmlab/mmclassification/tree/1.x),并在[讨论版](https://github.com/open-mmlab/mmclassification/discussions) 参加开发讨论!
新版本亮点:
2022/12/06 发布了 v0.25.0 版本
- 支持 MLU 设备
- 添加了用于 ARM 设备训练的 `dist_train_arm.sh`
2022/10/31 发布了 v0.24.1 版本
- 支持了华为昇腾 NPU 设备。
2022/9/30 发布了 v0.24.0 版本
- 支持了 **HorNet**、**EfficientFormer**、**SwinTransformer V2**、**MViT** 等主干网络。
- 支持了 Stanford Cars 数据集。
2022/5/1 发布了 v0.23.0 版本
- 支持了 **DenseNet**、**VAN** 和 **PoolFormer** 三个网络,并提供了预训练模型。
- 支持在 IPU 上进行训练。
- 更新了 API 文档的样式,更方便查阅,[欢迎查阅](https://mmclassification.readthedocs.io/en/master/api/models.html)。
2022/3/30 发布了 v0.22.0 版本
新版本亮点:
- 支持了一系列 **CSP Net**,包括 CSP-ResNet、CSP-ResNeXt 和 CSP-DarkNet。
- 我们提供了一个新的 `CustomDataset` 类,这个类将帮助你轻松使用**自己的数据集**
- 支持了新的主干网络 **ConvMixer**、**RepMLP** 和一个新的数据集 **CUB dataset**
发布历史和更新细节请参考 [更新日志](docs/en/changelog.md)
## 安装
@ -80,7 +90,7 @@ MMClassification 是一款基于 PyTorch 的开源图像分类工具箱,是 [O
以下是安装的简要步骤:
```shell
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision==0.11.0 -c pytorch -y
conda activate open-mmlab
pip3 install openmim
mim install mmcv-full
@ -142,6 +152,9 @@ pip3 install -e .
- [x] [ConvMixer](https://github.com/open-mmlab/mmclassification/tree/master/configs/convmixer)
- [x] [CSPNet](https://github.com/open-mmlab/mmclassification/tree/master/configs/cspnet)
- [x] [PoolFormer](https://github.com/open-mmlab/mmclassification/tree/master/configs/poolformer)
- [x] [MViT](https://github.com/open-mmlab/mmclassification/tree/master/configs/mvit)
- [x] [EfficientFormer](https://github.com/open-mmlab/mmclassification/tree/master/configs/efficientformer)
- [x] [HorNet](https://github.com/open-mmlab/mmclassification/tree/master/configs/hornet)
</details>

View File

@ -0,0 +1,71 @@
_base_ = ['./pipelines/rand_aug.py']
# dataset settings
dataset_type = 'ImageNet'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=256,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(292, -1), # ( 256 / 224 * 256 )
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=256),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=64,
workers_per_gpu=8,
train=dict(
type=dataset_type,
data_prefix='data/imagenet/train',
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_prefix='data/imagenet/val',
ann_file='data/imagenet/meta/val.txt',
pipeline=test_pipeline),
test=dict(
# replace `data/val` with `data/test` for standard test
type=dataset_type,
data_prefix='data/imagenet/val',
ann_file='data/imagenet/meta/val.txt',
pipeline=test_pipeline))
evaluation = dict(interval=10, metric='accuracy')

View File

@ -0,0 +1,46 @@
# dataset settings
dataset_type = 'StanfordCars'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', size=512),
dict(type='RandomCrop', size=448),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', size=512),
dict(type='CenterCrop', crop_size=448),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data_root = 'data/stanfordcars'
data = dict(
samples_per_gpu=8,
workers_per_gpu=2,
train=dict(
type=dataset_type,
data_prefix=data_root,
test_mode=False,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
data_prefix=data_root,
test_mode=True,
pipeline=test_pipeline),
test=dict(
type=dataset_type,
data_prefix=data_root,
test_mode=True,
pipeline=test_pipeline))
evaluation = dict(
interval=1, metric='accuracy',
save_best='auto') # save the checkpoint with highest accuracy

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='base-gf', drop_path_rate=0.5),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1024,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='base', drop_path_rate=0.5),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1024,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='large-gf', drop_path_rate=0.2),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1536,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,17 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='large-gf384', drop_path_rate=0.4),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1536,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
])

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='large', drop_path_rate=0.2),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1536,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='small-gf', drop_path_rate=0.4),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='small', drop_path_rate=0.4),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='tiny-gf', drop_path_rate=0.2),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='HorNet', arch='tiny', drop_path_rate=0.2),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-6)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,19 @@
model = dict(
type='ImageClassifier',
backbone=dict(type='MViT', arch='base', drop_path_rate=0.3),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
in_channels=768,
num_classes=1000,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,23 @@
model = dict(
type='ImageClassifier',
backbone=dict(
type='MViT',
arch='large',
drop_path_rate=0.5,
dim_mul_in_attention=False),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
in_channels=1152,
num_classes=1000,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,19 @@
model = dict(
type='ImageClassifier',
backbone=dict(type='MViT', arch='small', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
in_channels=768,
num_classes=1000,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,19 @@
model = dict(
type='ImageClassifier',
backbone=dict(type='MViT', arch='tiny', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
in_channels=768,
num_classes=1000,
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,25 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='base',
img_size=256,
drop_path_rate=0.5),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1024,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,17 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='base',
img_size=384,
drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1024,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -0,0 +1,16 @@
# model settings
# Only for evaluation
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='large',
img_size=256,
drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1536,
loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
topk=(1, 5)))

View File

@ -0,0 +1,16 @@
# model settings
# Only for evaluation
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='large',
img_size=384,
drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=1536,
loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
topk=(1, 5)))

View File

@ -0,0 +1,25 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='small',
img_size=256,
drop_path_rate=0.3),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,25 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='SwinTransformerV2',
arch='tiny',
img_size=256,
drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b0', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=256,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,21 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b1', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))

View File

@ -0,0 +1,13 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b2', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -0,0 +1,13 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b3', drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -0,0 +1,13 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b4', drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -0,0 +1,13 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b5', drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -0,0 +1,13 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='b6', drop_path_rate=0.3),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=768,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))

View File

@ -1,13 +1 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='base', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))
_base_ = ['./van-b2.py']

View File

@ -1,13 +1 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='large', drop_path_rate=0.2),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False))
_base_ = ['./van-b3.py']

View File

@ -1,21 +1 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='small', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=512,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))
_base_ = ['./van-b1.py']

View File

@ -1,21 +1 @@
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(type='VAN', arch='tiny', drop_path_rate=0.1),
neck=dict(type='GlobalAveragePooling'),
head=dict(
type='LinearClsHead',
num_classes=1000,
in_channels=256,
init_cfg=None, # suppress the default init_cfg of LinearClsHead.
loss=dict(
type='LabelSmoothLoss', label_smooth_val=0.1, mode='original'),
cal_acc=False),
init_cfg=[
dict(type='TruncNormal', layer='Linear', std=0.02, bias=0.),
dict(type='Constant', layer='LayerNorm', val=1., bias=0.)
],
train_cfg=dict(augments=[
dict(type='BatchMixup', alpha=0.8, num_classes=1000, prob=0.5),
dict(type='BatchCutMix', alpha=1.0, num_classes=1000, prob=0.5)
]))
_base_ = ['./van-b0.py']

View File

@ -0,0 +1,7 @@
# optimizer
optimizer = dict(
type='SGD', lr=0.003, momentum=0.9, weight_decay=0.0005, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[40, 70, 90])
runner = dict(type='EpochBasedRunner', max_epochs=100)

View File

@ -0,0 +1,36 @@
# CSRA
> [Residual Attention: A Simple but Effective Method for Multi-Label Recognition](https://arxiv.org/abs/2108.02456)
<!-- [ALGORITHM] -->
## Abstract
Multi-label image recognition is a challenging computer vision task of practical use. Progresses in this area, however, are often characterized by complicated methods, heavy computations, and lack of intuitive explanations. To effectively capture different spatial regions occupied by objects from different categories, we propose an embarrassingly simple module, named class-specific residual attention (CSRA). CSRA generates class-specific features for every category by proposing a simple spatial attention score, and then combines it with the class-agnostic average pooling feature. CSRA achieves state-of-the-art results on multilabel recognition, and at the same time is much simpler than them. Furthermore, with only 4 lines of code, CSRA also leads to consistent improvement across many diverse pretrained models and datasets without any extra training. CSRA is both easy to implement and light in computations, which also enjoys intuitive explanations and visualizations.
<div align=center>
<img src="https://user-images.githubusercontent.com/84259897/176982245-3ffcff56-a4ea-4474-9967-bc2b612bbaa3.png" width="80%"/>
</div>
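The "embarrassingly simple module" described above can be sketched in a few lines. The snippet below is an illustrative single-head version of the class-specific residual attention pooling, not the `CSRAClsHead` implementation used by the config in this comparison; the `lam` and `T` values are example settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCSRA(nn.Module):
    """Minimal single-head class-specific residual attention (sketch)."""

    def __init__(self, in_channels, num_classes, lam=0.1, T=1.0):
        super().__init__()
        # A 1x1 conv acts as a per-location linear classifier.
        self.classifier = nn.Conv2d(in_channels, num_classes, 1, bias=False)
        self.lam = lam  # weight of the class-specific attention term
        self.T = T      # temperature; a large T approaches spatial max pooling

    def forward(self, feat):
        # feat: (B, d, H, W) backbone feature map
        score = self.classifier(feat).flatten(2)  # (B, C, H*W) per-location class scores
        base = score.mean(dim=2)                  # class-agnostic average pooling
        attn = (F.softmax(score * self.T, dim=2) * score).sum(dim=2)  # class-specific attention
        return base + self.lam * attn             # residual combination -> logits


# Toy check: a ResNet-101-like feature map (2048 channels) and 20 VOC classes.
logits = SimpleCSRA(2048, 20)(torch.randn(2, 2048, 14, 14))
print(logits.shape)  # torch.Size([2, 20])
```

The multi-head variant described in the paper combines several such branches with different temperatures.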
## Results and models
### VOC2007
| Model | Pretrain | Params(M) | Flops(G) | mAP | OF1 (%) | CF1 (%) | Config | Download |
| :------------: | :------------------------------------------------: | :-------: | :------: | :---: | :-----: | :-----: | :-----------------------------------------------: | :-------------------------------------------------: |
| Resnet101-CSRA | [ImageNet-1k](https://download.openmmlab.com/mmclassification/v0/resnet/resnet101_8xb32_in1k_20210831-539c63f8.pth) | 23.55 | 4.12 | 94.98 | 90.80 | 89.16 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/csra/resnet101-csra_1xb16_voc07-448px.py) | [model](https://download.openmmlab.com/mmclassification/v0/csra/resnet101-csra_1xb16_voc07-448px_20220722-29efb40a.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/csra/resnet101-csra_1xb16_voc07-448px_20220722-29efb40a.log.json) |
## Citation
```bibtex
@misc{https://doi.org/10.48550/arxiv.2108.02456,
doi = {10.48550/ARXIV.2108.02456},
url = {https://arxiv.org/abs/2108.02456},
author = {Zhu, Ke and Wu, Jianxin},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Residual Attention: A Simple but Effective Method for Multi-Label Recognition},
publisher = {arXiv},
year = {2021},
copyright = {arXiv.org perpetual, non-exclusive license}
}
```

View File

@ -0,0 +1,29 @@
Collections:
- Name: CSRA
Metadata:
Training Data: PASCAL VOC 2007
Architecture:
- Class-specific Residual Attention
Paper:
URL: https://arxiv.org/abs/2108.02456
Title: 'Residual Attention: A Simple but Effective Method for Multi-Label Recognition'
README: configs/csra/README.md
Code:
Version: v0.24.0
URL: https://github.com/open-mmlab/mmclassification/blob/v0.24.0/mmcls/models/heads/multi_label_csra_head.py
Models:
- Name: resnet101-csra_1xb16_voc07-448px
Metadata:
FLOPs: 4120000000
Parameters: 23550000
In Collections: CSRA
Results:
- Dataset: PASCAL VOC 2007
Metrics:
mAP: 94.98
OF1: 90.80
CF1: 89.16
Task: Multi-Label Classification
Weights: https://download.openmmlab.com/mmclassification/v0/csra/resnet101-csra_1xb16_voc07-448px_20220722-29efb40a.pth
Config: configs/csra/resnet101-csra_1xb16_voc07-448px.py

View File

@ -0,0 +1,75 @@
_base_ = ['../_base_/datasets/voc_bs16.py', '../_base_/default_runtime.py']
# Pre-trained Checkpoint Path
checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet101_8xb32_in1k_20210831-539c63f8.pth' # noqa
# If you want to use the pre-trained ResNet101-CutMix weight from
# the original repo (https://github.com/Kevinz-code/CSRA), the script
# 'tools/convert_models/torchvision_to_mmcls.py' can help you convert the weight
# into mmcls format. The mAP result would reach 95.5 with that weight.
# checkpoint = 'PATH/TO/PRE-TRAINED_WEIGHT'
# model settings
model = dict(
type='ImageClassifier',
backbone=dict(
type='ResNet',
depth=101,
num_stages=4,
out_indices=(3, ),
style='pytorch',
init_cfg=dict(
type='Pretrained', checkpoint=checkpoint, prefix='backbone')),
neck=None,
head=dict(
type='CSRAClsHead',
num_classes=20,
in_channels=2048,
num_heads=1,
lam=0.1,
loss=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# dataset setting
img_norm_cfg = dict(mean=[0, 0, 0], std=[255, 255, 255], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='RandomResizedCrop', size=448, scale=(0.7, 1.0)),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', size=448),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
# treat the difficult examples as negative ones (label 0)
train=dict(pipeline=train_pipeline, difficult_as_postive=False),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
# optimizer
# The lr of the classifier head is 10 * base_lr, which helps convergence
# (illustrated by the sketch after this file).
optimizer = dict(
type='SGD',
lr=0.0002,
momentum=0.9,
weight_decay=0.0001,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10)}))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
policy='step',
step=6,
gamma=0.1,
warmup='linear',
warmup_iters=1,
warmup_ratio=1e-7,
warmup_by_epoch=True)
runner = dict(type='EpochBasedRunner', max_epochs=20)
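As a concrete illustration of the `lr_mult=10` rule in the config above, the snippet below builds an optimizer for a toy two-part model with mmcv's optimizer constructor and prints the resulting per-group learning rates. This is a hedged sketch that assumes mmcv-full (as required by mmcls v0.24.x) is installed; the toy `backbone`/`head` modules are placeholders, not the real classifier.

```python
import torch.nn as nn
from mmcv.runner import build_optimizer

# toy stand-in for an ImageClassifier: one 'backbone' and one 'head' sub-module
model = nn.ModuleDict({'backbone': nn.Linear(8, 8), 'head': nn.Linear(8, 20)})

optimizer_cfg = dict(
    type='SGD',
    lr=0.0002,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10)}))

optimizer = build_optimizer(model, optimizer_cfg)
for group in optimizer.param_groups:
    # backbone parameters keep lr=0.0002, head parameters get lr=0.002 (10x)
    print(group['lr'])
```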

View File

@ -0,0 +1,47 @@
# EfficientFormer
> [EfficientFormer: Vision Transformers at MobileNet Speed](https://arxiv.org/abs/2206.01191)
<!-- [ALGORITHM] -->
## Abstract
Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally many times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance? To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs. Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm. Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer. Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices. Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on iPhone 12 (compiled with CoreML), which runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1), and our largest model, EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.
<div align=center>
<img src="https://user-images.githubusercontent.com/18586273/180713426-9d3d77e3-3584-42d8-9098-625b4170d796.png" width="100%"/>
</div>
## Results and models
### ImageNet-1k
| Model | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :------------------: | :-------: | :------: | :-------: | :-------: | :---------------------------------------------------------------------: | :------------------------------------------------------------------------: |
| EfficientFormer-l1\* | 12.19 | 1.30 | 80.46 | 94.99 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/efficientformer/efficientformer-l1_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l1_3rdparty_in1k_20220803-d66e61df.pth) |
| EfficientFormer-l3\* | 31.41 | 3.93 | 82.45 | 96.18 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/efficientformer/efficientformer-l3_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l3_3rdparty_in1k_20220803-dde1c8c5.pth) |
| EfficientFormer-l7\* | 82.23 | 10.16 | 83.40 | 96.60 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/efficientformer/efficientformer-l7_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l7_3rdparty_in1k_20220803-41a552bb.pth) |
*Models with * are converted from the [official repo](https://github.com/snap-research/EfficientFormer). The config files of these models are only for inference. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*
## Citation
```bibtex
@misc{https://doi.org/10.48550/arxiv.2206.01191,
doi = {10.48550/ARXIV.2206.01191},
url = {https://arxiv.org/abs/2206.01191},
author = {Li, Yanyu and Yuan, Geng and Wen, Yang and Hu, Eric and Evangelidis, Georgios and Tulyakov, Sergey and Wang, Yanzhi and Ren, Jian},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {EfficientFormer: Vision Transformers at MobileNet Speed},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```

View File

@ -0,0 +1,24 @@
_base_ = [
'../_base_/datasets/imagenet_bs128_poolformer_small_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
model = dict(
type='ImageClassifier',
backbone=dict(
type='EfficientFormer',
arch='l1',
drop_path_rate=0,
init_cfg=[
dict(
type='TruncNormal',
layer=['Conv2d', 'Linear'],
std=.02,
bias=0.),
dict(type='Constant', layer=['GroupNorm'], val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-5)
]),
neck=dict(type='GlobalAveragePooling', dim=1),
head=dict(
type='EfficientFormerClsHead', in_channels=448, num_classes=1000))
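This config and the l3/l7 variants below differ only in `arch` and the head's `in_channels`. Below is a hedged sketch of what the `backbone` entry above resolves to through the mmcls registry; it assumes mmcls v0.24.x is installed, drops `init_cfg` for brevity, and prints the output shapes rather than asserting them.

```python
import torch
from mmcls.models import build_backbone

# same dict as the `backbone` entry in the config above, minus the init_cfg
backbone = build_backbone(dict(type='EfficientFormer', arch='l1', drop_path_rate=0.0))
backbone.eval()
with torch.no_grad():
    outs = backbone(torch.randn(1, 3, 224, 224))
# the last output feeds GlobalAveragePooling(dim=1) and the 448-channel head
print([tuple(o.shape) for o in outs])
```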

View File

@ -0,0 +1,24 @@
_base_ = [
'../_base_/datasets/imagenet_bs128_poolformer_small_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
model = dict(
type='ImageClassifier',
backbone=dict(
type='EfficientFormer',
arch='l3',
drop_path_rate=0,
init_cfg=[
dict(
type='TruncNormal',
layer=['Conv2d', 'Linear'],
std=.02,
bias=0.),
dict(type='Constant', layer=['GroupNorm'], val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-5)
]),
neck=dict(type='GlobalAveragePooling', dim=1),
head=dict(
type='EfficientFormerClsHead', in_channels=512, num_classes=1000))

View File

@ -0,0 +1,24 @@
_base_ = [
'../_base_/datasets/imagenet_bs128_poolformer_small_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
model = dict(
type='ImageClassifier',
backbone=dict(
type='EfficientFormer',
arch='l7',
drop_path_rate=0,
init_cfg=[
dict(
type='TruncNormal',
layer=['Conv2d', 'Linear'],
std=.02,
bias=0.),
dict(type='Constant', layer=['GroupNorm'], val=1., bias=0.),
dict(type='Constant', layer=['LayerScale'], val=1e-5)
]),
neck=dict(type='GlobalAveragePooling', dim=1),
head=dict(
type='EfficientFormerClsHead', in_channels=768, num_classes=1000))

View File

@ -0,0 +1,67 @@
Collections:
- Name: EfficientFormer
Metadata:
Training Data: ImageNet-1k
Architecture:
- Pooling
- 1x1 Convolution
- LayerScale
- MetaFormer
Paper:
URL: https://arxiv.org/pdf/2206.01191.pdf
Title: "EfficientFormer: Vision Transformers at MobileNet Speed"
README: configs/efficientformer/README.md
Code:
Version: v0.24.0
URL: https://github.com/open-mmlab/mmclassification/blob/v0.24.0/mmcls/models/backbones/efficientformer.py
Models:
- Name: efficientformer-l1_3rdparty_8xb128_in1k
Metadata:
FLOPs: 1304601088 # 1.3G
Parameters: 12278696 # 12M
In Collection: EfficientFormer
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 80.46
Top 5 Accuracy: 94.99
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l1_3rdparty_in1k_20220803-d66e61df.pth
Config: configs/efficientformer/efficientformer-l1_8xb128_in1k.py
Converted From:
Weights: https://drive.google.com/file/d/11SbX-3cfqTOc247xKYubrAjBiUmr818y/view?usp=sharing
Code: https://github.com/snap-research/EfficientFormer
- Name: efficientformer-l3_3rdparty_8xb128_in1k
Metadata:
Training Data: ImageNet-1k
FLOPs: 3737045760 # 3.7G
Parameters: 31406000 # 31M
In Collection: EfficientFormer
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 82.45
Top 5 Accuracy: 96.18
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l3_3rdparty_in1k_20220803-dde1c8c5.pth
Config: configs/efficientformer/efficientformer-l3_8xb128_in1k.py
Converted From:
Weights: https://drive.google.com/file/d/1OyyjKKxDyMj-BcfInp4GlDdwLu3hc30m/view?usp=sharing
Code: https://github.com/snap-research/EfficientFormer
- Name: efficientformer-l7_3rdparty_8xb128_in1k
Metadata:
FLOPs: 10163951616 # 10.2G
Parameters: 82229328 # 82M
In Collection: EfficientFormer
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 83.40
Top 5 Accuracy: 96.60
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/efficientformer/efficientformer-l7_3rdparty_in1k_20220803-41a552bb.pth
Config: configs/efficientformer/efficientformer-l7_8xb128_in1k.py
Converted From:
Weights: https://drive.google.com/file/d/1cVw-pctJwgvGafeouynqWWCwgkcoFMM5/view?usp=sharing
Code: https://github.com/snap-research/EfficientFormer

View File

@ -0,0 +1,51 @@
# HorNet
> [HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions](https://arxiv.org/pdf/2207.14284v2.pdf)
<!-- [ALGORITHM] -->
## Abstract
Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution (gnConv) that performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable, which is compatible with various variants of convolution and extends the two-order interactions in self-attention to arbitrary orders without introducing significant extra computation. gnConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. Based on the operation, we construct a new family of generic vision backbones named HorNet. Extensive experiments on ImageNet classification, COCO object detection and ADE20K semantic segmentation show that HorNet outperforms Swin Transformers and ConvNeXt by a significant margin with similar overall architecture and training configurations. HorNet also shows favorable scalability to more training data and a larger model size. Apart from the effectiveness in visual encoders, we also show gnConv can be applied to task-specific decoders and consistently improve dense prediction performance with less computation. Our results demonstrate that gnConv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs. Code is available at https://github.com/raoyongming/HorNet.
<div align=center>
<img src="https://user-images.githubusercontent.com/24734142/188356236-b8e3db94-eaa6-48e9-b323-15e5ba7f2991.png" width="80%"/>
</div>
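As a rough illustration of the recursive gated convolution described above, here is a simplified PyTorch sketch of a gnConv block. It follows the structure the paper describes (input projection, a shared depthwise convolution, then recursive element-wise gating over channel groups of growing width); it is not the official HorNet code, which additionally offers the global-filter variant used by the `-GF` models below.

```python
import torch
import torch.nn as nn


class GnConv(nn.Module):
    """Simplified sketch of a recursive gated convolution (gnConv)."""

    def __init__(self, dim, order=3, kernel_size=7):
        super().__init__()
        # channel widths grow by a factor of 2 at each interaction order
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size,
                                padding=kernel_size // 2, groups=sum(self.dims))
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1)
            for i in range(order - 1))
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        p, q = torch.split(self.proj_in(x), (self.dims[0], sum(self.dims)), dim=1)
        q_list = torch.split(self.dwconv(q), self.dims, dim=1)
        out = p * q_list[0]                      # first-order gating
        for pw, q_i in zip(self.pws, q_list[1:]):
            out = pw(out) * q_i                  # recursively raise the interaction order
        return self.proj_out(out)


feat = torch.randn(1, 64, 56, 56)
print(GnConv(64, order=3)(feat).shape)           # torch.Size([1, 64, 56, 56])
```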
## Results and models
### ImageNet-1k
| Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :-----------: | :----------: | :--------: | :-------: | :------: | :-------: | :-------: | :--------------------------------------------------------------: | :----------------------------------------------------------------: |
| HorNet-T\* | From scratch | 224x224 | 22.41 | 3.98 | 82.84 | 96.24 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-tiny_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-tiny_3rdparty_in1k_20220915-0e8eedff.pth) |
| HorNet-T-GF\* | From scratch | 224x224 | 22.99 | 3.9 | 82.98 | 96.38 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-tiny-gf_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-tiny-gf_3rdparty_in1k_20220915-4c35a66b.pth) |
| HorNet-S\* | From scratch | 224x224 | 49.53 | 8.83 | 83.79 | 96.75 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-small_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-small_3rdparty_in1k_20220915-5935f60f.pth) |
| HorNet-S-GF\* | From scratch | 224x224 | 50.4 | 8.71 | 83.98 | 96.77 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-small-gf_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-small-gf_3rdparty_in1k_20220915-649ca492.pth) |
| HorNet-B\* | From scratch | 224x224 | 87.26 | 15.59 | 84.24 | 96.94 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-base_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-base_3rdparty_in1k_20220915-a06176bb.pth) |
| HorNet-B-GF\* | From scratch | 224x224 | 88.42 | 15.42 | 84.32 | 96.95 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/hornet/hornet-base-gf_8xb64_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-base-gf_3rdparty_in1k_20220915-82c06fa7.pth) |
\*Models with * are converted from [the official repo](https://github.com/raoyongming/HorNet). The config files of these models are only for validation. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.
### Pre-trained Models
The pre-trained models on ImageNet-21k are used to fine-tune on the downstream tasks.
| Model | Pretrain | resolution | Params(M) | Flops(G) | Download |
| :--------------: | :----------: | :--------: | :-------: | :------: | :------------------------------------------------------------------------------------------------------------------------: |
| HorNet-L\* | ImageNet-21k | 224x224 | 194.54 | 34.83 | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-large_3rdparty_in21k_20220909-9ccef421.pth) |
| HorNet-L-GF\* | ImageNet-21k | 224x224 | 196.29 | 34.58 | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-large-gf_3rdparty_in21k_20220909-3aea3b61.pth) |
| HorNet-L-GF384\* | ImageNet-21k | 384x384 | 201.23 | 101.63 | [model](https://download.openmmlab.com/mmclassification/v0/hornet/hornet-large-gf384_3rdparty_in21k_20220909-80894290.pth) |
\*Models with * are converted from [the official repo](https://github.com/raoyongming/HorNet).
## Citation
```bibtex
@article{rao2022hornet,
title={HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions},
  author={Rao, Yongming and Zhao, Wenliang and Tang, Yansong and Zhou, Jie and Lim, Ser-Nam and Lu, Jiwen},
journal={arXiv preprint arXiv:2207.14284},
year={2022}
}
```

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-base-gf.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=64)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=1.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-base.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=64)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=5.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-small-gf.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=64)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=1.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-small.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=64)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=5.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-tiny-gf.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=128)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=1.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/hornet/hornet-tiny.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py',
]
data = dict(samples_per_gpu=128)
optimizer = dict(lr=4e-3)
optimizer_config = dict(grad_clip=dict(max_norm=100.0), _delete_=True)
custom_hooks = [dict(type='EMAHook', momentum=4e-5, priority='ABOVE_NORMAL')]

View File

@ -0,0 +1,97 @@
Collections:
- Name: HorNet
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- AdamW
- Weight Decay
Architecture:
- HorNet
- gnConv
Paper:
URL: https://arxiv.org/pdf/2207.14284v2.pdf
Title: "HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions"
README: configs/hornet/README.md
Code:
Version: v0.24.0
URL: https://github.com/open-mmlab/mmclassification/blob/v0.24.0/mmcls/models/backbones/hornet.py
Models:
- Name: hornet-tiny_3rdparty_in1k
Metadata:
FLOPs: 3980000000 # 3.98G
Parameters: 22410000 # 22.41M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 82.84
Top 5 Accuracy: 96.24
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-tiny_3rdparty_in1k_20220915-0e8eedff.pth
Config: configs/hornet/hornet-tiny_8xb128_in1k.py
- Name: hornet-tiny-gf_3rdparty_in1k
Metadata:
FLOPs: 3900000000 # 3.9G
Parameters: 22990000 # 22.99M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 82.98
Top 5 Accuracy: 96.38
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-tiny-gf_3rdparty_in1k_20220915-4c35a66b.pth
Config: configs/hornet/hornet-tiny-gf_8xb128_in1k.py
- Name: hornet-small_3rdparty_in1k
Metadata:
FLOPs: 8830000000 # 8.83G
Parameters: 49530000 # 49.53M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 83.79
Top 5 Accuracy: 96.75
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-small_3rdparty_in1k_20220915-5935f60f.pth
Config: configs/hornet/hornet-small_8xb64_in1k.py
- Name: hornet-small-gf_3rdparty_in1k
Metadata:
FLOPs: 8710000000 # 8.71G
Parameters: 50400000 # 50.4M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 83.98
Top 5 Accuracy: 96.77
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-small-gf_3rdparty_in1k_20220915-649ca492.pth
Config: configs/hornet/hornet-small-gf_8xb64_in1k.py
- Name: hornet-base_3rdparty_in1k
Metadata:
FLOPs: 15590000000 # 15.59G
Parameters: 87260000 # 87.26M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.24
Top 5 Accuracy: 96.94
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-base_3rdparty_in1k_20220915-a06176bb.pth
Config: configs/hornet/hornet-base_8xb64_in1k.py
- Name: hornet-base-gf_3rdparty_in1k
Metadata:
FLOPs: 15420000000 # 15.42G
Parameters: 88420000 # 88.42M
In Collection: HorNet
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.32
Top 5 Accuracy: 96.95
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/hornet/hornet-base-gf_3rdparty_in1k_20220915-82c06fa7.pth
Config: configs/hornet/hornet-base-gf_8xb64_in1k.py

View File

@ -1,5 +1,5 @@
_base_ = [
'../_base_/models/mobilenet-v3-small_8xb16_cifar.py',
'../_base_/models/mobilenet-v3-small_cifar.py',
'../_base_/datasets/cifar10_bs16.py',
'../_base_/schedules/cifar10_bs128.py', '../_base_/default_runtime.py'
]

View File

@ -0,0 +1,44 @@
# MViT V2
> [MViTv2: Improved Multiscale Vision Transformers for Classification and Detection](http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf)
<!-- [ALGORITHM] -->
## Abstract
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video
classification, as well as object detection. We present an improved version of MViT that incorporates
decomposed relative positional embeddings and residual pooling connections. We instantiate this architecture
in five sizes and evaluate it for ImageNet classification, COCO detection and Kinetics video recognition where
it outperforms prior work. We further compare MViTv2s' pooling attention to window attention mechanisms where
it outperforms the latter in accuracy/compute. Without bells-and-whistles, MViTv2 has state-of-the-art
performance in 3 domains: 88.8% accuracy on ImageNet classification, 58.7 boxAP on COCO object detection as
well as 86.1% on Kinetics-400 video classification.
<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/180376227-755243fa-158e-4068-940a-416036519665.png" width="50%"/>
</div>
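To make the residual pooling connection above concrete, here is a hedged PyTorch sketch of pooling attention in which the attention output is added back to the pooled query. It is a simplification: the real MViTv2 block also adds decomposed relative positional embeddings and pools keys and values with separate convolutions.

```python
import torch
import torch.nn as nn


class PoolingAttention(nn.Module):
    """Sketch of pooling attention with a residual pooling connection."""

    def __init__(self, dim, num_heads=4, stride=2):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        # strided depthwise convs pool the query and key/value token grids
        self.pool_q = nn.Conv2d(dim, dim, 3, stride=stride, padding=1, groups=dim)
        self.pool_kv = nn.Conv2d(dim, dim, 3, stride=stride, padding=1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def _pool(self, x, pool, hw):
        b, n, c = x.shape
        h, w = hw
        x = pool(x.transpose(1, 2).reshape(b, c, h, w))
        return x.flatten(2).transpose(1, 2), x.shape[-2:]

    def forward(self, x, hw):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, out_hw = self._pool(q, self.pool_q, hw)
        k, _ = self._pool(k, self.pool_kv, hw)
        v, _ = self._pool(v, self.pool_kv, hw)
        out, _ = self.attn(q, k, v)
        return self.proj(out + q), out_hw        # residual pooling connection


tokens = torch.randn(2, 14 * 14, 96)
out, hw = PoolingAttention(96)(tokens, (14, 14))
print(out.shape, hw)                             # torch.Size([2, 49, 96]) torch.Size([7, 7])
```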
## Results and models
### ImageNet-1k
| Model | Pretrain | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :------------: | :----------: | :-------: | :------: | :-------: | :-------: | :------------------------------------------------------------------: | :---------------------------------------------------------------------: |
| MViTv2-tiny\* | From scratch | 24.17 | 4.70 | 82.33 | 96.15 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-tiny_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-tiny_3rdparty_in1k_20220722-db7beeef.pth) |
| MViTv2-small\* | From scratch | 34.87 | 7.00 | 83.63 | 96.51 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-small_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-small_3rdparty_in1k_20220722-986bd741.pth) |
| MViTv2-base\* | From scratch | 51.47 | 10.20 | 84.34 | 96.86 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-base_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-base_3rdparty_in1k_20220722-9c4f0a17.pth) |
| MViTv2-large\* | From scratch | 217.99 | 42.10 | 85.25 | 97.14 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/mvit/mvitv2-large_8xb256_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-large_3rdparty_in1k_20220722-2b57b983.pth) |
*Models with * are converted from the [official repo](https://github.com/facebookresearch/mvit). The config files of these models are only for inference. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*
## Citation
```bibtex
@inproceedings{li2021improved,
title={MViTv2: Improved multiscale vision transformers for classification and detection},
author={Li, Yanghao and Wu, Chao-Yuan and Fan, Haoqi and Mangalam, Karttikeya and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
booktitle={CVPR},
year={2022}
}
```

View File

@ -0,0 +1,95 @@
Collections:
- Name: MViT V2
Metadata:
Architecture:
- Attention Dropout
- Convolution
- Dense Connections
- GELU
- Layer Normalization
- Scaled Dot-Product Attention
- Attention Pooling
Paper:
URL: http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf
Title: 'MViTv2: Improved Multiscale Vision Transformers for Classification and Detection'
README: configs/mvit/README.md
Code:
URL: https://github.com/open-mmlab/mmclassification/blob/v0.24.0/mmcls/models/backbones/mvit.py
Version: v0.24.0
Models:
- Name: mvitv2-tiny_3rdparty_in1k
In Collection: MViT V2
Metadata:
FLOPs: 4700000000
Parameters: 24173320
Training Data:
- ImageNet-1k
Results:
- Dataset: ImageNet-1k
Task: Image Classification
Metrics:
Top 1 Accuracy: 82.33
Top 5 Accuracy: 96.15
Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-tiny_3rdparty_in1k_20220722-db7beeef.pth
Converted From:
Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_T_in1k.pyth
Code: https://github.com/facebookresearch/mvit
Config: configs/mvit/mvitv2-tiny_8xb256_in1k.py
- Name: mvitv2-small_3rdparty_in1k
In Collection: MViT V2
Metadata:
FLOPs: 7000000000
Parameters: 34870216
Training Data:
- ImageNet-1k
Results:
- Dataset: ImageNet-1k
Task: Image Classification
Metrics:
Top 1 Accuracy: 83.63
Top 5 Accuracy: 96.51
Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-small_3rdparty_in1k_20220722-986bd741.pth
Converted From:
Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_S_in1k.pyth
Code: https://github.com/facebookresearch/mvit
Config: configs/mvit/mvitv2-small_8xb256_in1k.py
- Name: mvitv2-base_3rdparty_in1k
In Collection: MViT V2
Metadata:
FLOPs: 10200000000
Parameters: 51472744
Training Data:
- ImageNet-1k
Results:
- Dataset: ImageNet-1k
Task: Image Classification
Metrics:
Top 1 Accuracy: 84.34
Top 5 Accuracy: 96.86
Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-base_3rdparty_in1k_20220722-9c4f0a17.pth
Converted From:
Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_B_in1k.pyth
Code: https://github.com/facebookresearch/mvit
Config: configs/mvit/mvitv2-base_8xb256_in1k.py
- Name: mvitv2-large_3rdparty_in1k
In Collection: MViT V2
Metadata:
FLOPs: 42100000000
Parameters: 217992952
Training Data:
- ImageNet-1k
Results:
- Dataset: ImageNet-1k
Task: Image Classification
Metrics:
Top 1 Accuracy: 85.25
Top 5 Accuracy: 97.14
Weights: https://download.openmmlab.com/mmclassification/v0/mvit/mvitv2-large_3rdparty_in1k_20220722-2b57b983.pth
Converted From:
Weights: https://dl.fbaipublicfiles.com/mvit/mvitv2_models/MViTv2_L_in1k.pyth
Code: https://github.com/facebookresearch/mvit
Config: configs/mvit/mvitv2-large_8xb256_in1k.py

View File

@ -0,0 +1,29 @@
_base_ = [
'../_base_/models/mvit/mvitv2-base.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# dataset settings
data = dict(samples_per_gpu=256)
# schedule settings
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys={
'.pos_embed': dict(decay_mult=0.0),
'.rel_pos_h': dict(decay_mult=0.0),
'.rel_pos_w': dict(decay_mult=0.0)
})
optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=1.0))
# learning policy
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=70,
warmup_by_epoch=True)

View File

@ -0,0 +1,29 @@
_base_ = [
'../_base_/models/mvit/mvitv2-large.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs2048_AdamW.py',
'../_base_/default_runtime.py'
]
# dataset settings
data = dict(samples_per_gpu=256)
# schedule settings
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys={
'.pos_embed': dict(decay_mult=0.0),
'.rel_pos_h': dict(decay_mult=0.0),
'.rel_pos_w': dict(decay_mult=0.0)
})
optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=1.0))
# learning policy
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=70,
warmup_by_epoch=True)

View File

@ -0,0 +1,29 @@
_base_ = [
'../_base_/models/mvit/mvitv2-small.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs2048_AdamW.py',
'../_base_/default_runtime.py'
]
# dataset settings
data = dict(samples_per_gpu=256)
# schedule settings
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys={
'.pos_embed': dict(decay_mult=0.0),
'.rel_pos_h': dict(decay_mult=0.0),
'.rel_pos_w': dict(decay_mult=0.0)
})
optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=1.0))
# learning policy
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=70,
warmup_by_epoch=True)

View File

@ -0,0 +1,29 @@
_base_ = [
'../_base_/models/mvit/mvitv2-tiny.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs2048_AdamW.py',
'../_base_/default_runtime.py'
]
# dataset settings
data = dict(samples_per_gpu=256)
# schedule settings
paramwise_cfg = dict(
norm_decay_mult=0.0,
bias_decay_mult=0.0,
custom_keys={
'.pos_embed': dict(decay_mult=0.0),
'.rel_pos_h': dict(decay_mult=0.0),
'.rel_pos_w': dict(decay_mult=0.0)
})
optimizer = dict(lr=0.00025, paramwise_cfg=paramwise_cfg)
optimizer_config = dict(grad_clip=dict(max_norm=1.0))
# learning policy
lr_config = dict(
policy='CosineAnnealing',
warmup='linear',
warmup_iters=70,
warmup_by_epoch=True)

View File

@ -72,6 +72,12 @@ The pre-trained models on ImageNet-21k are used to fine-tune, and therefore don'
| :-------: | :--------------------------------------------------: | :--------: | :-------: | :------: | :-------: | :------------------------------------------------: | :---------------------------------------------------: |
| ResNet-50 | [ImageNet-21k-mill](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_3rdparty-mill_in21k_20220331-faac000b.pth) | 448x448 | 23.92 | 16.48 | 88.45 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb8_cub.py) | [model](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cub_20220307-57840e60.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cub_20220307-57840e60.log.json) |
### Stanford-Cars
| Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Config | Download |
| :-------: | :--------------------------------------------------: | :--------: | :-------: | :------: | :-------: | :------------------------------------------------: | :---------------------------------------------------: |
| ResNet-50 | [ImageNet-21k-mill](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_3rdparty-mill_in21k_20220331-faac000b.pth) | 448x448 | 23.92 | 16.48 | 92.82 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/resnet/resnet50_8xb8_cars.py) | [model](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cars_20220812-9d85901a.pth) \| [log](https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cars_20220812-9d85901a.log.json) |
## Citation
```

View File

@ -298,44 +298,6 @@ Models:
Top 5 Accuracy: 93.80
Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb256-rsb-a3-100e_in1k_20211228-3493673c.pth
Config: configs/resnet/resnet50_8xb256-rsb-a3-100e_in1k.py
- Name: wide-resnet50_3rdparty_8xb32_in1k
Metadata:
FLOPs: 11440000000 # 11.44G
Parameters: 68880000 # 68.88M
Training Techniques:
- SGD with Momentum
- Weight Decay
In Collection: ResNet
Results:
- Task: Image Classification
Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 78.48
Top 5 Accuracy: 94.08
Weights: https://download.openmmlab.com/mmclassification/v0/resnet/wide-resnet50_3rdparty_8xb32_in1k_20220304-66678344.pth
Config: configs/resnet/wide-resnet50_8xb32_in1k.py
Converted From:
Weights: https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth
Code: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
- Name: wide-resnet101_3rdparty_8xb32_in1k
Metadata:
FLOPs: 22810000000 # 22.81G
Parameters: 126890000 # 126.89M
Training Techniques:
- SGD with Momentum
- Weight Decay
In Collection: ResNet
Results:
- Task: Image Classification
Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 78.84
Top 5 Accuracy: 94.28
Weights: https://download.openmmlab.com/mmclassification/v0/resnet/wide-resnet101_3rdparty_8xb32_in1k_20220304-8d5f9d61.pth
Config: configs/resnet/wide-resnet101_8xb32_in1k.py
Converted From:
Weights: https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth
Code: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py
- Name: resnetv1c50_8xb32_in1k
Metadata:
FLOPs: 4360000000
@ -388,3 +350,16 @@ Models:
Pretrain: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_3rdparty-mill_in21k_20220331-faac000b.pth
Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cub_20220307-57840e60.pth
Config: configs/resnet/resnet50_8xb8_cub.py
- Name: resnet50_8xb8_cars
Metadata:
FLOPs: 16480000000
Parameters: 23920000
In Collection: ResNet
Results:
- Dataset: StanfordCars
Metrics:
Top 1 Accuracy: 92.82
Task: Image Classification
Pretrain: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_3rdparty-mill_in21k_20220331-faac000b.pth
Weights: https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb8_cars_20220812-9d85901a.pth
Config: configs/resnet/resnet50_8xb8_cars.py

View File

@ -0,0 +1,19 @@
_base_ = [
'../_base_/models/resnet50.py',
'../_base_/datasets/stanford_cars_bs8_448.py',
'../_base_/schedules/stanford_cars_bs8.py', '../_base_/default_runtime.py'
]
# use the pre-trained weight converted from https://github.com/Alibaba-MIIL/ImageNet21K # noqa
checkpoint = 'https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_3rdparty-mill_in21k_20220331-faac000b.pth' # noqa
model = dict(
type='ImageClassifier',
backbone=dict(
init_cfg=dict(
type='Pretrained', checkpoint=checkpoint, prefix='backbone')),
head=dict(num_classes=196, ))
log_config = dict(interval=50)
checkpoint_config = dict(
interval=1, max_keep_ckpts=3) # save last three checkpoints

View File

@ -0,0 +1,58 @@
# Swin Transformer V2
> [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883)
<!-- [ALGORITHM] -->
## Abstract
Large-scale NLP models have been shown to significantly improve the performance on language tasks with no signs of saturation. They also demonstrate amazing few-shot capabilities like that of human beings. This paper aims to explore large-scale models in computer vision. We tackle three major issues in training and application of large vision models, including training instability, resolution gaps between pre-training and fine-tuning, and hunger on labelled data. Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images. Through these techniques, this paper successfully trained a 3 billion-parameter Swin Transformer V2 model, which is the largest dense vision model to date, and makes it capable of training with images of up to 1,536×1,536 resolution. It set new performance records on 4 representative vision tasks, including ImageNet-V2 image classification, COCO object detection, ADE20K semantic segmentation, and Kinetics-400 video action classification. Also note our training is much more efficient than that in Google's billion-level visual models, which consumes 40 times less labelled data and 40 times less training time.
<div align=center>
<img src="https://user-images.githubusercontent.com/42952108/180748696-ee7ed23d-7fee-4ccf-9eb5-f117db228a42.png" width="100%"/>
</div>
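The first of the three techniques, scaled cosine attention, is easy to sketch: attention logits are cosine similarities between queries and keys multiplied by a learnable, clamped per-head temperature instead of scaled dot products. The snippet below is a hedged illustration in plain PyTorch; the real Swin V2 block also adds the log-spaced continuous relative position bias to the logits.

```python
import math

import torch
import torch.nn.functional as F


def cosine_attention(q, k, v, logit_scale):
    """Scaled cosine attention (sketch): softmax(cos(q, k) * exp(logit_scale)) @ v."""
    attn = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1)
    attn = attn * logit_scale.clamp(max=math.log(100.0)).exp()  # clamped learnable temperature
    return attn.softmax(dim=-1) @ v


q = k = v = torch.randn(1, 3, 64, 32)                 # (batch, heads, tokens, head_dim)
logit_scale = torch.log(10 * torch.ones(3, 1, 1))     # per-head scale, as in the paper
print(cosine_attention(q, k, v, logit_scale).shape)   # torch.Size([1, 3, 64, 32])
```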
## Results and models
### ImageNet-21k
The pre-trained models on ImageNet-21k are used for fine-tuning and therefore don't have evaluation results.
| Model | resolution | Params(M) | Flops(G) | Download |
| :------: | :--------: | :-------: | :------: | :--------------------------------------------------------------------------------------------------------------------------------------: |
| Swin-B\* | 192x192 | 87.92 | 8.51 | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/pretrain/swinv2-base-w12_3rdparty_in21k-192px_20220803-f7dc9763.pth) |
| Swin-L\* | 192x192 | 196.74 | 19.04 | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/pretrain/swinv2-large-w12_3rdparty_in21k-192px_20220803-d9073fee.pth) |
### ImageNet-1k
| Model | Pretrain | resolution | window | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :------: | :----------: | :--------: | :----: | :-------: | :------: | :-------: | :-------: | :-------------------------------------------------------------: | :----------------------------------------------------------------: |
| Swin-T\* | From scratch | 256x256 | 8x8 | 28.35 | 4.35 | 81.76 | 95.87 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-tiny-w8_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-tiny-w8_3rdparty_in1k-256px_20220803-e318968f.pth) |
| Swin-T\* | From scratch | 256x256 | 16x16 | 28.35 | 4.4 | 82.81 | 96.23 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-tiny-w16_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-tiny-w16_3rdparty_in1k-256px_20220803-9651cdd7.pth) |
| Swin-S\* | From scratch | 256x256 | 8x8 | 49.73 | 8.45 | 83.74 | 96.6 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-small-w8_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-small-w8_3rdparty_in1k-256px_20220803-b01a4332.pth) |
| Swin-S\* | From scratch | 256x256 | 16x16 | 49.73 | 8.57 | 84.13 | 96.83 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-small-w16_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-small-w16_3rdparty_in1k-256px_20220803-b707d206.pth) |
| Swin-B\* | From scratch | 256x256 | 8x8 | 87.92 | 14.99 | 84.2 | 96.86 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-base-w8_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w8_3rdparty_in1k-256px_20220803-8ff28f2b.pth) |
| Swin-B\* | From scratch | 256x256 | 16x16 | 87.92 | 15.14 | 84.6 | 97.05 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-base-w16_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w16_3rdparty_in1k-256px_20220803-5a1886b7.pth) |
| Swin-B\* | ImageNet-21k | 256x256 | 16x16 | 87.92 | 15.14 | 86.17 | 97.88 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-base-w16_in21k-pre_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w16_in21k-pre_3rdparty_in1k-256px_20220803-8d7aa8ad.pth) |
| Swin-B\* | ImageNet-21k | 384x384 | 24x24 | 87.92 | 34.07 | 87.14 | 98.23 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-base-w24_in21k-pre_16xb64_in1k-384px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w24_in21k-pre_3rdparty_in1k-384px_20220803-44eb70f8.pth) |
| Swin-L\* | ImageNet-21k | 256x256 | 16x16 | 196.75 | 33.86 | 86.93 | 98.06 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-large-w16_in21k-pre_16xb64_in1k-256px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-large-w16_in21k-pre_3rdparty_in1k-256px_20220803-c40cbed7.pth) |
| Swin-L\* | ImageNet-21k | 384x384 | 24x24 | 196.75 | 76.2 | 87.59 | 98.27 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/swin_transformer_v2/swinv2-large-w24_in21k-pre_16xb64_in1k-384px.py) | [model](https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-large-w24_in21k-pre_3rdparty_in1k-384px_20220803-3b36c165.pth) |
*Models with * are converted from the [official repo](https://github.com/microsoft/Swin-Transformer#main-results-on-imagenet-with-pretrained-models). The config files of these models are only for validation. We don't ensure these config files' training accuracy and welcome you to contribute your reproduction results.*
*The ImageNet-21k pre-trained models with input resolutions of 256x256 and 384x384 are both fine-tuned from the same pre-training checkpoint, which uses a smaller input resolution of 192x192.*
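The converted checkpoints above can be pulled straight from their URLs into a model built from the matching config. A hedged sketch for the Swin-T (8x8 window) entry, assuming a checkout of mmclassification v0.24.x and network access:

```python
from mmcv import Config
from mmcv.runner import load_checkpoint
from mmcls.models import build_classifier

cfg = Config.fromfile(
    'configs/swin_transformer_v2/swinv2-tiny-w8_16xb64_in1k-256px.py')
model = build_classifier(cfg.model)
ckpt = ('https://download.openmmlab.com/mmclassification/v0/swin-v2/'
        'swinv2-tiny-w8_3rdparty_in1k-256px_20220803-e318968f.pth')
load_checkpoint(model, ckpt, map_location='cpu')  # downloads and loads the converted weights
model.eval()
```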
## Citation
```bibtex
@article{https://doi.org/10.48550/arxiv.2111.09883,
doi = {10.48550/ARXIV.2111.09883},
url = {https://arxiv.org/abs/2111.09883},
author = {Liu, Ze and Hu, Han and Lin, Yutong and Yao, Zhuliang and Xie, Zhenda and Wei, Yixuan and Ning, Jia and Cao, Yue and Zhang, Zheng and Dong, Li and Wei, Furu and Guo, Baining},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Swin Transformer V2: Scaling Up Capacity and Resolution},
publisher = {arXiv},
year = {2021},
copyright = {Creative Commons Attribution 4.0 International}
}
```

View File

@ -0,0 +1,204 @@
Collections:
- Name: Swin-Transformer-V2
Metadata:
Training Data: ImageNet-1k
Training Techniques:
- AdamW
- Weight Decay
Training Resources: 16x V100 GPUs
Epochs: 300
Batch Size: 1024
Architecture:
- Shift Window Multihead Self Attention
Paper:
URL: https://arxiv.org/abs/2111.09883
Title: "Swin Transformer V2: Scaling Up Capacity and Resolution"
README: configs/swin_transformer_v2/README.md
Models:
- Name: swinv2-tiny-w8_3rdparty_in1k-256px
Metadata:
FLOPs: 4350000000
Parameters: 28350000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 81.76
Top 5 Accuracy: 95.87
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-tiny-w8_3rdparty_in1k-256px_20220803-e318968f.pth
Config: configs/swin_transformer_v2/swinv2-tiny-w8_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_tiny_patch4_window8_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-tiny-w16_3rdparty_in1k-256px
Metadata:
FLOPs: 4400000000
Parameters: 28350000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 82.81
Top 5 Accuracy: 96.23
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-tiny-w16_3rdparty_in1k-256px_20220803-9651cdd7.pth
Config: configs/swin_transformer_v2/swinv2-tiny-w16_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_tiny_patch4_window16_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-small-w8_3rdparty_in1k-256px
Metadata:
FLOPs: 8450000000
Parameters: 49730000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 83.74
Top 5 Accuracy: 96.6
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-small-w8_3rdparty_in1k-256px_20220803-b01a4332.pth
Config: configs/swin_transformer_v2/swinv2-small-w8_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_small_patch4_window8_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-small-w16_3rdparty_in1k-256px
Metadata:
FLOPs: 8570000000
Parameters: 49730000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.13
Top 5 Accuracy: 96.83
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-small-w16_3rdparty_in1k-256px_20220803-b707d206.pth
Config: configs/swin_transformer_v2/swinv2-small-w16_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_small_patch4_window16_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-base-w8_3rdparty_in1k-256px
Metadata:
FLOPs: 14990000000
Parameters: 87920000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.2
Top 5 Accuracy: 96.86
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w8_3rdparty_in1k-256px_20220803-8ff28f2b.pth
Config: configs/swin_transformer_v2/swinv2-base-w8_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_base_patch4_window8_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-base-w16_3rdparty_in1k-256px
Metadata:
FLOPs: 15140000000
Parameters: 87920000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.6
Top 5 Accuracy: 97.05
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w16_3rdparty_in1k-256px_20220803-5a1886b7.pth
Config: configs/swin_transformer_v2/swinv2-base-w16_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_base_patch4_window16_256.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-base-w16_in21k-pre_3rdparty_in1k-256px
Metadata:
Training Data: ImageNet-21k
FLOPs: 15140000000
Parameters: 87920000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 86.17
Top 5 Accuracy: 97.88
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w16_in21k-pre_3rdparty_in1k-256px_20220803-8d7aa8ad.pth
Config: configs/swin_transformer_v2/swinv2-base-w16_in21k-pre_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_base_patch4_window12to16_192to256_22kto1k_ft.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-base-w24_in21k-pre_3rdparty_in1k-384px
Metadata:
Training Data: ImageNet-21k
FLOPs: 34070000000
Parameters: 87920000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 87.14
Top 5 Accuracy: 98.23
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-base-w24_in21k-pre_3rdparty_in1k-384px_20220803-44eb70f8.pth
Config: configs/swin_transformer_v2/swinv2-base-w24_in21k-pre_16xb64_in1k-384px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_base_patch4_window12to24_192to384_22kto1k_ft.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-large-w16_in21k-pre_3rdparty_in1k-256px
Metadata:
Training Data: ImageNet-21k
FLOPs: 33860000000
Parameters: 196750000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 86.93
Top 5 Accuracy: 98.06
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-large-w16_in21k-pre_3rdparty_in1k-256px_20220803-c40cbed7.pth
Config: configs/swin_transformer_v2/swinv2-large-w16_in21k-pre_16xb64_in1k-256px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12to16_192to256_22kto1k_ft.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-large-w24_in21k-pre_3rdparty_in1k-384px
Metadata:
Training Data: ImageNet-21k
FLOPs: 76200000000
Parameters: 196750000
In Collection: Swin-Transformer-V2
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 87.59
Top 5 Accuracy: 98.27
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/swinv2-large-w24_in21k-pre_3rdparty_in1k-384px_20220803-3b36c165.pth
Config: configs/swin_transformer_v2/swinv2-large-w24_in21k-pre_16xb64_in1k-384px.py
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12to24_192to384_22kto1k_ft.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-base-w12_3rdparty_in21k-192px
Metadata:
Training Data: ImageNet-21k
FLOPs: 8510000000
Parameters: 87920000
In Collection: Swin-Transformer-V2
Results: null
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/pretrain/swinv2-base-w12_3rdparty_in21k-192px_20220803-f7dc9763.pth
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_base_patch4_window12_192_22k.pth
Code: https://github.com/microsoft/Swin-Transformer
- Name: swinv2-large-w12_3rdparty_in21k-192px
Metadata:
Training Data: ImageNet-21k
FLOPs: 19040000000
Parameters: 196740000
In Collection: Swin-Transformer-V2
Results: null
Weights: https://download.openmmlab.com/mmclassification/v0/swin-v2/pretrain/swinv2-large-w12_3rdparty_in21k-192px_20220803-d9073fee.pth
Converted From:
Weights: https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth
Code: https://github.com/microsoft/Swin-Transformer

View File

@ -0,0 +1,8 @@
_base_ = [
'../_base_/models/swin_transformer_v2/base_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(backbone=dict(window_size=[16, 16, 16, 8]))

View File

@ -0,0 +1,13 @@
_base_ = [
'../_base_/models/swin_transformer_v2/base_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(
type='ImageClassifier',
backbone=dict(
window_size=[16, 16, 16, 8],
drop_path_rate=0.2,
pretrained_window_sizes=[12, 12, 12, 6]))

View File

@ -0,0 +1,14 @@
_base_ = [
'../_base_/models/swin_transformer_v2/base_384.py',
'../_base_/datasets/imagenet_bs64_swin_384.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(
type='ImageClassifier',
backbone=dict(
img_size=384,
window_size=[24, 24, 24, 12],
drop_path_rate=0.2,
pretrained_window_sizes=[12, 12, 12, 6]))

View File

@ -0,0 +1,6 @@
_base_ = [
'../_base_/models/swin_transformer_v2/base_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]

View File

@ -0,0 +1,13 @@
# Only for evaluation
_base_ = [
'../_base_/models/swin_transformer_v2/large_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(
type='ImageClassifier',
backbone=dict(
window_size=[16, 16, 16, 8], pretrained_window_sizes=[12, 12, 12, 6]),
)

View File

@ -0,0 +1,15 @@
# Only for evaluation
_base_ = [
'../_base_/models/swin_transformer_v2/large_384.py',
'../_base_/datasets/imagenet_bs64_swin_384.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(
type='ImageClassifier',
backbone=dict(
img_size=384,
window_size=[24, 24, 24, 12],
pretrained_window_sizes=[12, 12, 12, 6]),
)

View File

@ -0,0 +1,8 @@
_base_ = [
'../_base_/models/swin_transformer_v2/small_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(backbone=dict(window_size=[16, 16, 16, 8]))

View File

@ -0,0 +1,6 @@
_base_ = [
'../_base_/models/swin_transformer_v2/small_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]

View File

@ -0,0 +1,8 @@
_base_ = [
'../_base_/models/swin_transformer_v2/tiny_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
model = dict(backbone=dict(window_size=[16, 16, 16, 8]))

View File

@ -0,0 +1,6 @@
_base_ = [
'../_base_/models/swin_transformer_v2/tiny_256.py',
'../_base_/datasets/imagenet_bs64_swin_256.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]

View File

@ -6,7 +6,7 @@
## Abstract
Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, \\eg, the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and then applies multiple Transformer layers to model their global relation for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines among neighboring pixels, leading to low training sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness for fixed computation budgets and limited training samples. To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. Notably, T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 3.0% improvement when trained from scratch on ImageNet. It also outperforms ResNets and achieves comparable performance with MobileNets by directly training on ImageNet. For example, T2T-ViT with comparable size to ResNet50 (21.5M parameters) can achieve 83.3% top1 accuracy in image resolution 384×384 on ImageNet.
Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and then applies multiple Transformer layers to model their global relation for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines among neighboring pixels, leading to low training sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness for fixed computation budgets and limited training samples. To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. Notably, T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 3.0% improvement when trained from scratch on ImageNet. It also outperforms ResNets and achieves comparable performance with MobileNets by directly training on ImageNet. For example, T2T-ViT with comparable size to ResNet50 (21.5M parameters) can achieve 83.3% top1 accuracy in image resolution 384×384 on ImageNet.
<div align=center>
<img src="https://user-images.githubusercontent.com/26739999/142578381-e9040610-05d9-457c-8bf5-01c2fa94add2.png" width="60%"/>
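The Tokens-to-Token step described above can be sketched in a few lines of PyTorch (an illustration of the idea, not the authors' implementation): the token sequence is reshaped back into a 2D map and an overlapping unfold merges each token with its neighbors, so local structure is folded into the tokens while the sequence gets shorter. The kernel, stride, and sizes below are assumed values for illustration.

```python
import torch
import torch.nn as nn


def t2t_step(tokens: torch.Tensor, h: int, w: int,
             kernel: int = 3, stride: int = 2, padding: int = 1) -> torch.Tensor:
    """One illustrative T2T aggregation.

    tokens: (B, N, C) with N == h * w; returns (B, N', C * kernel**2) with N' < N.
    """
    b, n, c = tokens.shape
    assert n == h * w
    # Re-assemble the token sequence into an image-like feature map.
    feat = tokens.transpose(1, 2).reshape(b, c, h, w)
    # Soft split: overlapping patches aggregate each token with its neighbors.
    patches = nn.Unfold(kernel_size=kernel, stride=stride, padding=padding)(feat)
    # (B, C * k * k, N') -> (B, N', C * k * k): fewer, locally enriched tokens.
    return patches.transpose(1, 2)


out = t2t_step(torch.randn(2, 56 * 56, 64), h=56, w=56)
print(out.shape)  # torch.Size([2, 784, 576]) with the assumed kernel/stride
```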

View File

@ -16,15 +16,28 @@ While originally designed for natural language processing (NLP) tasks, the self-
### ImageNet-1k
| Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :-----: | :----------: | :--------: | :-------: | :------: | :-------: | :-------: | :-----------------------------------------------------------------: | :-------------------------------------------------------------------: |
| VAN-T\* | From scratch | 224x224 | 4.11 | 0.88 | 75.41 | 93.02 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-tiny_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-tiny_8xb128_in1k_20220501-385941af.pth) |
| VAN-S\* | From scratch | 224x224 | 13.86 | 2.52 | 81.01 | 95.63 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-small_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-small_8xb128_in1k_20220501-17bc91aa.pth) |
| VAN-B\* | From scratch | 224x224 | 26.58 | 5.03 | 82.80 | 96.21 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-base_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-base_8xb128_in1k_20220501-6a4cc31b.pth) |
| VAN-L\* | From scratch | 224x224 | 44.77 | 8.99 | 83.86 | 96.73 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-large_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-large_8xb128_in1k_20220501-f212ba21.pth) |
| Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
| :------: | :----------: | :--------: | :-------: | :------: | :-------: | :-------: | :----------------------------------------------------------------: | :-------------------------------------------------------------------: |
| VAN-B0\* | From scratch | 224x224 | 4.11 | 0.88 | 75.41 | 93.02 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-b0_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-tiny_8xb128_in1k_20220501-385941af.pth) |
| VAN-B1\* | From scratch | 224x224 | 13.86 | 2.52 | 81.01 | 95.63 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-b1_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-small_8xb128_in1k_20220501-17bc91aa.pth) |
| VAN-B2\* | From scratch | 224x224 | 26.58 | 5.03 | 82.80 | 96.21 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-b2_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-base_8xb128_in1k_20220501-6a4cc31b.pth) |
| VAN-B3\* | From scratch | 224x224 | 44.77 | 8.99 | 83.86 | 96.73 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-b3_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-large_8xb128_in1k_20220501-f212ba21.pth) |
| VAN-B4\* | From scratch | 224x224 | 60.28 | 12.22 | 84.13 | 96.86 | [config](https://github.com/open-mmlab/mmclassification/blob/master/configs/van/van-b4_8xb128_in1k.py) | [model](https://download.openmmlab.com/mmclassification/v0/van/van-b4_3rdparty_in1k_20220909-f4665b92.pth) |
\*Models with * are converted from [the official repo](https://github.com/Visual-Attention-Network/VAN-Classification). The config files of these models are provided for validation only; we do not guarantee their training accuracy, and contributions of reproduction results are welcome.
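Because these configs are meant for validating the converted weights, a quick sanity check is to run the high-level inference API on a downloaded checkpoint. The snippet below is a minimal sketch assuming the mmcls 0.x `init_model`/`inference_model` helpers, a repository-root working directory, and placeholder file paths.

```python
from mmcls.apis import inference_model, init_model

config = 'configs/van/van-b0_8xb128_in1k.py'
checkpoint = 'van-tiny_8xb128_in1k_20220501-385941af.pth'  # downloaded weights

# Build the model and load the converted checkpoint (CPU is enough for a check).
model = init_model(config, checkpoint, device='cpu')
result = inference_model(model, 'demo/demo.JPEG')  # any test image works
print(result['pred_class'], result['pred_score'])
```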
### Pre-trained Models
The models pre-trained on ImageNet-21k are used for fine-tuning on downstream tasks.
| Model | Pretrain | resolution | Params(M) | Flops(G) | Download |
| :------: | :----------: | :--------: | :-------: | :------: | :---------------------------------------------------------------------------------------------------------: |
| VAN-B4\* | ImageNet-21k | 224x224 | 60.28 | 12.22 | [model](https://download.openmmlab.com/mmclassification/v0/van/van-b4_3rdparty_in21k_20220909-db926b18.pth) |
| VAN-B5\* | ImageNet-21k | 224x224 | 89.97 | 17.21 | [model](https://download.openmmlab.com/mmclassification/v0/van/van-b5_3rdparty_in21k_20220909-18e904e3.pth) |
| VAN-B6\* | ImageNet-21k | 224x224 | 283.9 | 55.28 | [model](https://download.openmmlab.com/mmclassification/v0/van/van-b6_3rdparty_in21k_20220909-96c2cb3a.pth) |
\*Models with * are converted from [the official repo](https://github.com/Visual-Attention-Network/VAN-Classification).
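A hypothetical fine-tuning config could pull in one of these ImageNet-21k checkpoints via the backbone's `init_cfg`; the sketch below assumes the usual mmcls 0.x conventions, and the head settings and class count are placeholders for the downstream task.

```python
# Sketch of a downstream fine-tuning config (paths and class count are placeholders).
_base_ = ['./van-b4_8xb128_in1k.py']

checkpoint = 'https://download.openmmlab.com/mmclassification/v0/van/van-b4_3rdparty_in21k_20220909-db926b18.pth'  # noqa

model = dict(
    backbone=dict(
        init_cfg=dict(type='Pretrained', checkpoint=checkpoint, prefix='backbone.')),
    head=dict(num_classes=100))  # set to the downstream dataset's class count
```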
## Citation
```

View File

@ -7,6 +7,7 @@ Collections:
- Weight Decay
Architecture:
- Visual Attention Network
- LKA
Paper:
URL: https://arxiv.org/pdf/2202.09741v2.pdf
Title: "Visual Attention Network"
@ -16,10 +17,10 @@ Collections:
Version: v0.23.0
Models:
- Name: van-tiny_8xb128_in1k
- Name: van-b0_3rdparty_in1k
Metadata:
FLOPs: 4110000 # 4.11M
Parameters: 880000000 # 0.88G
FLOPs: 880000000 # 0.88G
Parameters: 4110000 # 4.11M
In Collection: Visual-Attention-Network
Results:
- Dataset: ImageNet-1k
@ -28,11 +29,11 @@ Models:
Top 5 Accuracy: 93.02
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/van/van-tiny_8xb128_in1k_20220501-385941af.pth
Config: configs/van/van-tiny_8xb128_in1k.py
- Name: van-small_8xb128_in1k
Config: configs/van/van-b0_8xb128_in1k.py
- Name: van-b1_3rdparty_in1k
Metadata:
FLOPs: 13860000 # 13.86M
Parameters: 2520000000 # 2.52G
FLOPs: 2520000000 # 2.52G
Parameters: 13860000 # 13.86M
In Collection: Visual-Attention-Network
Results:
- Dataset: ImageNet-1k
@ -41,11 +42,11 @@ Models:
Top 5 Accuracy: 95.63
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/van/van-small_8xb128_in1k_20220501-17bc91aa.pth
Config: configs/van/van-small_8xb128_in1k.py
- Name: van-base_8xb128_in1k
Config: configs/van/van-b1_8xb128_in1k.py
- Name: van-b2_3rdparty_in1k
Metadata:
FLOPs: 26580000 # 26.58M
Parameters: 5030000000 # 5.03G
FLOPs: 5030000000 # 5.03G
Parameters: 26580000 # 26.58M
In Collection: Visual-Attention-Network
Results:
- Dataset: ImageNet-1k
@ -54,11 +55,11 @@ Models:
Top 5 Accuracy: 96.21
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/van/van-base_8xb128_in1k_20220501-6a4cc31b.pth
Config: configs/van/van-base_8xb128_in1k.py
- Name: van-large_8xb128_in1k
Config: configs/van/van-b2_8xb128_in1k.py
- Name: van-b3_3rdparty_in1k
Metadata:
FLOPs: 44770000 # 44.77 M
Parameters: 8990000000 # 8.99G
FLOPs: 8990000000 # 8.99G
Parameters: 44770000 # 44.77M
In Collection: Visual-Attention-Network
Results:
- Dataset: ImageNet-1k
@ -67,4 +68,17 @@ Models:
Top 5 Accuracy: 96.73
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/van/van-large_8xb128_in1k_20220501-f212ba21.pth
Config: configs/van/van-large_8xb128_in1k.py
Config: configs/van/van-b3_8xb128_in1k.py
- Name: van-b4_3rdparty_in1k
Metadata:
FLOPs: 12220000000 # 12.22G
Parameters: 60280000 # 60.28M
In Collection: Visual-Attention-Network
Results:
- Dataset: ImageNet-1k
Metrics:
Top 1 Accuracy: 84.13
Top 5 Accuracy: 96.86
Task: Image Classification
Weights: https://download.openmmlab.com/mmclassification/v0/van/van-b4_3rdparty_in1k_20220909-f4665b92.pth
Config: configs/van/van-b4_8xb128_in1k.py
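The metafile change above moves the FLOPs and Parameters values into their correct fields (they had been swapped). A small check like the following sketch, which assumes PyYAML and the repository layout, can flag this class of mistake, since for these models FLOPs should be far larger than the parameter count.

```python
import yaml

with open('configs/van/metafile.yml') as f:
    meta = yaml.safe_load(f)

for model in meta['Models']:
    md = model['Metadata']
    params, flops = md['Parameters'], md['FLOPs']
    # VAN variants have tens of millions of parameters and hundreds of millions
    # to billions of FLOPs, so swapped fields show up immediately.
    assert flops > params, f"{model['Name']}: FLOPs/Parameters look swapped"
    print(f"{model['Name']}: {params / 1e6:.2f}M params, {flops / 1e9:.2f}G FLOPs")
```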

View File

@ -0,0 +1,61 @@
_base_ = [
'../_base_/models/van/van_b0.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
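In the pipeline above, `pad_val` and `fill_color` are derived from `img_norm_cfg`: the `[::-1]` reverses the channel order to match the BGR images the pipeline sees before `Normalize` (numerically a no-op here, since the mean is the same for every channel), and `round(127.5)` gives 128. A minimal sketch of the resulting values:

```python
img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)

pad_val = [round(x) for x in img_norm_cfg['mean'][::-1]]   # RandAugment padding
fill_color = img_norm_cfg['mean'][::-1]                    # RandomErasing fill

print(pad_val)     # [128, 128, 128]
print(fill_color)  # [127.5, 127.5, 127.5]
```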

View File

@ -0,0 +1,61 @@
_base_ = [
'../_base_/models/van/van_b1.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
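The test pipeline shared by these configs first resizes and then crops; `size=(248, -1)` is assumed to scale the shorter image side to 248 while preserving the aspect ratio, and `CenterCrop` then takes a 224x224 region. A small sketch of the resulting shapes (pure arithmetic, no mmcls imports):

```python
def test_time_shapes(h: int, w: int, short: int = 248, crop: int = 224):
    """Return (resized, cropped) (height, width) pairs under the assumptions above."""
    scale = short / min(h, w)
    rh, rw = round(h * scale), round(w * scale)
    return (rh, rw), (min(rh, crop), min(rw, crop))


print(test_time_shapes(375, 500))  # ((248, 331), (224, 224))
```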

View File

@ -0,0 +1,61 @@
_base_ = [
'../_base_/models/van/van_b2.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))

View File

@ -0,0 +1,61 @@
_base_ = [
'../_base_/models/van/van_b3.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))

View File

@ -0,0 +1,61 @@
_base_ = [
'../_base_/models/van/van_b4.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))

View File

@ -1,61 +1,6 @@
_base_ = [
'../_base_/models/van/van_base.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
_base_ = ['./van-b2_8xb128_in1k.py']
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
_deprecation_ = dict(
expected='van-b2_8xb128_in1k.py',
reference='https://github.com/open-mmlab/mmclassification/pull/1017',
)
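The stub above keeps the old filename usable: everything is inherited from `./van-b2_8xb128_in1k.py`, and the `_deprecation_` block lets the config loader emit a notice pointing at the rename (in mmcv versions that support `_deprecation_`). A minimal sketch, assuming a repository checkout and mmcv's `Config.fromfile`:

```python
from mmcv import Config

cfg = Config.fromfile('configs/van/van-base_8xb128_in1k.py')
# The values all come from the new van-b2 config via `_base_` inheritance.
print(cfg.data.samples_per_gpu)  # 128 under these assumptions
```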

View File

@ -1,61 +1,6 @@
_base_ = [
'../_base_/models/van/van_large.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
_base_ = ['./van-b3_8xb128_in1k.py']
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
_deprecation_ = dict(
expected='van-b3_8xb128_in1k.py',
reference='https://github.com/open-mmlab/mmclassification/pull/1017',
)

View File

@ -1,61 +1,6 @@
_base_ = [
'../_base_/models/van/van_small.py',
'../_base_/datasets/imagenet_bs64_swin_224.py',
'../_base_/schedules/imagenet_bs1024_adamw_swin.py',
'../_base_/default_runtime.py'
]
_base_ = ['./van-b1_8xb128_in1k.py']
# Note that the mean and variance used here are different from other configs
img_norm_cfg = dict(
mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='RandomResizedCrop',
size=224,
backend='pillow',
interpolation='bicubic'),
dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
dict(
type='RandAugment',
policies={{_base_.rand_increasing_policies}},
num_policies=2,
total_level=10,
magnitude_level=9,
magnitude_std=0.5,
hparams=dict(
pad_val=[round(x) for x in img_norm_cfg['mean'][::-1]],
interpolation='bicubic')),
dict(type='ColorJitter', brightness=0.4, contrast=0.4, saturation=0.4),
dict(
type='RandomErasing',
erase_prob=0.25,
mode='rand',
min_area_ratio=0.02,
max_area_ratio=1 / 3,
fill_color=img_norm_cfg['mean'][::-1],
fill_std=img_norm_cfg['std'][::-1]),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='ToTensor', keys=['gt_label']),
dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='Resize',
size=(248, -1),
backend='pillow',
interpolation='bicubic'),
dict(type='CenterCrop', crop_size=224),
dict(type='Normalize', **img_norm_cfg),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
]
data = dict(
samples_per_gpu=128,
train=dict(pipeline=train_pipeline),
val=dict(pipeline=test_pipeline),
test=dict(pipeline=test_pipeline))
_deprecation_ = dict(
expected='van-b1_8xb128_in1k.py',
reference='https://github.com/open-mmlab/mmclassification/pull/1017',
)

Some files were not shown because too many files have changed in this diff.