mmpretrain/configs/swin_transformer
Ma Zerun 076ee10cac
[Feature] Add swin-transformer model. (#271)
* Add Swin Transformer archs S, B, and L.

* Add SwinTransformer configs

* Add training config files for Swin.

* Align init method with original code

* Use nn.Unfold to merge patches (see the sketch after this list).

* Change all ConfigDict to dict

* Add init_cfg for all subclasses of BaseModule.

* Use the weight init functions from mmcv.

* Add Swin README

* Use safer cfg copy method

* Improve docstrings and variable names.

* Fix some differences in RandAugment.

Fix BGR bug, align scheduler config.

Fix label smoothing parameter difference.

* Fix missing DropPath in attention.

* Fix bug in the relative position table when window width is not equal to height.

* Make `PatchMerging` more general, supporting kernel, stride, padding, and dilation.

* Rename `residual` to `identity` in attention and FFN.

* Add `auto_pad` option to automatically pad the feature map.

* Improve docstring.

* Fix bug in ShiftWMSA padding.

* Remove unused `key` and `value` in ShiftWMSA

* Move `PatchMerging` into utils and use common `PatchEmbed`.

* Use latest `LinearClsHead`, train augments and label smooth settings.
And remove original `SwinLinearClsHead`.

* Mark some configs as "Evaluation Only".

* Remove useless comment in config

* 1. Move ShiftWindowMSA and WindowMSA to `utils/attention.py`
2. Add docstrings of each module.
3. Fix some variables' names.
4. Other small improvements.

* Add unit tests of swin-transformer and patchmerging.

* Fix some bugs in unit tests.

* Fix bug of rel_position_index when the window is not square.

* Make WindowMSA implicit, and add unit tests.

* Add metafile.yml, update readme and model_zoo.
2021-07-01 09:30:42 +08:00
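Several bullets above concern merging patches with `nn.Unfold`. A minimal sketch of that idea (hypothetical simplification; the actual `PatchMerging` in `utils` additionally supports arbitrary kernel, stride, padding, dilation, and the `auto_pad` option):

```python
import torch
import torch.nn as nn

class PatchMergingSketch(nn.Module):
    """Sketch only: gather each non-overlapping 2x2 group of patch tokens
    with nn.Unfold, then reduce the concatenated 4*C channels."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.sampler = nn.Unfold(kernel_size=2, stride=2)
        self.norm = nn.LayerNorm(4 * in_channels)
        self.reduction = nn.Linear(4 * in_channels, out_channels, bias=False)

    def forward(self, x, hw_shape):
        B, L, C = x.shape
        H, W = hw_shape
        # (B, L, C) -> (B, C, H, W) so nn.Unfold can slide over the grid.
        x = x.transpose(1, 2).reshape(B, C, H, W)
        x = self.sampler(x)      # (B, 4*C, H//2 * W//2)
        x = x.transpose(1, 2)    # (B, H//2 * W//2, 4*C)
        return self.reduction(self.norm(x)), (H // 2, W // 2)

# Usage: merge a 56x56 grid of 96-dim tokens into a 28x28 grid of 192-dim.
merge = PatchMergingSketch(96, 192)
tokens = torch.randn(2, 56 * 56, 96)
out, out_shape = merge(tokens, (56, 56))  # out: (2, 784, 192), out_shape: (28, 28)
```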
README.md
metafile.yml
swin_base_224_imagenet.py
swin_base_384_imagenet.py
swin_large_224_imagenet.py
swin_large_384_imagenet.py
swin_small_224_imagenet.py
swin_tiny_224_imagenet.py

(All files added in "[Feature] Add swin-transformer model. (#271)", 2021-07-01 09:30:42 +08:00.)
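For reference, a hypothetical sketch of how one of these training configs might be composed in the mmcls config system (the `_base_` paths below are illustrative assumptions, not the actual files; check the config itself for the exact composition):

```python
# Hypothetical sketch of swin_tiny_224_imagenet.py.
_base_ = [
    '../_base_/models/swin_transformer/tiny_224.py',      # model definition
    '../_base_/datasets/imagenet_bs64_swin_224.py',       # data pipeline
    '../_base_/schedules/imagenet_bs1024_adamw_swin.py',  # AdamW schedule
    '../_base_/default_runtime.py',                       # runtime defaults
]
```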

README.md

# Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

## Introduction

[ALGORITHM]

```latex
@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
```

## Pretrain model

The pre-trained models are converted from the model zoo of Swin Transformer.

### ImageNet-1k

| Model  | Pretrain     | Resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Download |
|:------:|:------------:|:----------:|:---------:|:--------:|:---------:|:---------:|:--------:|
| Swin-T | ImageNet-1k  | 224x224    | 28.29     | 4.36     | 81.18     | 95.52     | model    |
| Swin-S | ImageNet-1k  | 224x224    | 49.61     | 8.52     | 83.21     | 96.25     | model    |
| Swin-B | ImageNet-1k  | 224x224    | 87.77     | 15.14    | 83.42     | 96.44     | model    |
| Swin-B | ImageNet-1k  | 384x384    | 87.90     | 44.49    | 84.49     | 96.95     | model    |
| Swin-B | ImageNet-22k | 224x224    | 87.77     | 15.14    | 85.16     | 97.50     | model    |
| Swin-B | ImageNet-22k | 384x384    | 87.90     | 44.49    | 86.44     | 98.05     | model    |
| Swin-L | ImageNet-22k | 224x224    | 196.53    | 34.04    | 86.24     | 97.88     | model    |
| Swin-L | ImageNet-22k | 384x384    | 196.74    | 100.04   | 87.25     | 98.25     | model    |
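To try one of the converted checkpoints, here is a minimal sketch assuming the `mmcls.apis` interface of this era; the config and checkpoint paths are placeholders, since the checkpoint URLs are not reproduced in the table above:

```python
from mmcls.apis import inference_model, init_model

# Placeholders: pick a config from this folder and a converted checkpoint
# downloaded from the model links above.
config_file = 'configs/swin_transformer/swin_tiny_224_imagenet.py'
checkpoint_file = 'swin_tiny_224_imagenet.pth'  # hypothetical local path

model = init_model(config_file, checkpoint_file, device='cpu')
result = inference_model(model, 'demo/demo.JPEG')  # any test image path
print(result)
```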

## Results and models

### ImageNet

| Model  | Pretrain    | Resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download     |
|:------:|:-----------:|:----------:|:---------:|:--------:|:---------:|:---------:|:------:|:------------:|
| Swin-T | ImageNet-1k | 224x224    | 28.29     | 4.36     | 81.18     | 95.61     | config | model \| log |
| Swin-S | ImageNet-1k | 224x224    | 49.61     | 8.52     | 83.02     | 96.29     | config | model \| log |
| Swin-B | ImageNet-1k | 224x224    | 87.77     | 15.14    | 83.36     | 96.44     | config | model \| log |