* Add swin transformer archs S, B and L. * Add SwinTransformer configs * Add train config files of swin. * Align init method with original code * Use nn.Unfold to merge patch * Change all ConfigDict to dict * Add init_cfg for all subclasses of BaseModule. * Use mmcv version init function * Add Swin README * Use safer cfg copy method * Improve docstring and variable name. * Fix some difference in randaug Fix BGR bug, align scheduler config. Fix label smoothing parameter difference. * Fix missing droppath in attn * Fix bug of relative posititon table if window width is not equal to height. * Make `PatchMerging` more general, support kernel, stride, padding and dilation. * Rename `residual` to `identity` in attention and FFN. * Add `auto_pad` option to auto pad feature map * Improve docstring. * Fix bug in ShiftWMSA padding. * Remove unused `key` and `value` in ShiftWMSA * Move `PatchMerging` into utils and use common `PatchEmbed`. * Use latest `LinearClsHead`, train augments and label smooth settings. And remove original `SwinLinearClsHead`. * Mark some configs as "Evalution Only". * Remove useless comment in config * 1. Move ShiftWindowMSA and WindowMSA to `utils/attention.py` 2. Add docstrings of each module. 3. Fix some variables' names. 4. Other small improvement. * Add unit tests of swin-transformer and patchmerging. * Fix some bugs in unit tests. * Fix bug of rel_position_index if window is not square. * Make WindowMSA implicit, and add unit tests. * Add metafile.yml, update readme and model_zoo. |
||
---|---|---|
.. | ||
README.md | ||
metafile.yml | ||
swin_base_224_imagenet.py | ||
swin_base_384_imagenet.py | ||
swin_large_224_imagenet.py | ||
swin_large_384_imagenet.py | ||
swin_small_224_imagenet.py | ||
swin_tiny_224_imagenet.py |
README.md
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Introduction
[ALGORITHM]
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}
Pretrain model
The pre-trained modles are converted from model zoo of Swin Transformer.
ImageNet 1k
Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Download |
---|---|---|---|---|---|---|---|
Swin-T | ImageNet-1k | 224x224 | 28.29 | 4.36 | 81.18 | 95.52 | model |
Swin-S | ImageNet-1k | 224x224 | 49.61 | 8.52 | 83.21 | 96.25 | model |
Swin-B | ImageNet-1k | 224x224 | 87.77 | 15.14 | 83.42 | 96.44 | model |
Swin-B | ImageNet-1k | 384x384 | 87.90 | 44.49 | 84.49 | 96.95 | model |
Swin-B | ImageNet-22k | 224x224 | 87.77 | 15.14 | 85.16 | 97.50 | model |
Swin-B | ImageNet-22k | 384x384 | 87.90 | 44.49 | 86.44 | 98.05 | model |
Swin-L | ImageNet-22k | 224x224 | 196.53 | 34.04 | 86.24 | 97.88 | model |
Swin-L | ImageNet-22k | 384x384 | 196.74 | 100.04 | 87.25 | 98.25 | model |
Results and models
ImageNet
Model | Pretrain | resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download |
---|---|---|---|---|---|---|---|---|
Swin-T | ImageNet-1k | 224x224 | 28.29 | 4.36 | 81.18 | 95.61 | config | model | log |
Swin-S | ImageNet-1k | 224x224 | 49.61 | 8.52 | 83.02 | 96.29 | config | model | log |
Swin-B | ImageNet-1k | 224x224 | 87.77 | 15.14 | 83.36 | 96.44 | config | model | log |