Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Introduction

[ALGORITHM]

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

Pretrained models

The pre-trained models are converted from the model zoo of Swin Transformer.

ImageNet 1k

| Model | Pretrain | Resolution | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1k | 224x224 | 28.29 | 4.36 | 81.18 | 95.52 | model |
| Swin-S | ImageNet-1k | 224x224 | 49.61 | 8.52 | 83.21 | 96.25 | model |
| Swin-B | ImageNet-1k | 224x224 | 87.77 | 15.14 | 83.42 | 96.44 | model |
| Swin-B | ImageNet-1k | 384x384 | 87.90 | 44.49 | 84.49 | 96.95 | model |
| Swin-B | ImageNet-22k | 224x224 | 87.77 | 15.14 | 85.16 | 97.50 | model |
| Swin-B | ImageNet-22k | 384x384 | 87.90 | 44.49 | 86.44 | 98.05 | model |
| Swin-L | ImageNet-22k | 224x224 | 196.53 | 34.04 | 86.24 | 97.88 | model |
| Swin-L | ImageNet-22k | 384x384 | 196.74 | 100.04 | 87.25 | 98.25 | model |
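
When choosing among the converted checkpoints, a common question is which one gives the best Top-1 accuracy within a compute budget. The sketch below is only an illustration of that lookup: the rows are copied from the table above, while `CHECKPOINTS` and `best_under_budget` are hypothetical names introduced here and are not part of any library.

```python
# Rows copied from the pre-trained model table above:
# (model, pretrain data, input resolution, params in M, FLOPs in G, top-1 %)
CHECKPOINTS = [
    ("Swin-T", "ImageNet-1k", 224, 28.29, 4.36, 81.18),
    ("Swin-S", "ImageNet-1k", 224, 49.61, 8.52, 83.21),
    ("Swin-B", "ImageNet-1k", 224, 87.77, 15.14, 83.42),
    ("Swin-B", "ImageNet-1k", 384, 87.90, 44.49, 84.49),
    ("Swin-B", "ImageNet-22k", 224, 87.77, 15.14, 85.16),
    ("Swin-B", "ImageNet-22k", 384, 87.90, 44.49, 86.44),
    ("Swin-L", "ImageNet-22k", 224, 196.53, 34.04, 86.24),
    ("Swin-L", "ImageNet-22k", 384, 196.74, 100.04, 87.25),
]


def best_under_budget(max_flops_g, pretrain=None):
    """Hypothetical helper: return the row with the highest top-1 accuracy
    whose FLOPs do not exceed max_flops_g, optionally restricted to one
    pretraining dataset. Returns None if no row fits the budget."""
    candidates = [
        row for row in CHECKPOINTS
        if row[4] <= max_flops_g and (pretrain is None or row[1] == pretrain)
    ]
    # max() with default=None avoids a ValueError on an empty candidate list.
    return max(candidates, key=lambda row: row[5], default=None)


if __name__ == "__main__":
    print(best_under_budget(16))
    print(best_under_budget(16, pretrain="ImageNet-1k"))
```

With a 16 GFLOPs budget this picks the ImageNet-22k pre-trained Swin-B at 224x224; restricting the search to ImageNet-1k pretraining picks the ImageNet-1k Swin-B at the same resolution.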

Results and models

ImageNet

| Model | Pretrain | Resolution | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-T | ImageNet-1k | 224x224 | 28.29 | 4.36 | 81.18 | 95.61 | config | model \| log |
| Swin-S | ImageNet-1k | 224x224 | 49.61 | 8.52 | 83.02 | 96.29 | config | model \| log |
| Swin-B | ImageNet-1k | 224x224 | 87.77 | 15.14 | 83.36 | 96.44 | config | model \| log |