# Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

## Introduction
[ALGORITHM]
@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
## Pretrain model
The pre-trained models are converted from the model zoo of the official Swin Transformer repository.
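Since the checkpoints are converted weight files, a quick sanity check is to open one with plain PyTorch and inspect the parameter names and shapes. This is only a minimal sketch; the checkpoint file name below is a hypothetical placeholder for one of the download links in the table.

```python
import torch

# Hypothetical file name standing in for a converted checkpoint from the table below.
ckpt = torch.load("swin_tiny_224_imagenet.pth", map_location="cpu")

# Converted checkpoints commonly keep the weights under a "state_dict" key;
# fall back to the raw dictionary if that key is absent.
state_dict = ckpt.get("state_dict", ckpt)

# Print a few parameter names and shapes to verify the conversion looks sane.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```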
### ImageNet 1k

| Model  | Pretrain     | Resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Download |
|:------:|:------------:|:----------:|:---------:|:--------:|:---------:|:---------:|:--------:|
| Swin-T | ImageNet-1k  | 224x224    | 28.29     | 4.36     | 81.18     | 95.52     | model    |
| Swin-S | ImageNet-1k  | 224x224    | 49.61     | 8.52     | 83.21     | 96.25     | model    |
| Swin-B | ImageNet-1k  | 224x224    | 87.77     | 15.14    | 83.42     | 96.44     | model    |
| Swin-B | ImageNet-1k  | 384x384    | 87.90     | 44.49    | 84.49     | 96.95     | model    |
| Swin-B | ImageNet-22k | 224x224    | 87.77     | 15.14    | 85.16     | 97.50     | model    |
| Swin-B | ImageNet-22k | 384x384    | 87.90     | 44.49    | 86.44     | 98.05     | model    |
| Swin-L | ImageNet-22k | 224x224    | 196.53    | 34.04    | 86.24     | 97.88     | model    |
| Swin-L | ImageNet-22k | 384x384    | 196.74    | 100.04   | 87.25     | 98.25     | model    |
## Results and models

### ImageNet

| Model  | Pretrain    | Resolution | Params(M) | Flops(G) | Top-1 (%) | Top-5 (%) | Config | Download     |
|:------:|:-----------:|:----------:|:---------:|:--------:|:---------:|:---------:|:------:|:------------:|
| Swin-T | ImageNet-1k | 224x224    | 28.29     | 4.36     | 81.18     | 95.61     | config | model \| log |
| Swin-S | ImageNet-1k | 224x224    | 49.61     | 8.52     | 83.02     | 96.29     | config | model \| log |
| Swin-B | ImageNet-1k | 224x224    | 87.77     | 15.14    | 83.36     | 96.44     | config | model \| log |
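For reference, a minimal inference sketch using the trained models above, assuming this README lives in mmclassification and its Python API (`init_model` / `inference_model`) is available. The config, checkpoint, and image paths are hypothetical placeholders for the `config` / `model` links in the table.

```python
from mmcls.apis import inference_model, init_model

# Hypothetical placeholders for the "config" and "model" links in the table above.
config_file = "configs/swin_transformer/swin-tiny_16xb64_in1k.py"
checkpoint_file = "swin_tiny_224_imagenet.pth"

# Build the model and load the trained weights.
model = init_model(config_file, checkpoint_file, device="cpu")

# Classify a single image (path is a placeholder) and print the prediction.
result = inference_model(model, "demo/demo.JPEG")
print(result["pred_class"], result["pred_score"])
```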