SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Introduction

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

Usage

To use other repositories' pre-trained models, it is necessary to convert keys.

We provide a script, mit2mmseg.py, in the tools/model_converters directory to convert the keys of models from the official repo to MMSegmentation style.

python tools/model_converters/mit2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}

This script converts the model from PRETRAIN_PATH and stores the converted model in STORE_PATH.
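As a rough illustration of what "converting keys" means, the sketch below renames the keys of a checkpoint-like dictionary by prefix. The rule list, the dropped prefix, and the function name `convert_keys` are hypothetical placeholders chosen for this example; they are not the mapping used by mit2mmseg.py, so consult the script itself for the real rules.

```python
from collections import OrderedDict

# Hypothetical prefix rules, for illustration only. The real mapping lives in
# tools/model_converters/mit2mmseg.py and is not reproduced here.
EXAMPLE_RULES = [
    ("old_prefix.", "new_prefix."),
]


def convert_keys(state_dict, rules=EXAMPLE_RULES, drop_prefixes=("head.",)):
    """Return a new ordered dict with keys renamed by the first matching rule.

    Keys that start with any prefix in `drop_prefixes` are skipped entirely
    (assumed here to be weights the target model does not need).
    """
    converted = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(tuple(drop_prefixes)):
            continue
        new_key = key
        for old, new in rules:
            if new_key.startswith(old):
                new_key = new + new_key[len(old):]
                break
        converted[new_key] = value
    return converted
```

In a real conversion the values would be tensors loaded from PRETRAIN_PATH and the result would be saved to STORE_PATH; only the key renaming step is shown here.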

Results and models

ADE20k

Method	Backbone	Crop Size	Lr schd	Mem (GB)	Inf time (fps)	mIoU	mIoU(ms+flip)	config	download
Segformer	MIT-B0	512x512	160000	2.1	51.32	37.41	38.34	config	model | log
Segformer	MIT-B1	512x512	160000	2.6	47.66	40.97	42.54	config	model | log
Segformer	MIT-B2	512x512	160000	3.6	30.88	45.58	47.03	config	model | log
Segformer	MIT-B3	512x512	160000	4.8	22.11	47.82	48.81	config	model | log
Segformer	MIT-B4	512x512	160000	6.1	15.45	48.46	49.76	config	model | log
Segformer	MIT-B5	512x512	160000	7.2	11.89	49.13	50.22	config	model | log
Segformer	MIT-B5	640x640	160000	11.5	11.30	49.62	50.36	config	model | log

Evaluation with AlignedResize:

Method	Backbone	Crop Size	Lr schd	mIoU	mIoU(ms+flip)
Segformer	MIT-B0	512x512	160000	38.1	38.57
Segformer	MIT-B1	512x512	160000	41.64	42.76
Segformer	MIT-B2	512x512	160000	46.53	47.49
Segformer	MIT-B3	512x512	160000	48.46	49.14
Segformer	MIT-B4	512x512	160000	49.34	50.29
Segformer	MIT-B5	512x512	160000	50.08	50.72
Segformer	MIT-B5	640x640	160000	50.58	50.8
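The effect of the AlignedResize-style evaluation can be read off by subtracting the single-scale mIoU in the first ADE20k table from the corresponding value in the table above. The short sketch below does that arithmetic; the dictionary keys and the helper name `miou_gains` are made up for this example, and the numbers are copied from the two tables.

```python
# Single-scale mIoU copied from the first ADE20k table (default evaluation).
default_miou = {
    "MIT-B0 512x512": 37.41,
    "MIT-B1 512x512": 40.97,
    "MIT-B2 512x512": 45.58,
    "MIT-B3 512x512": 47.82,
    "MIT-B4 512x512": 48.46,
    "MIT-B5 512x512": 49.13,
    "MIT-B5 640x640": 49.62,
}

# Single-scale mIoU copied from the "Evaluation with AlignedResize" table.
aligned_miou = {
    "MIT-B0 512x512": 38.1,
    "MIT-B1 512x512": 41.64,
    "MIT-B2 512x512": 46.53,
    "MIT-B3 512x512": 48.46,
    "MIT-B4 512x512": 49.34,
    "MIT-B5 512x512": 50.08,
    "MIT-B5 640x640": 50.58,
}


def miou_gains(baseline, aligned):
    """Per-model mIoU difference (aligned minus baseline), rounded to 2 places."""
    return {name: round(aligned[name] - baseline[name], 2) for name in baseline}


gains = miou_gains(default_miou, aligned_miou)
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
```

With the values above, every single-scale gain falls between 0.5 and 1.0 mIoU.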

We replace AlignedResize from the original implementation with Resize + ResizeToMultiple. To evaluate with the AlignedResize behavior, change the dataset test pipeline as follows:

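# img_norm_cfg is assumed to be defined earlier in the config (for example, inherited from the dataset base config).
test_pipeline = [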
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            # resize image to multiple of 32, improve SegFormer by 0.5-1.0 mIoU.
            dict(type='ResizeToMultiple', size_divisor=32),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
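To make the size_divisor=32 step concrete, here is a minimal sketch of the size arithmetic, assuming each side is rounded up to the next multiple of the divisor. The function name `round_up_to_multiple` is hypothetical, and this illustrates the idea only; it is not the ResizeToMultiple implementation, which also resamples the image.

```python
import math


def round_up_to_multiple(height, width, size_divisor=32):
    """Return (height, width) with each side rounded up to a multiple of size_divisor.

    Assumption: sides are rounded up (ceil), never down, so no image content
    is lost to a smaller target size.
    """
    new_h = math.ceil(height / size_divisor) * size_divisor
    new_w = math.ceil(width / size_divisor) * size_divisor
    return new_h, new_w


# An image resized to 512x683 (H x W) would be brought to 512x704.
print(round_up_to_multiple(512, 683))
```

Sides that are already multiples of 32, such as 512 or 640, are left unchanged.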