mmsegmentation/configs/setr
README.md
setr.yml
setr_mla_512x512_160k_b8_ade20k.py
setr_mla_512x512_160k_b16_ade20k.py
setr_naive_512x512_160k_b16_ade20k.py
setr_pup_512x512_160k_b16_ade20k.py

README.md

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Introduction

Official Repo

Code Snippet

Abstract

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
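The sequence-to-sequence formulation is straightforward to sketch: the image is split into fixed-size patches, each patch is linearly embedded into a token, a transformer encoder models global context at full sequence length (no downsampling between layers), and the output tokens are reshaped back into a 2D feature map for per-pixel classification. Below is a minimal PyTorch sketch of this pipeline; the module names, the 16x16 patch size, the tiny encoder depth, and the bilinear "naive" decoder are illustrative assumptions, not the mmsegmentation implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniSETR(nn.Module):
    """Illustrative sketch of the SETR idea: pure transformer encoder,
    no convolutional resolution reduction, naive upsampling decoder."""

    def __init__(self, num_classes, img_size=512, patch_size=16,
                 embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (img_size // patch_size) ** 2
        # Linear patch embedding: each 16x16x3 patch becomes one token.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size,
                                     stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # "Naive" decoder: 1x1 conv to class logits, then bilinear upsampling.
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, C)
        tokens = self.encoder(tokens + self.pos_embed)  # global context per layer
        # Reshape the token sequence back into a 2D feature map.
        feat = tokens.transpose(1, 2).reshape(
            b, -1, h // self.patch_size, w // self.patch_size)
        logits = self.classifier(feat)
        return F.interpolate(logits, size=(h, w), mode='bilinear',
                             align_corners=False)

seg = MiniSETR(num_classes=150)         # ADE20K has 150 classes
out = seg(torch.randn(1, 3, 512, 512))  # -> (1, 150, 512, 512)
```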

The upsampling head comes in two versions, Naive and PUP (progressive upsampling); an MLA (multi-level feature aggregation) head is provided as well (see the sketch below).
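Conceptually, PUP avoids restoring full resolution in one step by alternating convolution with 2x bilinear upsampling until the 16x-downsampled token map reaches input resolution. A minimal PyTorch sketch of that idea follows; the class name, channel width, and stage count are illustrative assumptions, not the mmsegmentation implementation.

```python
import torch.nn as nn

class PUPHeadSketch(nn.Module):
    """Progressive UPsampling: alternate conv + 2x upsample so the
    16x-downsampled encoder feature map reaches full resolution in
    four stages instead of one large interpolation."""

    def __init__(self, in_channels, num_classes, channels=256, num_ups=4):
        super().__init__()
        stages = []
        c = in_channels
        for _ in range(num_ups):  # four 2x stages recover the 16x stride
            stages += [
                nn.Conv2d(c, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=2, mode='bilinear',
                            align_corners=False),
            ]
            c = channels
        self.stages = nn.Sequential(*stages)
        self.cls_seg = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):  # x: (B, C, H/16, W/16) encoder feature map
        return self.cls_seg(self.stages(x))
```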
SETR (CVPR'2021)
```bibtex
@article{zheng2020rethinking,
  title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers},
  author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip HS and others},
  journal={arXiv preprint arXiv:2012.15840},
  year={2020}
}
```

Results and models

ADE20K

| Method | Backbone | Crop Size | Batch Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ---------- | ----- | ------- | -- | ------ | ----- | ---- | ----- | ----- | ------ | ------------ |
| SETR-Naive | ViT-L | 512x512 | 16 | 160000 | 18.40 | 4.72 | 48.28 | 49.56 | config | model \| log |
| SETR-PUP | ViT-L | 512x512 | 16 | 160000 | 19.54 | 4.50 | 48.24 | 49.99 | config | model \| log |
| SETR-MLA | ViT-L | 512x512 | 8 | 160000 | 10.96 | - | 47.34 | 49.05 | config | model \| log |
| SETR-MLA | ViT-L | 512x512 | 16 | 160000 | 17.30 | 5.25 | 47.54 | 49.37 | config | model \| log |
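To try one of these models, the standard mmsegmentation 0.x inference API applies. A short example, assuming the repo root as working directory; the checkpoint filename is a placeholder for the file downloaded from the table's `model` link:

```python
from mmseg.apis import inference_segmentor, init_segmentor

config = 'configs/setr/setr_naive_512x512_160k_b16_ade20k.py'
checkpoint = 'setr_naive_ade20k.pth'  # placeholder: use the downloaded weights

# Build the model from the config and load the trained weights.
model = init_segmentor(config, checkpoint, device='cuda:0')

# Run single-image inference; the result is a per-pixel class map.
result = inference_segmentor(model, 'demo.png')
```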