mmsegmentation/configs/mask2former
Qingyun a092fea8c1
[Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532)
## Motivation

The DETR-related modules were refactored in
open-mmlab/mmdetection#8763, which broke MaskFormer and
Mask2Former in both MMDetection (fixed in
open-mmlab/mmdetection#9515) and MMSegmentation. This PR fixes the bugs in
MMSegmentation.

### TO-DO List

- [x] update configs
- [x] check and modify data flow
- [x] fix unit test
- [x] align inference
- [x] write a ckpt converter
- [x] write ckpt update script
- [x] update model zoo
- [x] update model link in readme
- [x] update [faq.md](https://github.com/open-mmlab/mmsegmentation/blob/dev-1.x/docs/en/notes/faq.md#installation)

## Tips for fixing other implementations based on MaskXFormer in mmseg

1. Build the Transformer modules directly. They are no longer built
through the registry, since that mechanism was refactored away.
2. Modify the config: delete `type` and change several keys, following
the modifications in this PR.
3. `batch_first` is now uniformly set to `True` in the new implementations.
Hence the tensors in the data flow need to be transposed and the
`batch_first` setting in the config needs to be updated.
4. Checkpoints trained with the old implementation must be converted
before they can be used with the new one.
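
To illustrate tip 3, here is a minimal sketch of what the `batch_first` change means for tensor layout. It uses plain `torch.nn.MultiheadAttention` rather than the attention wrappers used by mmseg or mmdet, and the sizes are arbitrary assumptions chosen for the demo. The same weights give the same result once the input and output are transposed between the two layouts.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Arbitrary sizes for illustration only.
num_queries, batch_size, embed_dims = 5, 2, 16

# Old layout: (num_queries, batch_size, embed_dims).
attn_seq_first = nn.MultiheadAttention(embed_dims, num_heads=4, batch_first=False)
# New layout: (batch_size, num_queries, embed_dims).
attn_batch_first = nn.MultiheadAttention(embed_dims, num_heads=4, batch_first=True)

# Share weights so only the layout differs between the two modules.
attn_batch_first.load_state_dict(attn_seq_first.state_dict())
attn_seq_first.eval()
attn_batch_first.eval()

x_seq_first = torch.randn(num_queries, batch_size, embed_dims)
# Swap the first two dimensions to move to the batch-first layout.
x_batch_first = x_seq_first.transpose(0, 1)

with torch.no_grad():
    out_seq_first, _ = attn_seq_first(x_seq_first, x_seq_first, x_seq_first)
    out_batch_first, _ = attn_batch_first(x_batch_first, x_batch_first, x_batch_first)

# The outputs agree after transposing back.
outputs_match = torch.allclose(
    out_seq_first.transpose(0, 1), out_batch_first, atol=1e-5)
print(outputs_match)
```

Code that still assumes the sequence-first layout needs an equivalent `transpose(0, 1)` (or a matching `permute`) wherever it feeds tensors into, or reads tensors out of, the refactored modules.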

### Convert script

```Python
import argparse
from copy import deepcopy
from collections import OrderedDict

import torch

from mmengine.config import Config
from mmseg.models import build_segmentor
from mmseg.utils import register_all_modules
register_all_modules(init_default_scope=True)


def parse_args():
    parser = argparse.ArgumentParser(
        description='MMSeg convert MaskXFormer model, by Li-Qingyun')
    parser.add_argument('Mask_what_former', type=int,
                        help='Mask what former, can be a `1` or `2`',
                        choices=[1, 2])
    parser.add_argument('CFG_FILE', help='config file path')
    parser.add_argument('OLD_CKPT_FILEPATH', help='old ckpt file path')
    parser.add_argument('NEW_CKPT_FILEPATH', help='new ckpt file path')
    args = parser.parse_args()
    return args


args = parse_args()

def get_new_name(old_name: str):
    new_name = old_name

    if 'encoder.layers' in new_name:
        new_name = new_name.replace('attentions.0', 'self_attn')

    new_name = new_name.replace('ffns.0', 'ffn')

    if 'decoder.layers' in new_name:

        if args.Mask_what_former == 2:
            # for Mask2Former
            new_name = new_name.replace('attentions.0', 'cross_attn')
            new_name = new_name.replace('attentions.1', 'self_attn')
        else:
            # for MaskFormer
            new_name = new_name.replace('attentions.0', 'self_attn')
            new_name = new_name.replace('attentions.1', 'cross_attn')

    return new_name
    
def cvt_sd(old_sd: OrderedDict):
    new_sd = OrderedDict()
    for name, param in old_sd.items():
        new_name = get_new_name(name)
        assert new_name not in new_sd
        new_sd[new_name] = param
    assert len(new_sd) == len(old_sd)
    return new_sd
    
if __name__ == '__main__':
    cfg = Config.fromfile(args.CFG_FILE)
    model_cfg = cfg.model

    segmentor = build_segmentor(model_cfg)

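    # Load on CPU so the conversion does not require a GPU.
    old_ckpt = torch.load(args.OLD_CKPT_FILEPATH, map_location='cpu')
    old_sd = old_ckpt['state_dict']

    new_sd = cvt_sd(old_sd)
    # Strict loading raises if any key is missing or unexpected.
    print(segmentor.load_state_dict(new_sd))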

    new_ckpt = deepcopy(old_ckpt)
    new_ckpt['state_dict'] = new_sd
    torch.save(new_ckpt, args.NEW_CKPT_FILEPATH)
    print(f'{args.NEW_CKPT_FILEPATH} has been saved!')
```

Usage:
```bash
# for example
python ckpt4pr2532.py 1 configs/maskformer/maskformer_r50-d32_8xb2-160k_ade20k-512x512.py original_ckpts/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724-cbd39cc1.pth cvt_outputs/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724.pth
python ckpt4pr2532.py 2 configs/mask2former/mask2former_r50_8xb2-160k_ade20k-512x512.py original_ckpts/mask2former_r50_8xb2-160k_ade20k-512x512_20221204_000055-4c62652d.pth cvt_outputs/mask2former_r50_8xb2-160k_ade20k-512x512_20221204_000055.pth
```
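
The renaming rules in the convert script can be tried in isolation. The sketch below restates them as a pure function that takes the model variant as an argument instead of reading the global `args`. The function name `rename_key` and the sample key names are hypothetical and chosen only to exercise each rule; real checkpoints may use different prefixes.

```python
def rename_key(old_name: str, variant: int) -> str:
    """Map an old state-dict key to its new name.

    `variant` is 1 for MaskFormer and 2 for Mask2Former, mirroring the
    `Mask_what_former` argument of the convert script.
    """
    new_name = old_name

    # Encoder layers only have self-attention.
    if 'encoder.layers' in new_name:
        new_name = new_name.replace('attentions.0', 'self_attn')

    # Each layer has a single FFN, so the index is dropped.
    new_name = new_name.replace('ffns.0', 'ffn')

    if 'decoder.layers' in new_name:
        if variant == 2:
            # Mask2Former: cross-attention comes first.
            new_name = new_name.replace('attentions.0', 'cross_attn')
            new_name = new_name.replace('attentions.1', 'self_attn')
        else:
            # MaskFormer: self-attention comes first.
            new_name = new_name.replace('attentions.0', 'self_attn')
            new_name = new_name.replace('attentions.1', 'cross_attn')

    return new_name


# Hypothetical keys, for illustration only.
decoder_key = 'head.transformer_decoder.layers.0.attentions.0.attn.in_proj_weight'
encoder_key = 'head.pixel_decoder.encoder.layers.0.attentions.0.output_proj.weight'
ffn_key = 'head.pixel_decoder.encoder.layers.0.ffns.0.layers.1.weight'

print(rename_key(decoder_key, variant=2))
print(rename_key(decoder_key, variant=1))
print(rename_key(encoder_key, variant=2))
print(rename_key(ffn_key, variant=2))
```

The same decoder key maps to `cross_attn` for Mask2Former and to `self_attn` for MaskFormer, which is why the script needs to know which model it is converting.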

---------

Co-authored-by: MeowZheng <meowzheng@outlook.com>
2023-02-01 18:58:21 +08:00
README.md [Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532) 2023-02-01 18:58:21 +08:00
mask2former.yml [Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532) 2023-02-01 18:58:21 +08:00
mask2former_r50_8xb2-90k_cityscapes-512x1024.py [Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532) 2023-02-01 18:58:21 +08:00
mask2former_r50_8xb2-160k_ade20k-512x512.py [Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532) 2023-02-01 18:58:21 +08:00
mask2former_r101_8xb2-90k_cityscapes-512x1024.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_r101_8xb2-160k_ade20k-512x512.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-b-in1k-384x384-pre_8xb2-160k_ade20k-640x640.py [Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532) 2023-02-01 18:58:21 +08:00
mask2former_swin-b-in22k-384x384-pre_8xb2-90k_cityscapes-512x1024.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-b-in22k-384x384-pre_8xb2-160k_ade20k-640x640.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-l-in22k-384x384-pre_8xb2-90k_cityscapes-512x1024.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-l-in22k-384x384-pre_8xb2-160k_ade20k-640x640.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-s_8xb2-90k_cityscapes-512x1024.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-s_8xb2-160k_ade20k-512x512.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-t_8xb2-90k_cityscapes-512x1024.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00
mask2former_swin-t_8xb2-160k_ade20k-512x512.py [Feature] Support Mask2former in MMSeg 1.x (#2255) 2022-12-05 18:34:24 +08:00

README.md

Mask2Former

Masked-attention Mask Transformer for Universal Image Segmentation

Introduction

Official Repo

Code Snippet

Abstract

Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).

@inproceedings{cheng2021mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  booktitle={CVPR},
  year={2022}
}
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}

Usage

  • MMDetection must be installed before using Mask2Former models:
pip install "mmdet>=3.0.0rc4"
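
After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata. This is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the snippet only reports the version string without comparing it against the `3.0.0rc4` minimum.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    version = installed_version('mmdet')
    if version is None:
        print('mmdet is not installed')
    else:
        print(f'mmdet {version} is installed')
```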

Results and models

Cityscapes

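| Method | Backbone | Crop Size | Lr schd | Mem (MB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ----------- | -------------- | --------- | ------- | -------- | -------------- | ----- | ------------- | ------ | ------------ |
| Mask2Former | R-50-D32 | 512x1024 | 90000 | 5806 | 9.17 | 80.44 | - | config | model \| log |
| Mask2Former | R-101-D32 | 512x1024 | 90000 | 6971 | 7.11 | 80.80 | - | config | model \| log |
| Mask2Former | Swin-T | 512x1024 | 90000 | 6511 | 7.18 | 81.71 | - | config | model \| log |
| Mask2Former | Swin-S | 512x1024 | 90000 | 8282 | 5.57 | 82.57 | - | config | model \| log |
| Mask2Former | Swin-B (in22k) | 512x1024 | 90000 | 11152 | 4.32 | 83.52 | - | config | model \| log |
| Mask2Former | Swin-L (in22k) | 512x1024 | 90000 | 16207 | 2.86 | 83.65 | - | config | model \| log |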

ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (MB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| ----------- | -------------- | --------- | ------- | -------- | -------------- | ----- | ------------- | ------ | ------------ |
| Mask2Former | R-50-D32 | 512x512 | 160000 | 3385 | 26.59 | 47.87 | - | config | model \| log |
| Mask2Former | R-101-D32 | 512x512 | 160000 | 4190 | 22.97 | 48.60 | - | config | model \| log |
| Mask2Former | Swin-T | 512x512 | 160000 | 3826 | 23.82 | 48.66 | - | config | model \| log |
| Mask2Former | Swin-S | 512x512 | 160000 | 5034 | 19.69 | 51.24 | - | config | model \| log |
| Mask2Former | Swin-B | 640x640 | 160000 | 5795 | 12.48 | 52.44 | - | config | model \| log |
| Mask2Former | Swin-B (in22k) | 640x640 | 160000 | 5795 | 12.43 | 53.90 | - | config | model \| log |
| Mask2Former | Swin-L (in22k) | 640x640 | 160000 | 9077 | 8.81 | 56.01 | - | config | model \| log |

Note:

  • All Mask2Former experiments were run on 8 A100 GPUs with 2 samples per GPU.
  • As mentioned in the official repo, Mask2Former results are relatively unstable. The Mask2Former (swin-s) result on ADE20K in the table is the median of 5 training runs, following the author's suggestion.
  • The ResNet backbones used in MaskFormer models are the standard ResNet rather than ResNetV1c.
  • Test-time augmentation is not yet supported in MMSegmentation 1.x; "ms+flip" results will be added as soon as possible.