Qingyun a092fea8c1
[Fix] Fix MaskFormer and Mask2Former of MMSegmentation (#2532)
## Motivation

The DETR-related modules were refactored in
open-mmlab/mmdetection#8763, which broke MaskFormer and
Mask2Former in both MMDetection (fixed in
open-mmlab/mmdetection#9515) and MMSegmentation. This PR fixes the
bugs in MMSegmentation.

### TO-DO List

- [x] update configs
- [x] check and modify data flow
- [x] fix unit test
- [x] align inference
- [x] write a ckpt converter
- [x] write ckpt update script
- [x] update model zoo
- [x] update model link in readme
- [x] update
[faq.md](https://github.com/open-mmlab/mmsegmentation/blob/dev-1.x/docs/en/notes/faq.md#installation)

## Tips for fixing other implementations based on MaskXFormer of mmseg

1. The Transformer modules should now be built directly; the original
registry-based building (selecting modules by `type` strings) has been
refactored away.
2. The config needs to be modified accordingly: delete the `type` keys and
rename several others, following the changes in this PR (see the config
sketch after this list).
3. `batch_first` is now uniformly set to `True` in the new implementations,
so the data flow has to be transposed and the `batch_first` entries in the
config updated.
4. Checkpoints trained with the old implementation have to be converted
before they can be used with the new one.

### Convert script

```python
import argparse
from copy import deepcopy
from collections import OrderedDict

import torch

from mmengine.config import Config
from mmseg.models import build_segmentor
from mmseg.utils import register_all_modules
register_all_modules(init_default_scope=True)


def parse_args():
    parser = argparse.ArgumentParser(
        description='MMSeg convert MaskXFormer model, by Li-Qingyun')
    parser.add_argument('Mask_what_former', type=int,
                        help='Mask what former, can be a `1` or `2`',
                        choices=[1, 2])
    parser.add_argument('CFG_FILE', help='config file path')
    parser.add_argument('OLD_CKPT_FILEPATH', help='old ckpt file path')
    parser.add_argument('NEW_CKPT_FILEPATH', help='new ckpt file path')
    args = parser.parse_args()
    return args


def get_new_name(old_name: str):
    new_name = old_name

    if 'encoder.layers' in new_name:
        new_name = new_name.replace('attentions.0', 'self_attn')

    new_name = new_name.replace('ffns.0', 'ffn')

    if 'decoder.layers' in new_name:
        if args.Mask_what_former == 2:
            # for Mask2Former
            new_name = new_name.replace('attentions.0', 'cross_attn')
            new_name = new_name.replace('attentions.1', 'self_attn')
        else:
            # for MaskFormer
            new_name = new_name.replace('attentions.0', 'self_attn')
            new_name = new_name.replace('attentions.1', 'cross_attn')

    return new_name


def cvt_sd(old_sd: OrderedDict):
    new_sd = OrderedDict()
    for name, param in old_sd.items():
        new_name = get_new_name(name)
        assert new_name not in new_sd
        new_sd[new_name] = param
    assert len(new_sd) == len(old_sd)
    return new_sd


if __name__ == '__main__':
    args = parse_args()

    cfg = Config.fromfile(args.CFG_FILE)
    model_cfg = cfg.model

    segmentor = build_segmentor(model_cfg)

    # Reference state dict of the refactored model; handy for comparing
    # key names when debugging a failed conversion.
    refer_sd = segmentor.state_dict()
    old_ckpt = torch.load(args.OLD_CKPT_FILEPATH, map_location='cpu')
    old_sd = old_ckpt['state_dict']

    new_sd = cvt_sd(old_sd)
    print(segmentor.load_state_dict(new_sd))

    new_ckpt = deepcopy(old_ckpt)
    new_ckpt['state_dict'] = new_sd
    torch.save(new_ckpt, args.NEW_CKPT_FILEPATH)
    print(f'{args.NEW_CKPT_FILEPATH} has been saved!')
```
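The renaming performed by `get_new_name` is plain substring substitution on
state-dict keys. A couple of hypothetical, abridged examples for the
Mask2Former case (real keys carry the full module prefix of the segmentor):

```python
# Hypothetical, abridged keys for illustration only.
old_key = 'decoder.layers.0.attentions.0.attn.in_proj_weight'
# With `Mask_what_former == 2`, `attentions.0` is the cross-attention:
assert old_key.replace('attentions.0', 'cross_attn') == \
    'decoder.layers.0.cross_attn.attn.in_proj_weight'

old_key = 'encoder.layers.0.ffns.0.layers.0.0.weight'
assert old_key.replace('ffns.0', 'ffn') == \
    'encoder.layers.0.ffn.layers.0.0.weight'
```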

Usage:
```bash
# for example
python ckpt4pr2532.py 1 configs/maskformer/maskformer_r50-d32_8xb2-160k_ade20k-512x512.py original_ckpts/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724-cbd39cc1.pth cvt_outputs/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724.pth
python ckpt4pr2532.py 2 configs/mask2former/mask2former_r50_8xb2-160k_ade20k-512x512.py original_ckpts/mask2former_r50_8xb2-160k_ade20k-512x512_20221204_000055-4c62652d.pth cvt_outputs/mask2former_r50_8xb2-160k_ade20k-512x512_20221204_000055.pth
```
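After converting, a quick sanity check that the new checkpoint loads cleanly
may be worthwhile (a minimal sketch, reusing the config and output path from
the MaskFormer example above):

```python
# Minimal sanity check (illustrative): the converted checkpoint should
# load into the refactored model without missing or unexpected keys.
import torch
from mmengine.config import Config
from mmseg.models import build_segmentor
from mmseg.utils import register_all_modules

register_all_modules(init_default_scope=True)

cfg = Config.fromfile(
    'configs/maskformer/maskformer_r50-d32_8xb2-160k_ade20k-512x512.py')
segmentor = build_segmentor(cfg.model)
ckpt = torch.load(
    'cvt_outputs/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724.pth',
    map_location='cpu')
# `strict=True` (the default) raises if any key is missing or unexpected.
segmentor.load_state_dict(ckpt['state_dict'])
print('checkpoint loaded cleanly')
```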

---------

Co-authored-by: MeowZheng <meowzheng@outlook.com>
2023-02-01 18:58:21 +08:00

The updated MaskFormer metafile (YAML):

```yaml
Collections:
- Name: MaskFormer
  Metadata:
    Training Data:
    - Usage
    - ADE20K
  Paper:
    URL: https://arxiv.org/abs/2107.06278
    Title: 'MaskFormer: Per-Pixel Classification is Not All You Need for Semantic
      Segmentation'
  README: configs/maskformer/README.md
  Code:
    URL: https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/models/dense_heads/maskformer_head.py#L21
    Version: dev-3.x
  Converted From:
    Code: https://github.com/facebookresearch/MaskFormer/
Models:
- Name: maskformer_r50-d32_8xb2-160k_ade20k-512x512
  In Collection: MaskFormer
  Metadata:
    backbone: R-50-D32
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 23.7
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 3.29
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 44.29
  Config: configs/maskformer/maskformer_r50-d32_8xb2-160k_ade20k-512x512.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/maskformer/maskformer_r50-d32_8xb2-160k_ade20k-512x512/maskformer_r50-d32_8xb2-160k_ade20k-512x512_20221030_182724-3a9cfe45.pth
- Name: maskformer_r101-d32_8xb2-160k_ade20k-512x512
  In Collection: MaskFormer
  Metadata:
    backbone: R-101-D32
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 28.65
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 4.12
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 45.11
  Config: configs/maskformer/maskformer_r101-d32_8xb2-160k_ade20k-512x512.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/maskformer/maskformer_r101-d32_8xb2-160k_ade20k-512x512/maskformer_r101-d32_8xb2-160k_ade20k-512x512_20221031_223053-84adbfcb.pth
- Name: maskformer_swin-t_upernet_8xb2-160k_ade20k-512x512
  In Collection: MaskFormer
  Metadata:
    backbone: Swin-T
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 24.67
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 3.73
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 46.69
  Config: configs/maskformer/maskformer_swin-t_upernet_8xb2-160k_ade20k-512x512.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/maskformer/maskformer_swin-t_upernet_8xb2-160k_ade20k-512x512/maskformer_swin-t_upernet_8xb2-160k_ade20k-512x512_20221114_232813-f14e7ce0.pth
- Name: maskformer_swin-s_upernet_8xb2-160k_ade20k-512x512
  In Collection: MaskFormer
  Metadata:
    backbone: Swin-S
    crop size: (512,512)
    lr schd: 160000
    inference time (ms/im):
    - value: 37.06
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,512)
    Training Memory (GB): 5.33
  Results:
  - Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 49.36
  Config: configs/maskformer/maskformer_swin-s_upernet_8xb2-160k_ade20k-512x512.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/maskformer/maskformer_swin-s_upernet_8xb2-160k_ade20k-512x512/maskformer_swin-s_upernet_8xb2-160k_ade20k-512x512_20221115_114710-723512c7.pth
```