mmsegmentation/mmseg/models/assigners/hungarian_assigner.py
angiecao 608e319eb6
[Feature] Support Side Adapter Network (#3232)
## Motivation
Support SAN for Open-Vocabulary Semantic Segmentation
Paper: [Side Adapter Network for Open-Vocabulary Semantic
Segmentation](https://arxiv.org/abs/2302.12242)
Official code: [SAN](https://github.com/MendelXu/SAN)

## Modification
- Added parameters to the ViT backbone so that it can implement the image
encoder of CLIP.
- Added text encoder code.
- Added multimodal encoder-decoder segmentor code for open-vocabulary
semantic segmentation.
- Added SideAdapterNetwork decode head code.
- Added config files for train and inference.
- Added tools for converting pretrained models.
- Added loss implementation for mask classification models, such as SAN
and MaskFormer, and removed the dependency on mmdetection.
- Added unit tests for the text encoder, multimodal encoder-decoder, SAN
decode head and hungarian_assigner.

## Use cases
### Convert Models
**pretrained SAN model**
The official pretrained model can be downloaded from
[san_clip_vit_b_16.pth](https://huggingface.co/Mendel192/san/blob/main/san_vit_b_16.pth)
and
[san_clip_vit_large_14.pth](https://huggingface.co/Mendel192/san/blob/main/san_vit_large_14.pth).
Use tools/model_converters/san2mmseg.py to convert the official model into
mmseg style.
`python tools/model_converters/san2mmseg.py <MODEL_PATH> <OUTPUT_PATH>`

**pretrained CLIP model**
Use the CLIP model provided by OpenAI to train SAN. The CLIP model can
be downloaded from
[ViT-B-16.pt](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt)
and
[ViT-L-14-336px.pt](https://openaipublic.azureedge.net/clip/models/3035c92b350959924f9f00213499208652fc7ea050643e8b385c2dac08641f02/ViT-L-14-336px.pt).
Use tools/model_converters/clip2mmseg.py to convert the model into mmseg
style.
`python tools/model_converters/clip2mmseg.py <MODEL_PATH> <OUTPUT_PATH>`

### Inference
Test the san_vit-base-16 model on the coco-stuff164k dataset:
`python tools/test.py
./configs/san/san-vit-b16_coco-stuff164k-640x640.py
<TRAINED_MODEL_PATH>`

### Train
Train the san_vit-base-16 model on the coco-stuff164k dataset:
`python tools/train.py
./configs/san/san-vit-b16_coco-stuff164k-640x640.py --cfg-options
model.pretrained=<PRETRAINED_MODEL_PATH>`

## Comparison Results
### Train on COCO-Stuff164k
| Model           | Implementation | mIoU  | mAcc  | pAcc  |
| --------------- | ----- | ----- | ----- | ----- |
| san-vit-base16  | official  | 41.93 | 56.73 | 67.69 |
|                 | mmseg | 41.93 | 56.84 | 67.84 |
| san-vit-large14 | official  | 45.57 | 59.52 | 69.76 |
|                 | mmseg | 45.78 | 59.61 | 69.21 |

### Evaluate on Pascal Context
| Model           | Implementation | mIoU  | mAcc  | pAcc  |
| --------------- | ----- | ----- | ----- | ----- |
| san-vit-base16  | official  | 54.05 | 72.96 | 77.77 |
|                 | mmseg | 54.04 | 73.74 | 77.71 |
| san-vit-large14 | official  | 57.53 | 77.56 | 78.89 |
|                 | mmseg | 56.89 | 76.96 | 78.74 |

### Evaluate on Voc12Aug
| Model           | Implementation | mIoU  | mAcc  | pAcc  |
| --------------- | ----- | ----- | ----- | ----- |
| san-vit-base16  | official  | 93.86 | 96.61 | 97.11 |
|                 | mmseg | 94.58 | 97.01 | 97.38 |
| san-vit-large14 | official  | 95.17 | 97.61 | 97.63 |
|                 | mmseg | 95.58 | 97.75 | 97.79 |

---------

Co-authored-by: CastleDream <35064479+CastleDream@users.noreply.github.com>
Co-authored-by: yeedrag <46050186+yeedrag@users.noreply.github.com>
Co-authored-by: Yang-ChangHui <71805205+Yang-Changhui@users.noreply.github.com>
Co-authored-by: Xu CAO <49406546+SheffieldCao@users.noreply.github.com>
Co-authored-by: xiexinch <xiexinch@outlook.com>
Co-authored-by: 小飞猪 <106524776+ooooo-create@users.noreply.github.com>
2023-09-20 21:20:26 +08:00


# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Union

import torch
from mmengine import ConfigDict
from mmengine.structures import InstanceData
from torch.cuda.amp import autocast

from mmseg.registry import TASK_UTILS
from .base_assigner import BaseAssigner

try:
    from scipy.optimize import linear_sum_assignment
except ImportError:
    # Checked in `assign` so that a clear install hint can be raised.
    linear_sum_assignment = None


@TASK_UTILS.register_module()
class HungarianAssigner(BaseAssigner):
    """Computes one-to-one matching between prediction masks and ground truth.

    This class uses bipartite matching-based assignment to compute an
    assignment between the prediction masks and the ground truth. The
    assignment result is based on the weighted sum of match costs. The
    Hungarian algorithm is used to calculate the best matching with the
    minimum cost. The prediction masks that are not matched are classified
    as background.

    Args:
        match_costs (ConfigDict|List[ConfigDict]): Match cost configs.
    """

    def __init__(
        self, match_costs: Union[List[Union[dict, ConfigDict]], dict,
                                 ConfigDict]
    ) -> None:

        # Accept a single config or a non-empty list of configs.
        if isinstance(match_costs, dict):
            match_costs = [match_costs]
        elif isinstance(match_costs, list):
            assert len(match_costs) > 0, \
                'match_costs must not be an empty list.'

        self.match_costs = [
            TASK_UTILS.build(match_cost) for match_cost in match_costs
        ]

    def assign(self, pred_instances: InstanceData, gt_instances: InstanceData,
               **kwargs):
        """Computes one-to-one matching based on the weighted costs.

        This method assigns each query prediction to a ground truth or
        background. The assignment first calculates the cost for each
        category assigned to each query mask, and then uses the
        Hungarian algorithm to calculate the minimum cost as the best
        match.

        Args:
            pred_instances (InstanceData): Instances of model
                predictions. It includes "masks", with shape
                (n, h, w) or (n, l), and "cls", with shape
                (n, num_classes + 1).
            gt_instances (InstanceData): Ground truth of instance
                annotations. It includes "labels", with shape (k, ),
                and "masks", with shape (k, h, w) or (k, l).

        Returns:
            matched_query_inds (Tensor): The indices of matched queries.
            matched_label_inds (Tensor): The indices of matched labels.
        """
        # compute weighted cost in full precision (autocast disabled)
        cost_list = []
        with autocast(enabled=False):
            for match_cost in self.match_costs:
                cost = match_cost(
                    pred_instances=pred_instances, gt_instances=gt_instances)
                cost_list.append(cost)
            cost = torch.stack(cost_list).sum(dim=0)

        device = cost.device
        # do Hungarian matching on CPU using linear_sum_assignment
        cost = cost.detach().cpu()
        if linear_sum_assignment is None:
            raise ImportError('Please run "pip install scipy" '
                              'to install scipy first.')

        matched_query_inds, matched_label_inds = linear_sum_assignment(cost)
        matched_query_inds = torch.from_numpy(matched_query_inds).to(device)
        matched_label_inds = torch.from_numpy(matched_label_inds).to(device)

        return matched_query_inds, matched_label_inds
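
The matching that `assign` computes can be illustrated on a toy problem. The sketch below is not part of mmseg and does not implement the Hungarian algorithm: it is a brute-force search over all one-to-one assignments, which finds the same minimum-cost matching on small inputs without needing scipy or torch. The function name `brute_force_match` and the two cost matrices are hypothetical, chosen only for illustration. It assumes there are at least as many queries as ground truths.

```python
from itertools import permutations


def brute_force_match(cost):
    """Return (query_inds, label_inds) minimising the total matching cost.

    `cost[i][j]` is the cost of assigning query i to ground truth j.
    Assumes at least as many queries (rows) as ground truths (columns).
    """
    num_queries, num_gts = len(cost), len(cost[0])
    best_total, best_rows = float('inf'), None
    # Try every ordered choice of `num_gts` distinct queries;
    # rows[j] is the query matched to ground truth j.
    for rows in permutations(range(num_queries), num_gts):
        total = sum(cost[i][j] for j, i in enumerate(rows))
        if total < best_total:
            best_total, best_rows = total, rows
    # Sort the pairs by query index so the output is ordered by query.
    pairs = sorted(zip(best_rows, range(num_gts)))
    return [i for i, _ in pairs], [j for _, j in pairs]


# Two hypothetical cost terms for 4 queries and 2 ground truths.
cls_cost = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.6], [0.5, 0.9]]
mask_cost = [[0.8, 0.1], [0.2, 0.9], [0.6, 0.7], [0.4, 0.8]]

# Element-wise sum of the cost terms, analogous to the stacked sum in `assign`.
cost = [[a + b for a, b in zip(r1, r2)]
        for r1, r2 in zip(cls_cost, mask_cost)]

query_inds, label_inds = brute_force_match(cost)
# Queries left unmatched would be treated as background.
unmatched = [i for i in range(len(cost)) if i not in query_inds]

print(query_inds, label_inds, unmatched)  # [0, 1] [1, 0] [2, 3]
```

Here query 0 is matched to ground truth 1 and query 1 to ground truth 0, while queries 2 and 3 stay unmatched. Brute force grows factorially with the number of queries, which is why the real assigner delegates to `linear_sum_assignment` instead.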