# Copyright (c) OpenMMLab. All rights reserved.
# Added in "[Project] Support CAT-Seg from CVPR2023 (#3098)", 2023-08-09,
# co-authored by xiexinch <xiexinch@outlook.com>.
# Source: https://github.com/openai/CLIP.
# The 80 templates below are kept verbatim from CLIP, including the
# ungrammatical 'a origami {}.' and 'a embroidered {}.'.
IMAGENET_TEMPLATES = [
'a bad photo of a {}.',
'a photo of many {}.',
'a sculpture of a {}.',
'a photo of the hard to see {}.',
'a low resolution photo of the {}.',
'a rendering of a {}.',
'graffiti of a {}.',
'a bad photo of the {}.',
'a cropped photo of the {}.',
'a tattoo of a {}.',
'the embroidered {}.',
'a photo of a hard to see {}.',
'a bright photo of a {}.',
'a photo of a clean {}.',
'a photo of a dirty {}.',
'a dark photo of the {}.',
'a drawing of a {}.',
'a photo of my {}.',
'the plastic {}.',
'a photo of the cool {}.',
'a close-up photo of a {}.',
'a black and white photo of the {}.',
'a painting of the {}.',
'a painting of a {}.',
'a pixelated photo of the {}.',
'a sculpture of the {}.',
'a bright photo of the {}.',
'a cropped photo of a {}.',
'a plastic {}.',
'a photo of the dirty {}.',
'a jpeg corrupted photo of a {}.',
'a blurry photo of the {}.',
'a photo of the {}.',
'a good photo of the {}.',
'a rendering of the {}.',
'a {} in a video game.',
'a photo of one {}.',
'a doodle of a {}.',
'a close-up photo of the {}.',
'a photo of a {}.',
'the origami {}.',
'the {} in a video game.',
'a sketch of a {}.',
'a doodle of the {}.',
'a origami {}.',
'a low resolution photo of a {}.',
'the toy {}.',
'a rendition of the {}.',
'a photo of the clean {}.',
'a photo of a large {}.',
'a rendition of a {}.',
'a photo of a nice {}.',
'a photo of a weird {}.',
'a blurry photo of a {}.',
'a cartoon {}.',
'art of a {}.',
'a sketch of the {}.',
'a embroidered {}.',
'a pixelated photo of a {}.',
'itap of the {}.',
'a jpeg corrupted photo of the {}.',
'a good photo of a {}.',
'a plushie {}.',
'a photo of the nice {}.',
'a photo of the small {}.',
'a photo of the weird {}.',
'the cartoon {}.',
'art of the {}.',
'a drawing of the {}.',
'a photo of the large {}.',
'a black and white photo of a {}.',
'the plushie {}.',
'a dark photo of a {}.',
'itap of a {}.',
'graffiti of the {}.',
'a toy {}.',
'itap of my {}.',
'a photo of a cool {}.',
'a photo of a small {}.',
'a tattoo of the {}.',
# 'A photo of a {} in the scene.',
]
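

# Usage sketch (an assumption, not code from the CAT-Seg PR): every template
# above has a single positional ``{}`` slot, so a class name can be filled in
# with ``str.format``. ``expand_templates`` is a hypothetical helper name.
def expand_templates(classname, templates):
    """Return one prompt per template with ``classname`` substituted."""
    return [template.format(classname) for template in templates]


# e.g. expand_templates('dog', IMAGENET_TEMPLATES) yields 80 prompts,
# the first being 'a bad photo of a dog.'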
# v1: 59.0875
IMAGENET_TEMPLATES_SELECT = [
'itap of a {}.',
'a bad photo of the {}.',
'a origami {}.',
'a photo of the large {}.',
'a {} in a video game.',
'art of the {}.',
'a photo of the small {}.',
'A photo of a {} in the scene',
]
# v9
IMAGENET_TEMPLATES_SELECT_CLIP = [
'a bad photo of the {}.',
'a photo of the large {}.',
'a photo of the small {}.',
'a cropped photo of a {}.',
'This is a photo of a {}',
'This is a photo of a small {}',
'This is a photo of a medium {}',
'This is a photo of a large {}',
'This is a masked photo of a {}',
'This is a masked photo of a small {}',
'This is a masked photo of a medium {}',
'This is a masked photo of a large {}',
'This is a cropped photo of a {}',
'This is a cropped photo of a small {}',
'This is a cropped photo of a medium {}',
'This is a cropped photo of a large {}',
'A photo of a {} in the scene',
'a bad photo of the {} in the scene',
'a photo of the large {} in the scene',
'a photo of the small {} in the scene',
'a cropped photo of a {} in the scene',
'a photo of a masked {} in the scene',
'There is a {} in the scene',
'There is the {} in the scene',
'This is a {} in the scene',
'This is the {} in the scene',
'This is one {} in the scene',
'There is a masked {} in the scene',
'There is the masked {} in the scene',
'This is a masked {} in the scene',
'This is the masked {} in the scene',
'This is one masked {} in the scene',
]
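

# Sketch of CLIP-style prompt ensembling, assuming these lists are consumed
# the way CLIP's zero-shot examples consume their templates; the CAT-Seg text
# encoder may aggregate prompts differently. Each prompt for a class is
# encoded and L2-normalized, the embeddings are averaged, and the mean is
# re-normalized. ``ensemble_class_embedding`` and the ``encode`` callable
# (prompt -> list of floats) are hypothetical stand-ins for a real CLIP text
# encoder.
def _l2_normalize(vec):
    """Scale ``vec`` to unit L2 norm (assumes a non-zero vector)."""
    norm = sum(x * x for x in vec)**0.5
    return [x / norm for x in vec]


def ensemble_class_embedding(classname, templates, encode):
    """Average the unit-norm embeddings of every prompt for ``classname``."""
    embeddings = [
        _l2_normalize(encode(template.format(classname)))
        for template in templates
    ]
    count = len(embeddings)
    # zip(*embeddings) groups the i-th component of every embedding.
    mean = [sum(column) / count for column in zip(*embeddings)]
    return _l2_normalize(mean)


# e.g. ensemble_class_embedding('cat', IMAGENET_TEMPLATES_SELECT_CLIP,
#      encode=my_text_encoder), where ``my_text_encoder`` is a placeholder
#      for whatever text encoder is in use.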
# v10, for comparison
# IMAGENET_TEMPLATES_SELECT_CLIP = [
# 'a photo of a {}.',
#
# 'This is a photo of a {}',
# 'This is a photo of a small {}',
# 'This is a photo of a medium {}',
# 'This is a photo of a large {}',
#
# 'This is a photo of a {}',
# 'This is a photo of a small {}',
# 'This is a photo of a medium {}',
# 'This is a photo of a large {}',
#
# 'a photo of a {} in the scene',
# 'a photo of a {} in the scene',
#
# 'There is a {} in the scene',
# 'There is the {} in the scene',
# 'This is a {} in the scene',
# 'This is the {} in the scene',
# 'This is one {} in the scene',
# ]
ViLD_templates = [
'There is {article} {category} in the scene.',
'There is the {category} in the scene.',
'a photo of {article} {category} in the scene.',
'a photo of the {category} in the scene.',
'a photo of one {category} in the scene.', 'itap of {article} {category}.',
'itap of my {category}.', 'itap of the {category}.',
'a photo of {article} {category}.', 'a photo of my {category}.',
'a photo of the {category}.', 'a photo of one {category}.',
'a photo of many {category}.', 'a good photo of {article} {category}.',
'a good photo of the {category}.', 'a bad photo of {article} {category}.',
'a bad photo of the {category}.', 'a photo of a nice {category}.',
'a photo of the nice {category}.', 'a photo of a cool {category}.',
'a photo of the cool {category}.', 'a photo of a weird {category}.',
'a photo of the weird {category}.', 'a photo of a small {category}.',
'a photo of the small {category}.', 'a photo of a large {category}.',
'a photo of the large {category}.', 'a photo of a clean {category}.',
'a photo of the clean {category}.', 'a photo of a dirty {category}.',
'a photo of the dirty {category}.',
'a bright photo of {article} {category}.',
'a bright photo of the {category}.',
'a dark photo of {article} {category}.', 'a dark photo of the {category}.',
'a photo of a hard to see {category}.',
'a photo of the hard to see {category}.',
'a low resolution photo of {article} {category}.',
'a low resolution photo of the {category}.',
'a cropped photo of {article} {category}.',
'a cropped photo of the {category}.',
'a close-up photo of {article} {category}.',
'a close-up photo of the {category}.',
'a jpeg corrupted photo of {article} {category}.',
'a jpeg corrupted photo of the {category}.',
'a blurry photo of {article} {category}.',
'a blurry photo of the {category}.',
'a pixelated photo of {article} {category}.',
'a pixelated photo of the {category}.',
'a black and white photo of the {category}.',
'a black and white photo of {article} {category}.',
'a plastic {category}.', 'the plastic {category}.', 'a toy {category}.',
'the toy {category}.', 'a plushie {category}.', 'the plushie {category}.',
'a cartoon {category}.', 'the cartoon {category}.',
'an embroidered {category}.', 'the embroidered {category}.',
'a painting of the {category}.', 'a painting of a {category}.'
]
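

# Usage sketch (assumption): ViLD-style templates use the named fields
# ``{article}`` and ``{category}``. Choosing 'an' before a leading vowel
# letter is a simple heuristic assumed here; it mishandles words such as
# 'hour' or 'unicorn'. ``article_for`` and ``expand_vild_templates`` are
# hypothetical helper names.
def article_for(category):
    """Return 'an' if ``category`` starts with a vowel letter, else 'a'."""
    return 'an' if category and category[0].lower() in 'aeiou' else 'a'


def expand_vild_templates(category, templates):
    """Fill both named fields; templates without ``{article}`` ignore it."""
    article = article_for(category)
    # str.format ignores keyword arguments a template does not reference.
    return [t.format(article=article, category=category) for t in templates]


# e.g. expand_vild_templates('apple', ViLD_templates)[0]
#      == 'There is an apple in the scene.'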