Vision Transformer
Introduction
@article{dosovitskiy2020,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
journal={arXiv preprint arXiv:2010.11929},
year={2020}
}
Usage
To use pre-trained models from other repositories, their keys must first be converted.
We provide a script, vit2mmseg.py, in the tools directory that converts the keys of timm models to MMSegmentation style.
python tools/model_converters/vit2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}
E.g.
python tools/model_converters/vit2mmseg.py https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth pretrain/jx_vit_base_p16_224-80ecf9dd.pth
This script converts the model at PRETRAIN_PATH and stores the converted model in STORE_PATH.
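The idea of key conversion can be illustrated with a minimal sketch. It is not the code of vit2mmseg.py: the rename rules, the dropped `head` prefix, and the function name `convert_keys` are assumptions chosen for illustration, and the example uses plain values instead of real tensors so it runs without PyTorch.

```python
from collections import OrderedDict

# Hypothetical rename rules, for illustration only. The actual rules are
# defined in tools/model_converters/vit2mmseg.py and may differ.
RENAME_RULES = [
    ("blocks.", "layers."),
    ("mlp.fc1", "ffn.layers.0.0"),
    ("mlp.fc2", "ffn.layers.1"),
]


def convert_keys(state_dict, rules=RENAME_RULES, drop_prefixes=("head",)):
    """Return a new state dict with renamed keys and unchanged values.

    Keys starting with any prefix in `drop_prefixes` are skipped
    (assumed here to be classification-head weights that a segmentation
    backbone does not need).
    """
    converted = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(drop_prefixes):
            continue
        new_key = key
        for old, new in rules:
            new_key = new_key.replace(old, new)
        converted[new_key] = value
    return converted


if __name__ == "__main__":
    # Stand-in for a loaded checkpoint; real values would be tensors.
    timm_style = OrderedDict([
        ("blocks.0.mlp.fc1.weight", 1),
        ("head.weight", 2),
        ("cls_token", 3),
    ])
    print(list(convert_keys(timm_style)))
```

Only the key names change; the stored weights are passed through untouched, which is why a converted checkpoint can be loaded as long as the new names match the target model's parameter names.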
Results and models
ADE20K
Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
---|---|---|---|---|---|---|---|---|---|
UPerNet | ViT-B + MLN | 512x512 | 80000 | 9.20 | 6.94 | 47.71 | 49.51 | config | model \| log |
UPerNet | ViT-B + MLN | 512x512 | 160000 | 9.20 | 7.58 | 46.75 | 48.46 | config | model \| log |
UPerNet | ViT-B + LN + MLN | 512x512 | 160000 | 9.21 | 6.82 | 47.73 | 49.95 | config | model \| log |
UPerNet | DeiT-S | 512x512 | 80000 | 4.68 | 29.85 | 42.96 | 43.79 | config | model \| log |
UPerNet | DeiT-S | 512x512 | 160000 | 4.68 | 29.19 | 42.87 | 43.79 | config | model \| log |
UPerNet | DeiT-S + MLN | 512x512 | 160000 | 5.69 | 11.18 | 43.82 | 45.07 | config | model \| log |
UPerNet | DeiT-S + LN + MLN | 512x512 | 160000 | 5.69 | 12.39 | 43.52 | 45.01 | config | model \| log |
UPerNet | DeiT-B | 512x512 | 80000 | 7.75 | 9.69 | 45.24 | 46.73 | config | model \| log |
UPerNet | DeiT-B | 512x512 | 160000 | 7.75 | 10.39 | 45.36 | 47.16 | config | model \| log |
UPerNet | DeiT-B + MLN | 512x512 | 160000 | 9.21 | 7.78 | 45.46 | 47.16 | config | model \| log |
UPerNet | DeiT-B + LN + MLN | 512x512 | 160000 | 9.21 | 7.75 | 45.37 | 47.23 | config | model \| log |