Thanks for your contribution; we appreciate it a lot. The following
instructions will make your pull request healthier and easier to review.
If you do not understand some items, don't worry: just open the pull
request and ask the maintainers for help.
## Motivation
Support the depth estimation algorithm [VPD](https://github.com/wl-zhao/VPD)
## Modification
1. add VPD backbone
2. add VPD decoder head for depth estimation
3. add a new segmentor `DepthEstimator` based on `EncoderDecoder` for
depth estimation
4. add an integrated metric that calculates common depth estimation
metrics
5. add SiLog loss for depth estimation (a minimal sketch follows this list)
6. add config for VPD
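
Item 5 above refers to the scale-invariant logarithmic (SiLog) loss commonly used for monocular depth estimation. Below is a minimal PyTorch sketch of that formulation; the class name, the `lambd` weighting and the `eps` clamp are illustrative assumptions, not necessarily the exact implementation added by this PR.

```python
import torch
import torch.nn as nn


class SiLogLoss(nn.Module):
    """Scale-invariant log loss (sketch only, not the exact PR code)."""

    def __init__(self, lambd: float = 0.5, eps: float = 1e-6):
        super().__init__()
        self.lambd = lambd  # weight of the scale-invariance term (assumed value)
        self.eps = eps      # clamp so log() stays well-defined

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Supervise only pixels that have a valid ground-truth depth.
        valid = target > self.eps
        diff = torch.log(pred[valid].clamp(min=self.eps)) - torch.log(target[valid])
        return torch.sqrt((diff ** 2).mean() - self.lambd * diff.mean() ** 2)
```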
## BC-breaking (Optional)
Does the modification introduce changes that break backward
compatibility for downstream repos?
If so, please describe how it breaks compatibility and how downstream
projects should modify their code to stay compatible with this PR.
## Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases
here, and update the documentation.
## Checklist
1. Pre-commit or other linting tools are used to fix potential lint
issues.
2. The modification is covered by complete unit tests. If not, please
add more unit tests to ensure correctness.
3. If the modification has potential influence on downstream projects,
this PR should be tested with downstream projects, like MMDet or
MMDet3D.
4. The documentation has been modified accordingly, e.g. docstrings or
example tutorials.
* [WIP] Refactor data flow
* model return
* [WIP] Refactor data flow
* support data_samples being optional
* fix benchmark
* fix base
* minors
* rebase
* fix api
* ut
* fix api inference
* comments
* docstring
* fix bug of slide inference
* add assert c > 1
* assert original HardSwish when PyTorch > 1.6 in unit test
* [Fix] Fix the bug that ViT cannot load pretrained weights properly when using init_cfg to specify the pretraining scheme
* [Fix] fix the coverage problem
* Update mmseg/models/backbones/vit.py
Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
* [Fix] make the predicate more concise and clearer
* [Fix] Modified the judgement logic
* Update tests/test_models/test_backbones/test_vit.py
Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
* add comments
Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>
* add TIMMBackbone and unittests
* add timm to tests requirements
* deprecate pt1.3.1
* reduce the unittests input of timm backbone
* fix ci
* remove unittests of large models of timm backbone
* generate coverage report for all unittests env
* reduce the unittests input of timm backbone
* reduce the unittests input of timm backbone
* [Feature] Segformer re-implementation
* Using act_cfg and norm_cfg to control activation and normalization
* Split this PR into several little PRs
* Fix lint error
* Remove SegFormerHead
* parameters init refactor
* 1. Refactor segformer backbone parameters init;
2. Remove redundant functions and unit tests;
* Remove redundant code
* 1. Remove redundant code;
2. Modify module name;
* Refactor the backbone of segformer using mmcv.cnn.bricks.transformer.py
* Fix some code logic bugs.
* Add mit_convert.py to match pretrain keys of segformer.
* Resolve some comments.
* 1. Add some asserts to ensure correct params;
2. Support flexible peconv position;
* Add pe_index assert and fix unit test.
* 1. Add doc string for MixVisionTransformer;
2. Add some unit tests for MixVisionTransformer;
* Use hw_shape to pass shape of feature map.
* 1. Fix doc string of MixVisionTransformer;
2. Simplify MixFFN;
3. Modify H, W to hw_shape;
* Add more unit tests.
* Add doc string for shape conversion functions.
* Add some unit tests to improve code coverage.
* Fix Segformer backbone pretrain weights match bug.
* Resolve the shape conversion functions' doc string.
* Add pad_to_patch_size arg.
* Modify default value of pad_to_patch_size arg.
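
Several of the Segformer commits above rely on shape conversion helpers that move features between the transformer token layout (N, L, C) and the convolutional layout (N, C, H, W), passing hw_shape alongside the tokens. A minimal sketch of that conversion is shown below; the helper names mirror the usual mmseg/mmcv utilities but are written here from scratch for illustration.

```python
import torch


def nlc_to_nchw(x: torch.Tensor, hw_shape) -> torch.Tensor:
    """Convert token features (N, L, C) into a feature map (N, C, H, W)."""
    H, W = hw_shape
    assert x.dim() == 3 and x.shape[1] == H * W, 'L must equal H * W'
    return x.transpose(1, 2).reshape(x.shape[0], -1, H, W).contiguous()


def nchw_to_nlc(x: torch.Tensor) -> torch.Tensor:
    """Convert a feature map (N, C, H, W) back into token features (N, L, C)."""
    assert x.dim() == 4
    return x.flatten(2).transpose(1, 2).contiguous()
```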
* [Fix] Fix vit init bug
* Add some vit unit tests
* Modify module import
* Fix pretrain weights bug
* Modify pretrained judge
* Add some unit tests to improve code cov
* Optimize code
* Fix vit unit test
* [Refactor] Using mmcv bricks to refactor vit
* Follow the vit code structure from mmclassification
* Add MMCV install into CI system.
* Add to 'Install MMCV' CI item
* Add 'Install MMCV_CPU' and 'Install MMCV_GPU CI' items
* Fix & Add
1. Fix low code coverage of vit.py;
2. Remove HybridEmbed;
3. Fix doc string of VisionTransformer;
* Add helpers unit test.
* Add converter to convert vit pretrain weights from timm style to mmcls style.
* Clean some redundant code and refactor init
1. Use timm style init_weights;
2. Remove to_xtuple and trunc_norm_;
* Add comments for VisionTransformer.init_weights()
* Add arg: pretrain_style to choose timm or mmcls vit pretrain weights.
* Add arg: final_reshape to control whether to convert output features from NLC to NCHW;
* Fix the default value of final_reshape;
* Modify arg: final_reshape to arg: out_shape;
* Fix some unit test bugs;
* Adjust vision transformer backbone architectures;
* Add DropPath, trunc_normal_ for VisionTransformer implementation;
* Add class token during the intermediate period and remove it during the final period;
* Fix a bug that caused some parameters to be lost;
* Store intermediate token features and apply no extra processing to them;
* Remove class token and reshape entire token feature from NLC to NCHW;
* Fix some doc errors
* Add an arg for VisionTransformer backbone to control whether the class token is fed into the transformer;
* Add stochastic depth decay rule for DropPath;
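
The stochastic depth decay rule mentioned here is usually a linear ramp of the drop-path probability over the transformer layers, so deeper blocks are dropped more often. A short sketch of that rule, assuming each layer wraps its residual branches in a DropPath module built from the per-layer rate:

```python
import torch

num_layers = 12        # assumed depth of the ViT backbone
drop_path_rate = 0.1   # maximum drop probability, reached at the last layer

# Linearly increasing drop-path rates, one per transformer layer.
dpr = [r.item() for r in torch.linspace(0, drop_path_rate, num_layers)]
# Layer i would then construct DropPath(dpr[i]): early layers are almost never
# dropped, while the final layer uses the full drop_path_rate.
```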
* Fix output bug when input_cls_token=False;
* Add related unit test;
* Add arg: out_indices to control model output;
* Add unit test for DropPath;
* Apply suggestions from code review
Co-authored-by: Jerry Jiarui XU <xvjiarui0826@gmail.com>
* vit backbone
* fix lint
* add docstrings and fix the problem that pretrained pos_embed dims do not match
* add unittest for vit
* fix lint
* add vit based fcn configs
* fix import error
* support multiple resolution input images
* upsample pos_embed at init_weights
* support resize pos_embed at evaluation
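
Resizing pos_embed for multiple input resolutions typically means interpolating the pretrained patch-position grid to the new feature-map size while leaving the class-token embedding untouched. A rough sketch of that procedure follows; the function name and the bicubic interpolation mode are assumptions, not necessarily what this PR uses.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, src_hw, dst_hw) -> torch.Tensor:
    """Interpolate ViT position embeddings of shape (1, 1 + H*W, C) to a new grid."""
    cls_pos = pos_embed[:, :1]    # class-token embedding stays unchanged
    patch_pos = pos_embed[:, 1:]  # (1, H*W, C) patch-position embeddings
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    patch_pos = patch_pos.reshape(1, src_h, src_w, -1).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(
        patch_pos, size=(dst_h, dst_w), mode='bicubic', align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, dst_h * dst_w, -1)
    return torch.cat([cls_pos, patch_pos], dim=1)
```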
* fix training errors
* add more unit test code for vit backbone
* unit test for uncovered code
* add norm_eval unittest
* refactor _pos_embeding
* minor change
* change var name
* refactor init_weight
* load weights after resize
* ignore 'module' in pretrain checkpoint
* add with_cp
Co-authored-by: Jiarui XU <xvjiarui0826@gmail.com>