* [Fix] Fix vit init bug
* Add some vit unit tests
* Modify module import
* Fix pretrain weights bug
* Modify pretrained judge
* Add some unit tests to improve code cov
* Optimize code
* Fix vit unit test
* [Refactor] Using mmcv bricks to refactor vit
* Follow the vit code structure from mmclassification
* Add MMCV install into CI system.
* Add 'Install MMCV' CI item
* Add 'Install MMCV_CPU' and 'Install MMCV_GPU' CI items
* Fix & Add
1. Fix low code coverage of vit.py;
2. Remove HybridEmbed;
3. Fix doc string of VisionTransformer;
* Add helpers unit test.
* Add converter to convert vit pretrain weights from timm style to mmcls style.
* Clean up some redundant code and refactor init
1. Use timm style init_weights;
2. Remove to_xtuple and trunc_norm_;
* Add comments for VisionTransformer.init_weights()
* Add arg: pretrain_style to choose timm or mmcls vit pretrain weights.
* Add arg: final_reshape to control whether the output feature is converted from NLC to NCHW;
* Fix the default value of final_reshape;
* Modify arg: final_reshape to arg: out_shape;
* Fix some unit test bugs;
* Adjust vision transformer backbone architectures;
* Add DropPath, trunc_normal_ for VisionTransformer implementation;
* Add class token during the intermediate stages and remove it at the final stage;
* Fix some parameter loss bugs;
* Store intermediate token features without any further processing;
* Remove class token and reshape entire token feature from NLC to NCHW;
* Fix some doc errors
* Add an arg for VisionTransformer backbone to control whether the class token is input into the transformer;
* Add stochastic depth decay rule for DropPath;
* Fix output bug when input_cls_token=False;
* Add related unit test;
* Add arg: out_indices to control model output;
* Add unit test for DropPath;
* Apply suggestions from code review
Co-authored-by: Jerry Jiarui XU <xvjiarui0826@gmail.com>
* vit backbone
* fix lint
* add docstrings and fix pretrained pos_embed dim mismatch problem
* add unittest for vit
* fix lint
* add vit based fcn configs
* fix import error
* support multiple resolution input images
* upsample pos_embed at init_weights
* support resize pos_embed at evaluation
* fix training errors
* add more unittest code for vit backbone
* unittest for uncovered code
* add norm_eval unittest
* refactor _pos_embeding
* minor change
* change var name
* refactor init_weight
* load weights after resize
* ignore 'module' in pretrain checkpoint
* add with_cp
* add with_cp
Co-authored-by: Jiarui XU <xvjiarui0826@gmail.com>